One, Two, U: Examples of common one- and two-sample U-statistics

My previous two blog posts revolved around derivation of the limiting distribution of U-statistics for one sample and multiple independent samples.

For derivation of the limiting distribution of a U-statistic for a single sample, check out Getting to know U: the asymptotic distribution of a single U-statistic.

For derivation of the limiting distribution of a U-statistic for multiple independent samples, check out Much Two U About Nothing: Extension of U-statistics to multiple independent samples.

The notation within these derivations can get quite complicated, and it may be unclear how to actually derive the components of the limiting distribution in practice.

In this blog post, I provide two examples each of common one-sample U-statistics (variance, Kendall’s tau) and two-sample U-statistics (difference of two means, Wilcoxon Mann-Whitney rank-sum statistic) and derive their limiting distributions using our previously developed theory.

Asymptotic distribution of U-statistics

One sample

For a single sample, $X_1, \ldots, X_n \overset{iid}{\sim} F$, the U-statistic is given by

$$U = \binom{n}{r}^{-1} \sum_{\beta \in \mathcal{B}} \phi(X_{\beta_1}, \ldots, X_{\beta_r})$$

where $\phi$ is a symmetric kernel of degree $r$, $\mathcal{B}$ is the set of all $\binom{n}{r}$ subsets of $r$ indices chosen from $\{1, \ldots, n\}$, and $U$ is unbiased for $\theta = E[\phi(X_1, \ldots, X_r)]$.

For a review of what it means for $\phi$ to be symmetric, check out U-, V-, and Dupree Statistics.

In the examples covered by this blog post, $r = 2$, so we can re-write $U$ as,

$$U = \binom{n}{2}^{-1} \sum_{i < j} \phi(X_i, X_j) = \frac{2}{n(n-1)} \sum_{i < j} \phi(X_i, X_j)$$

Alternatively, this is equivalent to,

$$U = \frac{1}{n(n-1)} \sum_{i \neq j} \phi(X_i, X_j)$$

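To make the $r = 2$ formula concrete, here is a minimal Python sketch (mine, not from the original posts; the helper name `u_stat` and the example data are made up). It averages a user-supplied symmetric kernel over all $\binom{n}{2}$ pairs; the example kernel $(x_1 - x_2)^2/2$ is the variance kernel derived later in the post.

```python
from itertools import combinations

def u_stat(x, kernel):
    """One-sample U-statistic of degree 2: average a symmetric
    kernel over all C(n, 2) unordered pairs of observations."""
    pairs = list(combinations(x, 2))
    return sum(kernel(a, b) for a, b in pairs) / len(pairs)

# Example kernel: phi(x1, x2) = (x1 - x2)^2 / 2, which is unbiased
# for the variance, so u equals the usual sample variance S^2.
x = [1.0, 4.0, 2.0, 8.0]
u = u_stat(x, lambda a, b: (a - b) ** 2 / 2)
```
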
The limiting variance of $U$ is given by,

$$\text{Var}(U) = \binom{n}{r}^{-1} \sum_{c=1}^{r} \binom{r}{c} \binom{n-r}{r-c} \sigma_c^2$$

where

$$\sigma_c^2 = \text{Var}\big(\phi_c(X_1, \ldots, X_c)\big), \qquad \phi_c(x_1, \ldots, x_c) = E\big[\phi(x_1, \ldots, x_c, X_{c+1}, \ldots, X_r)\big]$$

or equivalently,

$$\sigma_c^2 = \text{Cov}\big(\phi(X_1, \ldots, X_r),\ \phi(X_1, \ldots, X_c, X_{r+1}, \ldots, X_{2r-c})\big)$$

Note that when $c = r$, $\sigma_r^2 = \text{Var}\big(\phi(X_1, \ldots, X_r)\big)$.

For $r = 2$, these expressions reduce to

$$\text{Var}(U) = \frac{2}{n(n-1)} \big[ 2(n-2)\,\sigma_1^2 + \sigma_2^2 \big]$$

where

$$\sigma_1^2 = \text{Var}\big( E[\phi(X_1, X_2) \mid X_1] \big)$$

and

$$\sigma_2^2 = \text{Var}\big( \phi(X_1, X_2) \big)$$

The limiting distribution of $U$ for $r = 2$ is then,

$$\sqrt{n}\,(U - \theta) \overset{d}{\to} N\big(0,\ 4\sigma_1^2\big)$$

For derivation of the limiting distribution of a U-statistic for a single sample, check out Getting to know U: the asymptotic distribution of a single U-statistic.

Two independent samples

For two independent samples denoted $X_1, \ldots, X_n \overset{iid}{\sim} F$ and $Y_1, \ldots, Y_m \overset{iid}{\sim} G$, the two-sample U-statistic is given by

$$U = \binom{n}{r}^{-1} \binom{m}{s}^{-1} \sum_{\alpha \in \mathcal{A}} \sum_{\beta \in \mathcal{B}} \phi\big(X_{\alpha_1}, \ldots, X_{\alpha_r};\ Y_{\beta_1}, \ldots, Y_{\beta_s}\big)$$

where $\phi$ is a kernel that is independently symmetric within the two blocks $(x_1, \ldots, x_r)$ and $(y_1, \ldots, y_s)$, $\mathcal{A}$ and $\mathcal{B}$ are the sets of all $\binom{n}{r}$ and $\binom{m}{s}$ subsets of indices from the two samples, and $U$ is unbiased for $\theta = E\big[\phi(X_1, \ldots, X_r;\ Y_1, \ldots, Y_s)\big]$.

In the examples covered by this blog post, $r = s = 1$, reducing the U-statistic to,

$$U = \frac{1}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} \phi(X_i, Y_j)$$

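The $r = s = 1$ case is just an average over all cross-sample pairs. Here is a minimal sketch (mine, not from the original posts; the function name and toy data are made up) using the mean-difference kernel $\phi(x, y) = x - y$ covered later in the post.

```python
def u_stat_two_sample(x, y, kernel):
    """Two-sample U-statistic with r = s = 1: average the kernel
    over all n * m cross-sample pairs."""
    return sum(kernel(a, b) for a in x for b in y) / (len(x) * len(y))

# Example kernel: phi(x, y) = x - y, so U is the difference in sample means.
u = u_stat_two_sample([1.0, 2.0, 3.0], [4.0, 6.0], lambda a, b: a - b)
```
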
The limiting variance of $U$ is given by,

$$\text{Var}(U) \approx \frac{r^2\,\sigma_{10}^2}{n} + \frac{s^2\,\sigma_{01}^2}{m}$$

where

$$\sigma_{10}^2 = \text{Var}\big(E\big[\phi(X_1, \ldots, X_r;\ Y_1, \ldots, Y_s) \mid X_1\big]\big)$$

and

$$\sigma_{01}^2 = \text{Var}\big(E\big[\phi(X_1, \ldots, X_r;\ Y_1, \ldots, Y_s) \mid Y_1\big]\big)$$

Equivalently,

$$\sigma_{10}^2 = \text{Cov}\big(\phi(X_1, X_2, \ldots, X_r;\ Y_1, \ldots, Y_s),\ \phi(X_1, X_{r+1}, \ldots, X_{2r-1};\ Y_{s+1}, \ldots, Y_{2s})\big)$$

and

$$\sigma_{01}^2 = \text{Cov}\big(\phi(X_1, \ldots, X_r;\ Y_1, Y_2, \ldots, Y_s),\ \phi(X_{r+1}, \ldots, X_{2r};\ Y_1, Y_{s+1}, \ldots, Y_{2s-1})\big)$$

For $r = s = 1$, these expressions reduce to

$$\text{Var}(U) = \frac{\sigma_{10}^2}{n} + \frac{\sigma_{01}^2}{m}$$

where

$$\sigma_{10}^2 = \text{Var}\big(E[\phi(X_1, Y_1) \mid X_1]\big) = \text{Cov}\big(\phi(X_1, Y_1),\ \phi(X_1, Y_2)\big)$$

and

$$\sigma_{01}^2 = \text{Var}\big(E[\phi(X_1, Y_1) \mid Y_1]\big) = \text{Cov}\big(\phi(X_1, Y_1),\ \phi(X_2, Y_1)\big)$$

The limiting distribution of $U$ for $r = s = 1$ and $N = n + m$ with $n/N \to \lambda_X > 0$ and $m/N \to \lambda_Y > 0$ is then,

$$\sqrt{N}\,(U - \theta) \overset{d}{\to} N\!\left(0,\ \frac{\sigma_{10}^2}{\lambda_X} + \frac{\sigma_{01}^2}{\lambda_Y}\right)$$

For derivation of the limiting distribution of a U-statistic for multiple independent samples, check out Much Two U About Nothing: Extension of U-statistics to multiple independent samples.

Examples of one-sample U-statistics

Variance

Suppose we have an independent and identically distributed random sample of size $n$, $X_1, \ldots, X_n \overset{iid}{\sim} F$.
We wish to estimate the variance, which can be expressed as an expectation functional,

$$\theta = \sigma^2 = E[X^2] - E[X]^2$$

In order to estimate $\sigma^2$ using a U-statistic, we need to identify a kernel function that is unbiased for $\sigma^2$ and symmetric in its arguments. We start by considering,

$$h(x_1, x_2) = x_1^2 - x_1 x_2$$

$h(x_1, x_2)$ is unbiased for $\sigma^2$ since

$$E[h(X_1, X_2)] = E[X_1^2] - E[X_1]\,E[X_2] = E[X^2] - E[X]^2 = \sigma^2$$

but is not symmetric since

$$h(x_1, x_2) = x_1^2 - x_1 x_2 \neq x_2^2 - x_2 x_1 = h(x_2, x_1)$$

Thus, the corresponding symmetric kernel can be constructed as

$$\phi(x_1, \ldots, x_r) = \frac{1}{r!} \sum_{\pi \in \Pi} h\big(x_{\pi(1)}, \ldots, x_{\pi(r)}\big)$$

Here, $r$ is the number of arguments and $\Pi$ is the set of all $r!$ permutations of the arguments $(x_1, \ldots, x_r)$.

Then, the symmetric kernel which is unbiased for the variance is,

$$\phi(x_1, x_2) = \frac{1}{2}\big[h(x_1, x_2) + h(x_2, x_1)\big] = \frac{1}{2}\big(x_1^2 - x_1 x_2 + x_2^2 - x_2 x_1\big) = \frac{(x_1 - x_2)^2}{2}$$

An unbiased estimator of $\sigma^2$ is then the U-statistic,

$$U = \binom{n}{2}^{-1} \sum_{i < j} \frac{(X_i - X_j)^2}{2}$$

or equivalently,

$$U = \frac{1}{n(n-1)} \sum_{i \neq j} \frac{(X_i - X_j)^2}{2}$$

Focusing on the second form of the sum and recognizing that

$$\sum_{i \neq j} X_i^2 = (n - 1) \sum_{i=1}^{n} X_i^2$$

and,

$$\sum_{i \neq j} X_i X_j = \left(\sum_{i=1}^{n} X_i\right)^2 - \sum_{i=1}^{n} X_i^2 = n^2 \bar{X}^2 - \sum_{i=1}^{n} X_i^2$$

we have,

$$\sum_{i \neq j} \frac{(X_i - X_j)^2}{2} = \sum_{i \neq j} X_i^2 - \sum_{i \neq j} X_i X_j = n \sum_{i=1}^{n} X_i^2 - n^2 \bar{X}^2$$

Plugging this simplified expression back into our formula for $U$, we obtain

$$U = \frac{1}{n(n-1)}\left(n \sum_{i=1}^{n} X_i^2 - n^2 \bar{X}^2\right) = \frac{1}{n-1}\left(\sum_{i=1}^{n} X_i^2 - n \bar{X}^2\right) = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 = S^2$$

as desired.
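
We can confirm this equivalence numerically. The sketch below (mine, not from the original post) compares the pairwise U-statistic to the usual $S^2$ on simulated data:

```python
import random
from itertools import combinations

random.seed(1)
x = [random.gauss(0, 1) for _ in range(50)]
n = len(x)

# U-statistic with the symmetric kernel phi(a, b) = (a - b)^2 / 2
u = sum((a - b) ** 2 / 2 for a, b in combinations(x, 2)) / (n * (n - 1) / 2)

# The usual unbiased sample variance S^2
xbar = sum(x) / n
s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)

# The two agree up to floating-point error, as the algebra above shows.
```
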

It is well-known that $S^2$ is an unbiased estimator of the population variance such that,

$$E[S^2] = \sigma^2$$

but what about the variance of $S^2$? For a sample size of $n$ and $r = 2$,

$$\text{Var}(S^2) = \frac{2}{n(n-1)} \big[ 2(n-2)\,\sigma_1^2 + \sigma_2^2 \big]$$

To derive the first variance component $\sigma_1^2$, we start by taking the expectation of our kernel conditional on $X_1 = x_1$,

$$\phi_1(x_1) = E\big[\phi(x_1, X_2)\big] = \frac{1}{2} E\big[(x_1 - X_2)^2\big] = \frac{1}{2}\big(x_1^2 - 2\mu x_1 + \mu^2 + \sigma^2\big) = \frac{(x_1 - \mu)^2 + \sigma^2}{2}$$

Now, our first variance component is just equal to the variance of $\phi_1(X_1)$ and since $\sigma^2/2$ is just a constant, we have

$$\sigma_1^2 = \text{Var}\big(\phi_1(X_1)\big) = \frac{1}{4}\,\text{Var}\big((X_1 - \mu)^2\big) = \frac{1}{4}\Big(E\big[(X_1 - \mu)^4\big] - \sigma^4\Big) = \frac{\mu_4 - \sigma^4}{4}$$

where $\mu_4 = E\big[(X - \mu)^4\big]$ is the fourth central moment.

Next, recognizing that $\sigma_2^2 = \text{Var}\big(\phi(X_1, X_2)\big)$ and recycling our “add zero” trick, $X_1 - X_2 = (X_1 - \mu) - (X_2 - \mu)$, yields an expression for our second variance component $\sigma_2^2$,

$$\sigma_2^2 = E\big[\phi(X_1, X_2)^2\big] - E\big[\phi(X_1, X_2)\big]^2$$

We know that the kernel is an unbiased estimator of $\sigma^2$ by definition, so that,

$$E\big[\phi(X_1, X_2)\big]^2 = \big(\sigma^2\big)^2 = \sigma^4$$

To simplify the remaining expectation, recall that,

$$(a - b)^4 = a^4 - 4a^3 b + 6a^2 b^2 - 4a b^3 + b^4$$

and let $a = X_1 - \mu$ and $b = X_2 - \mu$. Since $a$ and $b$ are independent with mean zero, $E[a^4] = E[b^4] = \mu_4$, $E[a^2 b^2] = \sigma^4$, and the odd cross-terms vanish. Then,

$$E\big[\phi(X_1, X_2)^2\big] = \frac{1}{4} E\big[(a - b)^4\big] = \frac{1}{4}\big(\mu_4 + 6\sigma^4 + \mu_4\big) = \frac{2\mu_4 + 6\sigma^4}{4}$$

Substituting this back into our expression for $\sigma_2^2$, we have

$$\sigma_2^2 = \frac{2\mu_4 + 6\sigma^4}{4} - \sigma^4 = \frac{\mu_4 + \sigma^4}{2}$$

Finally, plugging our two variance components into our expression for $\text{Var}(S^2)$,

$$\text{Var}(S^2) = \frac{2}{n(n-1)}\left[2(n-2)\cdot\frac{\mu_4 - \sigma^4}{4} + \frac{\mu_4 + \sigma^4}{2}\right] = \frac{\mu_4}{n} - \frac{(n-3)\,\sigma^4}{n(n-1)}$$

Then, our asymptotic result for $r = 2$ tells us,

$$\sqrt{n}\,\big(S^2 - \sigma^2\big) \overset{d}{\to} N\big(0,\ 4\sigma_1^2\big) = N\big(0,\ \mu_4 - \sigma^4\big)$$

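As a sanity check (my own, not from the post), the finite-sample variance formula can be verified exactly for a toy distribution: take $X$ uniform on $\{0, 1\}$ (so $\sigma^2 = 1/4$ and $\mu_4 = 1/16$) and $n = 3$, and enumerate all $2^3$ equally likely samples with exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product

# Exact check of Var(S^2) = mu4 / n - (n - 3) sigma^4 / (n (n - 1))
# for X uniform on {0, 1} (a toy choice) and n = 3.
n = 3
sigma2 = Fraction(1, 4)   # Var(X)
mu4 = Fraction(1, 16)     # E[(X - 1/2)^4]

s2_values = []
for sample in product([0, 1], repeat=n):   # all 8 equally likely samples
    xbar = Fraction(sum(sample), n)
    s2_values.append(sum((xi - xbar) ** 2 for xi in sample) / (n - 1))

mean_s2 = sum(s2_values) / len(s2_values)
var_s2 = sum((v - mean_s2) ** 2 for v in s2_values) / len(s2_values)

formula = mu4 / n - (n - 3) * sigma2 ** 2 / (n * (n - 1))
# mean_s2 equals sigma2 (unbiasedness) and var_s2 equals the formula exactly.
```
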
Kendall’s Tau

Consider bivariate, continuous observations of the form

$$(X_1, Y_1), \ldots, (X_n, Y_n) \overset{iid}{\sim} F_{XY}$$

A pair of observations, $(X_i, Y_i)$ and $(X_j, Y_j)$, is considered “concordant” if

$$(X_j - X_i)(Y_j - Y_i) > 0$$

and “discordant” otherwise.

The probability that two observations are concordant is then,

$$p_c = P\big[(X_2 - X_1)(Y_2 - Y_1) > 0\big]$$

and the probability that two observations are discordant is then,

$$p_d = P\big[(X_2 - X_1)(Y_2 - Y_1) < 0\big] = 1 - p_c$$

Kendall’s Tau, denoted $\tau$, is the proportion of concordant pairs minus the proportion of discordant pairs, or the difference between $p_c$ and $p_d$ such that,

$$\tau = p_c - p_d = 2p_c - 1$$

$\tau$ ranges between $-1$ and $1$ and is used as a measure of the strength of monotone increasing/decreasing relationships, with $\tau = 0$ suggesting that $X$ and $Y$ are independent and $\tau = 1$ suggesting a perfect monotonic increasing relationship between $X$ and $Y$.

Based on our definition of $\tau$, the form of the symmetric kernel is immediately obvious,

$$\phi\big((x_1, y_1), (x_2, y_2)\big) = 2\,I\big[(x_2 - x_1)(y_2 - y_1) > 0\big] - 1$$

where $I[\cdot]$ is an indicator function taking the value $1$ when its argument is true and $0$ otherwise.

Note that

$$2\,I\big[(x_2 - x_1)(y_2 - y_1) > 0\big] - 1 = \text{sign}\big[(x_2 - x_1)(y_2 - y_1)\big]$$

and

$$\text{sign}\big[(x_2 - x_1)(y_2 - y_1)\big] = \text{sign}(x_2 - x_1)\,\text{sign}(y_2 - y_1)$$

so that, for continuous data (where ties occur with probability zero), our kernel may be re-expressed as,

$$\phi\big((x_1, y_1), (x_2, y_2)\big) = \text{sign}(x_2 - x_1)\,\text{sign}(y_2 - y_1)$$

This will come in handy later.

Now that we have identified our kernel function, we can construct our U-statistic,

$$\hat{\tau} = \binom{n}{2}^{-1} \sum_{i < j} \text{sign}(X_j - X_i)\,\text{sign}(Y_j - Y_i)$$

It is obvious that $E[\hat{\tau}] = \tau$. Once again, $r = 2$ and the variance of $\hat{\tau}$ is given by,

$$\text{Var}(\hat{\tau}) = \frac{2}{n(n-1)} \big[ 2(n-2)\,\sigma_1^2 + \sigma_2^2 \big]$$

For the purposes of demonstration and to simplify derivation of the variance components, suppose we are operating under the null hypothesis that $X$ and $Y$ are independent, or equivalently,

$$H_0: F_{XY}(x, y) = F_X(x)\,F_Y(y)$$

To find our first variance component $\sigma_1^2$, we must find the expectation of our kernel conditional on $(X_1, Y_1) = (x_1, y_1)$,

$$\phi_1(x_1, y_1) = E\big[\text{sign}(X_2 - x_1)\,\text{sign}(Y_2 - y_1)\big] = E\big[\text{sign}(X_2 - x_1)\big]\,E\big[\text{sign}(Y_2 - y_1)\big]$$

where the expectation factors because $X_2$ and $Y_2$ are independent under $H_0$. If $X_2 \sim F_X$ and $Y_2 \sim F_Y$, then $E\big[\text{sign}(X_2 - x_1)\big] = 1 - 2F_X(x_1)$ and $E\big[\text{sign}(Y_2 - y_1)\big] = 1 - 2F_Y(y_1)$, and,

$$\phi_1(x_1, y_1) = \big(1 - 2F_X(x_1)\big)\big(1 - 2F_Y(y_1)\big).$$

Then, the first variance component is given by,

$$\sigma_1^2 = \text{Var}\big(\phi_1(X_1, Y_1)\big) = \text{Var}\Big[\big(1 - 2F_X(X_1)\big)\big(1 - 2F_Y(Y_1)\big)\Big]$$

$F_X(X_1)$ and $F_Y(Y_1)$ are independent random variables distributed according to $\text{Uniform}(0, 1)$.

If $X \sim F$ then $F(X) \sim \text{Uniform}(0, 1)$. Thus, if we let $A = F_X(X_1)$ and $B = F_Y(Y_1)$, $A$ and $B$ are both distributed according to $\text{Uniform}(0, 1)$.

Since $1 - 2A$ and $1 - 2B$ are independent, applying the identity $\text{Var}(CD) = E[C^2]\,E[D^2] - E[C]^2\,E[D]^2$ for independent $C$ and $D$ yields,

$$\sigma_1^2 = E\big[(1 - 2A)^2\big]\,E\big[(1 - 2B)^2\big] - E[1 - 2A]^2\,E[1 - 2B]^2$$

Recall that if $W \sim \text{Uniform}(a, b)$,

$$E[W] = \frac{a + b}{2} \qquad \text{and} \qquad \text{Var}(W) = \frac{(b - a)^2}{12}$$

For $A \sim \text{Uniform}(0, 1)$, we have

$$E[1 - 2A] = 1 - 2E[A] = 1 - 2\left(\frac{1}{2}\right) = 0$$

and

$$E\big[(1 - 2A)^2\big] = \text{Var}(1 - 2A) + E[1 - 2A]^2 = 4\,\text{Var}(A) = \frac{4}{12} = \frac{1}{3}$$

The same is true for $1 - 2B$.

Plugging our results back into our equation for $\sigma_1^2$ yields,

$$\sigma_1^2 = \frac{1}{3} \cdot \frac{1}{3} - 0 = \frac{1}{9}$$

Next, $\sigma_2^2 = \text{Var}\big(\phi\big((X_1, Y_1), (X_2, Y_2)\big)\big)$ and,

$$\sigma_2^2 = E\big[\phi^2\big] - E[\phi]^2$$

By definition, $\phi = 2I - 1$ where $I = I\big[(X_2 - X_1)(Y_2 - Y_1) > 0\big]$, so that,

$$I \sim \text{Bernoulli}(p_c)$$

Note that since $(X_1, Y_1)$ and $(X_2, Y_2)$ are identically distributed and continuous, either $(X_2 - X_1)(Y_2 - Y_1) > 0$ or $(X_2 - X_1)(Y_2 - Y_1) < 0$, so that

$$p_c = 1 - p_d = \frac{1 + \tau}{2} = \frac{1}{2} \text{ under } H_0.$$

Then we can use the properties of the Bernoulli distribution to derive the properties of $I$ we need. That is,

$$E[I] = p_c = \frac{1}{2}$$

and

$$\text{Var}(I) = p_c(1 - p_c) = \frac{1}{4}$$

Finally, we have

$$E\big[\phi^2\big] = E\big[(2I - 1)^2\big] = 4\big(\text{Var}(I) + E[I]^2\big) - 4E[I] + 1 = 4\left(\frac{1}{4} + \frac{1}{4}\right) - 2 + 1 = 1$$

The same arguments hold for $E[\phi]$ and we obtain,

$$E[\phi] = 2E[I] - 1 = 2p_c - 1 = \tau$$

However, since $\tau = 0$ under the null hypothesis, $\sigma_2^2 = E\big[\phi^2\big] - \tau^2 = 1$.

Now that we have determined the value of $\sigma_1^2$ and $\sigma_2^2$ under the null hypothesis that $X$ and $Y$ are independent, we can plug these components into our formula for $\text{Var}(\hat{\tau})$, giving us

$$\text{Var}(\hat{\tau}) = \frac{2}{n(n-1)}\left[2(n-2)\cdot\frac{1}{9} + 1\right] = \frac{2(2n + 5)}{9\,n(n-1)}$$

Our asymptotic result for $r = 2$ tells us,

$$\sqrt{n}\,\hat{\tau} \overset{d}{\to} N\!\left(0,\ \frac{4}{9}\right)$$

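To see the estimator in action, here is a minimal sketch (mine, not from the original post; the toy data are made up) that computes $\hat{\tau}$ directly from its U-statistic definition:

```python
from itertools import combinations

def kendall_tau_hat(points):
    """U-statistic estimate of Kendall's tau: the average of
    sign((xj - xi) * (yj - yi)) over all pairs (no ties assumed)."""
    signs = [
        1.0 if (xj - xi) * (yj - yi) > 0 else -1.0
        for (xi, yi), (xj, yj) in combinations(points, 2)
    ]
    return sum(signs) / len(signs)

# 4 concordant and 2 discordant pairs out of 6, so tau-hat = 1/3.
data = [(1, 2), (2, 3), (3, 1), (4, 4)]
tau_hat = kendall_tau_hat(data)
```
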
Examples of two-sample U-statistics

Mean comparison

Suppose we have two independent random samples of size $n$ and size $m$,

$$X_1, \ldots, X_n \overset{iid}{\sim} F$$

and

$$Y_1, \ldots, Y_m \overset{iid}{\sim} G$$

We wish to compare the means of the two groups. The obvious choice for our kernel is,

$$\phi(x, y) = x - y$$

so that $\theta = E[\phi(X_1, Y_1)] = \mu_X - \mu_Y$ and our corresponding U-statistic is,

$$U = \frac{1}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} (X_i - Y_j) = \bar{X} - \bar{Y}$$

Based on our previous derivation of the distribution of two-sample U-statistics, we have

$$\text{Var}(U) = \frac{\sigma_{10}^2}{n} + \frac{\sigma_{01}^2}{m}$$

For the first variance component, we need to take the expectation of $\phi$ conditional on a single $X_1 = x_1$ such that,

$$\phi_{10}(x_1) = E[x_1 - Y_1] = x_1 - \mu_Y$$

Similarly, for the second variance component, we need to condition on a single $Y_1 = y_1$ such that,

$$\phi_{01}(y_1) = E[X_1 - y_1] = \mu_X - y_1$$

Since $\mu_X$ and $\mu_Y$ are just constants, it is easy to see that,

$$\sigma_{10}^2 = \text{Var}(X_1 - \mu_Y) = \text{Var}(X) = \sigma_X^2$$

and,

$$\sigma_{01}^2 = \text{Var}(\mu_X - Y_1) = \text{Var}(Y) = \sigma_Y^2$$

Finally, plugging these variance components into our formula for $\text{Var}(U)$, we obtain the variance we would expect for a comparison of two means,

$$\text{Var}\big(\bar{X} - \bar{Y}\big) = \frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}$$

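As an exact check (my own toy example, not from the post), we can verify $\text{Var}(\bar{X} - \bar{Y}) = \sigma_X^2/n + \sigma_Y^2/m$ by enumerating all equally likely samples from two small discrete distributions with rational arithmetic:

```python
from fractions import Fraction
from itertools import product

# Toy choice: X uniform on {0, 1} (variance 1/4), Y uniform on {0, 2}
# (variance 1), with n = m = 2; enumerate all 16 equally likely samples.
n, m = 2, 2
x_support, y_support = [0, 1], [0, 2]

u_values = []
for xs in product(x_support, repeat=n):
    for ys in product(y_support, repeat=m):
        u_values.append(Fraction(sum(xs), n) - Fraction(sum(ys), m))

mean_u = sum(u_values) / len(u_values)
var_u = sum((u - mean_u) ** 2 for u in u_values) / len(u_values)

sigma2_x = Fraction(1, 4)
sigma2_y = Fraction(1, 1)
formula = sigma2_x / n + sigma2_y / m  # = 5/8, matching var_u exactly
```
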
Wilcoxon Mann-Whitney rank-sum test

Suppose we have two independent random samples of size $n$ and size $m$,

$$X_1, \ldots, X_n \overset{iid}{\sim} F$$

and

$$Y_1, \ldots, Y_m \overset{iid}{\sim} G$$

We assume that $X$ and $Y$ are continuous so that no tied values are possible. Let $R(X_i)$ represent the full-sample ranks of the $X_i$ and $R(Y_j)$ represent the full-sample ranks of the $Y_j$.

Then, the Wilcoxon Mann-Whitney (WMW) rank-sum statistic is,

$$W = \sum_{j=1}^{m} R(Y_j)$$

which can be shown to be equivalent, up to an additive constant, to the number of pairs for which $X_i < Y_j$. That is, we can re-express the WMW statistic as,

$$W = \sum_{i=1}^{n} \sum_{j=1}^{m} I(X_i < Y_j) + \frac{m(m+1)}{2}$$

If we divide $W - m(m+1)/2$ by the total number of pairs $nm$, we obtain

$$U = \frac{1}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} I(X_i < Y_j)$$

which is exactly the form of a two-sample U-statistic with $r = s = 1$ and kernel,

$$\phi(x, y) = I(x < y)$$

so that $\theta = E\big[I(X_1 < Y_1)\big] = P(X < Y)$. $\theta$ is commonly referred to as the probabilistic index.
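
The relationship between $W$ and $U$ is easy to confirm numerically. Here is a small sketch (mine, not from the original post; the data are made up) computing both from scratch:

```python
def wmw_stats(x, y):
    """Rank-sum W for the y sample and the corresponding U-statistic
    (the estimated probabilistic index), assuming no ties."""
    combined = sorted(x + y)
    w = sum(combined.index(v) + 1 for v in y)  # full-sample ranks of the y's
    u = sum(a < b for a in x for b in y) / (len(x) * len(y))
    return w, u

x = [1.2, 3.4, 2.2]
y = [2.9, 4.1]
w, u = wmw_stats(x, y)
# Identity from the text: W = n * m * U + m * (m + 1) / 2
```
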

For more information on the probabilistic index for two continuous outcomes, check out The probabilistic index for two normally distributed outcomes.

Our previous work tells us that

$$\text{Var}(U) = \frac{\sigma_{10}^2}{n} + \frac{\sigma_{01}^2}{m}$$

The first variance component can be expressed as,

$$\sigma_{10}^2 = \text{Cov}\big(\phi(X_1, Y_1),\ \phi(X_1, Y_2)\big)$$

Recall that covariance can be expressed in terms of expectation as,

$$\text{Cov}(A, B) = E[AB] - E[A]\,E[B]$$

so that,

$$\sigma_{10}^2 = E\big[I(X_1 < Y_1)\,I(X_1 < Y_2)\big] - E\big[I(X_1 < Y_1)\big]\,E\big[I(X_1 < Y_2)\big]$$

By definition,

$$E\big[I(X_1 < Y_1)\big] = E\big[I(X_1 < Y_2)\big] = P(X < Y) = \theta$$

Now, notice that

$$I(X_1 < Y_1)\,I(X_1 < Y_2) = I(X_1 < Y_1,\ X_1 < Y_2)$$

so that,

$$\sigma_{10}^2 = P(X_1 < Y_1,\ X_1 < Y_2) - \theta^2$$

Following similar logic for $\sigma_{01}^2$, it should be clear that we have

$$\sigma_{01}^2 = \text{Cov}\big(\phi(X_1, Y_1),\ \phi(X_2, Y_1)\big)$$

and

$$\sigma_{01}^2 = P(X_1 < Y_1,\ X_2 < Y_1) - \theta^2$$

Under the null hypothesis $H_0: F = G$, $X$ and $Y$ have the same (continuous) distribution so that either $X_1 < Y_1$ or $Y_1 < X_1$, each with probability $\frac{1}{2}$, implying $\theta = \frac{1}{2}$ under $H_0$.

Similarly, there are 6 equally likely orderings of $X_1$, $Y_1$, and $Y_2$ under $H_0$: (1) $X_1 < Y_1 < Y_2$, (2) $X_1 < Y_2 < Y_1$, (3) $Y_1 < X_1 < Y_2$, (4) $Y_1 < Y_2 < X_1$, (5) $Y_2 < X_1 < Y_1$, and (6) $Y_2 < Y_1 < X_1$. Then,

$$P(X_1 < Y_1,\ X_1 < Y_2) = \frac{2}{6} = \frac{1}{3}$$

since only orderings (1) and (2) place $X_1$ below both $Y_1$ and $Y_2$; the same argument applied to $X_1$, $X_2$, and $Y_1$ gives $P(X_1 < Y_1,\ X_2 < Y_1) = \frac{1}{3}$.

Noting that $\theta = \frac{1}{2}$ under $H_0$, plugging these values into our expressions for $\sigma_{10}^2$ and $\sigma_{01}^2$ gives us,

$$\sigma_{10}^2 = \sigma_{01}^2 = \frac{1}{3} - \frac{1}{4} = \frac{1}{12}$$

Finally,

$$\text{Var}(U) = \frac{1}{12n} + \frac{1}{12m} = \frac{n + m}{12\,nm}$$

Consequently, since $W = nm\,U + \frac{m(m+1)}{2}$, we have

$$\text{Var}(W) = (nm)^2\,\text{Var}(U) = \frac{nm(n + m)}{12}$$

In summary, our multiple-sample U-statistic theory tells us that under the null hypothesis $H_0: F = G$,

$$U \overset{\cdot}{\sim} N\!\left(\frac{1}{2},\ \frac{n + m}{12\,nm}\right)$$

and

$$W \overset{\cdot}{\sim} N\!\left(\frac{nm}{2} + \frac{m(m+1)}{2},\ \frac{nm(n + m)}{12}\right)$$