One, Two, U: Examples of common one- and two-sample U-statistics

My previous two blog posts revolved around derivation of the limiting distribution of U-statistics for one sample and multiple independent samples.

For derivation of the limiting distribution of a U-statistic for a single sample, check out Getting to know U: the asymptotic distribution of a single U-statistic.

For derivation of the limiting distribution of a U-statistic for multiple independent samples, check out Much Two U About Nothing: Extension of U-statistics to multiple independent samples.

The notation within these derivations can get quite complicated and it may be a bit unclear as to how to actually derive components of the limiting distribution.

In this blog post, I provide two examples of both common one-sample U-statistics (Variance, Kendall’s Tau) and two-sample U-statistics (Difference of two means, Wilcoxon Mann-Whitney rank-sum statistic) and derive their limiting distribution using our previously developed theory.

Asymptotic distribution of U-statistics

One sample

For a single sample, X_1, …, X_n \stackrel{i.i.d}{\sim} F, the U-statistic is given by

    \[U_n = \frac{1}{{n \choose a}} \sum_{1 \leq i_1 < ... < i_a \leq n} \phi(X_{i_1}, ..., X_{i_a})\]

where \phi is a symmetric kernel of degree a.

For a review of what it means for \phi to be symmetric, check out U-, V-, and Dupree Statistics.

In the examples covered by this blog post, a = 2, so we can re-write U_n as,

    \[U_n = \frac{1}{{n \choose 2}} \sum_{1 \leq i < j \leq n} \phi(X_{i}, X_{j}).\]

Alternatively, this is equivalent to,

    \[U_n = \frac{1}{n(n-1)} \sum_{i \neq j} \phi(X_{i}, X_{j}).\]
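
As a rough sketch of how these two forms can be computed, here is a small Python example (Python is my choice for illustration, not the language of the original post). The function names `u_stat_pairs` and `u_stat_ordered` and the product kernel are illustrative assumptions; any symmetric kernel of degree 2 could be passed in instead.

```python
from itertools import combinations, permutations

def u_stat_pairs(xs, kernel):
    """Average the kernel over all unordered pairs i < j."""
    pairs = list(combinations(xs, 2))
    return sum(kernel(x, y) for x, y in pairs) / len(pairs)

def u_stat_ordered(xs, kernel):
    """Average the kernel over all ordered pairs i != j."""
    n = len(xs)
    total = sum(kernel(x, y) for x, y in permutations(xs, 2))
    return total / (n * (n - 1))

# Illustrative symmetric kernel: phi(x1, x2) = x1 * x2, unbiased for mu^2
data = [1.0, 2.0, 4.0, 7.0]
prod = lambda x, y: x * y
print(u_stat_pairs(data, prod), u_stat_ordered(data, prod))
```

Because the kernel is symmetric, each unordered pair is simply counted twice in the second form, which is why the two averages agree.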

The variance of U_n is given by,

    \[ Var ~U_n = \frac{1}{{n \choose a}} \sum_{c=1}^{a} {a \choose c}{n-a \choose a-c} \sigma_{c}^2 \]

where

    \[ \sigma_{c}^2 = \text{Var}_F \Big[ \mathbb{E}_{F}~\Big( \phi(X_1, …, X_a) | X_1, …, X_c \Big)\Big] = \text{Var}_{F}~ \phi_c(X_1, …, X_c) \]

or equivalently,

    \[ \sigma_{c}^2 = \text{Cov} \Big[ \phi(X_1, …, X_c, X_{c+1}, …, X_a), \phi(X_1, …, X_c, X'_{c+1}, …, X'_a)\Big] .\]

Note that when c = a, \phi_a(X_1, …, X_a) = \phi(X_1, …, X_a).

For a=2, these expressions reduce to

    \[\text{Var}_F~U_n = \frac{2}{n(n-1)} \left[2(n-2) \sigma_1^{2} + \sigma_2^{2} \right] \]

where

    \[ \sigma_{1}^{2} = \text{Var}_{F}~\phi_1(X_1) = \text{Var}_{F} \Big[ \mathbb{E}_{F} \Big( \phi(X_1, X_2) | X_1 \Big)\Big] \]

and

    \[ \sigma_{2}^{2} = \text{Var}_{F}~\phi(X_1, X_2). \]
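
To see that the a = 2 expression really is the general formula in disguise, the following Python sketch evaluates both with exact rational arithmetic. The function names and the values chosen for \sigma_1^2 and \sigma_2^2 are hypothetical and only serve the comparison.

```python
from fractions import Fraction
from math import comb

def var_u_general(n, a, sigma_sq):
    """General formula; sigma_sq[c - 1] holds sigma_c^2 for c = 1, ..., a."""
    total = sum(
        comb(a, c) * comb(n - a, a - c) * sigma_sq[c - 1]
        for c in range(1, a + 1)
    )
    return Fraction(total) / comb(n, a)

def var_u_degree2(n, s1, s2):
    """Reduced expression for a = 2."""
    return Fraction(2, n * (n - 1)) * (2 * (n - 2) * s1 + s2)

# Arbitrary illustrative variance components
s1, s2 = Fraction(1, 4), Fraction(3, 2)
for n in (2, 5, 50):
    print(n, var_u_general(n, 2, [s1, s2]), var_u_degree2(n, s1, s2))
```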

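Since n \text{Var}_F~U_n = \frac{2}{n-1} \left[2(n-2) \sigma_1^{2} + \sigma_2^{2} \right] \rightarrow 4 \sigma_1^{2} as n \rightarrow \infty, the limiting distribution of U_n for a=2 (assuming \sigma_1^{2} > 0) is then,

    \[\sqrt{n}(U_n - \theta) \rightarrow N\left(0, 4 \sigma_1^{2} \right).\]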

For derivation of the limiting distribution of a U-statistic for a single sample, check out Getting to know U: the asymptotic distribution of a single U-statistic.

Two independent samples

For two independent samples denoted X_1, …, X_m \stackrel{i.i.d}{\sim} F and Y_1, …, Y_n \stackrel{i.i.d}{\sim} G, the two-sample U-statistic is given by

    \[ U = \frac{1}{{m \choose a}{n \choose b}} \mathop{\sum \sum} \limits_{\substack{1 \leq i_1 < ... < i_{a} \leq m \\ 1 \leq j_1 < ... < j_b \leq n}} \phi(X_{i_1}, ..., X_{i_a}; Y_{j_1}, ..., Y_{j_b}). \]

where \phi is a kernel that is independently symmetric within the two blocks (X_{i_1}, ..., X_{i_a}) and (Y_{j_1}, ..., Y_{j_b}).

In the examples covered by this blog post, a = b = 1, reducing the U-statistic to,

    \[ U = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \phi(X_i; Y_j) .\]
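
A minimal Python sketch of this double sum is below. The function name `two_sample_u` and the toy data are assumptions for illustration; the two kernels passed in are stand-ins for a difference-type kernel and an indicator-type kernel.

```python
from itertools import product

def two_sample_u(xs, ys, kernel):
    """Average kernel(x, y) over all m * n cross-sample pairs."""
    return sum(kernel(x, y) for x, y in product(xs, ys)) / (len(xs) * len(ys))

xs = [1.0, 3.0, 5.0]
ys = [2.0, 4.0]

u_diff = two_sample_u(xs, ys, lambda x, y: x - y)          # difference kernel
u_ind = two_sample_u(xs, ys, lambda x, y: float(x < y))    # indicator kernel
print(u_diff, u_ind)
```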

The limiting variance of U is given by,

    \[ \text{Var}~U = \frac{a^2}{m} \sigma_{10}^2 + \frac{b^2}{n} \sigma_{01}^{2} \]

where

    \[ \sigma^{2}_{10} = \text{Cov} [\phi(X_1, X_2, …, X_a; Y_1, Y_2, …, Y_b), \phi(X_1, X'_2, …, X'_a; Y'_1, Y'_2…, Y'_b) ] \]

and

    \[ \sigma^{2}_{01} = \text{Cov} [\phi(X_1, X_2, …, X_a; Y_1, Y_2, …, Y_b),\phi(X'_1, X'_2, …, X'_a; Y_1, Y'_2…, Y'_b) ].\]

Equivalently,

    \[ \sigma^{2}_{10} = \text{Var} \Big[ \mathbb{E} \Big(\phi(X_1, …, X_a; Y_1, …., Y_b) | X_1 \Big) \Big] = \text{Var}~\phi_{10}(X_1)\]

and

    \[ \sigma^{2}_{01} = \text{Var} \Big[ \mathbb{E} \Big(\phi(X_1, …, X_a; Y_1, …, Y_b) | Y_1 \Big) \Big] = \text{Var}~\phi_{01}(Y_1).\]

For a=b=1, these expressions reduce to

    \[ \text{Var}~U = \frac{\sigma_{10}^2}{m} + \frac{ \sigma_{01}^{2}}{n} \]

where

    \[ \sigma^{2}_{10} = \text{Cov} [\phi(X_1; Y_1), \phi(X_1; Y'_1) ] =\text{Var} \Big[ \mathbb{E} \Big(\phi(X_1; Y_1) | X_1 \Big) \Big] \]

and

    \[ \sigma^{2}_{01} = \text{Cov} [\phi(X_1; Y_1), \phi(X'_1; Y_1) ] =\text{Var} \Big[ \mathbb{E} \Big(\phi(X_1; Y_1) | Y_1 \Big) \Big] .\]

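For a=b=1 and N=n+m, U is then approximately normally distributed for large m and n,

    \[\sqrt{N}(U - \theta) \stackrel{\cdot}{\sim} N\left(0, \frac{N}{m} \sigma_{10}^2 + \frac{N}{n} \sigma_{01}^{2} \right).\]

If m/N \rightarrow \lambda \in (0, 1) as N \rightarrow \infty, the variance above converges to \sigma_{10}^2 / \lambda + \sigma_{01}^2 / (1 - \lambda).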

For derivation of the limiting distribution of a U-statistic for multiple independent samples, check out Much Two U About Nothing: Extension of U-statistics to multiple independent samples.

Examples of one-sample U-statistics

Variance

Suppose we have an independent and identically distributed random sample of size n, X_1, …, X_n \stackrel{i.i.d}{\sim} F.
We wish to estimate the variance, which can be expressed as an expectation functional,

    \[\sigma^{2} = \mathbb{E}_{F} \Big[( X -\mathbb{E}_F~X)^2\Big].\]

In order to estimate \sigma^2 using a U-statistic, we need to identify a kernel function that is unbiased for \sigma^2 and symmetric in its arguments. We start by considering,

    \[\psi(x_1, x_2) = x_1^2 - x_1 x_2.\]

\psi is unbiased for \sigma^2 since

    \[\mathbb{E}_F \Big[X_1^2 - X_1X_2\Big] = \mathbb{E}_F~ X_1^2 - \mu^2 = \sigma^2\]

but is not symmetric since

    \[ \psi(x_1, x_2) = x_1^2 - x_1 x_2 \neq x_2^2 - x_1 x_2 .\]

Thus, the corresponding symmetric kernel can be constructed as

    \[ \phi(x_1, x_2) = \frac{1}{a!} \sum_{\pi \in \Pi} \psi(x_{\pi(1)}, x_{\pi(2)}).\]

Here, the number of arguments a = 2 and \Pi is the set of all permutations of the a=2 arguments,

    \[\Pi = \lbrace (x_1, x_2), (x_2, x_1) \rbrace. \]

Then, the symmetric kernel which is unbiased for the variance is,

    \[\phi(x_1, x_2) = \frac{1}{2} \Big[\psi(x_1, x_2) + \psi(x_2, x_1) \Big] = \frac{x_1^2 - 2 x_1 x_2 + x_2^2}{2} = \frac{(x_1 - x_2)^2}{2}.\]
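
The symmetrization step can be sketched generically in Python. The helper `symmetrize` is a hypothetical name of my own, not a library function; it averages \psi over all a! orderings of its arguments, exactly as in the sum over \Pi above.

```python
from itertools import permutations
from math import factorial

def symmetrize(psi, a):
    """Return the kernel that averages psi over all a! orderings of its arguments."""
    def phi(*args):
        return sum(psi(*p) for p in permutations(args)) / factorial(a)
    return phi

psi = lambda x1, x2: x1 ** 2 - x1 * x2   # unbiased for sigma^2 but not symmetric
phi = symmetrize(psi, 2)

print(psi(1.0, 3.0), psi(3.0, 1.0))   # differ, so psi is not symmetric
print(phi(1.0, 3.0), phi(3.0, 1.0))   # agree, and equal (x1 - x2)^2 / 2
```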

An unbiased estimator of \sigma^2 is then the U-statistic,

    \[U_n = \frac{1}{{n \choose 2}} \sum_{1 \leq i < j \leq n} \frac{(X_{i} - X_{j})^2}{2}\]

or equivalently,

    \[U_n = \frac{1}{n(n-1)} \sum_{i \neq j} \frac{(X_{i} - X_{j})^2}{2}. \]

Focusing on the second form of the sum and recognizing that

    \[\sum_{i \neq j} X_{i}^2 = \sum_{i \neq j} X_{j}^2\]

and,

    \[ \sum_{i \neq j} X_j = \sum_{i=1}^{n} \left( \sum_{j=1}^{n} X_j - X_i \right)\]

we have,

    \begin{align*} \sum_{i \neq j} (X_{i} - X_{j})^2 &= \sum_{i \neq j} X_{i}^2 - 2 X_{i} X_{j} + X_{j}^2 \\ &= 2 \Big( \sum_{i \neq j} X_{i}^2 - X_{i} X_{j} \Big) \\ &= 2 \left(\sum_{i=1}^{n} \left[ \sum_{j=1}^{n} X_j^{2} - X_i^2 \right] - \sum_{i=1}^{n} X_i \left[\sum_{j=1}^{n} X_j - X_i \right] \right) \\ &= 2 \left( n \sum_{j=1}^{n} X_j^{2} - \sum_{i=1}^{n} X_{i}^2 - n \bar{X} \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} X_{i}^2 \right) \\ &= 2 n \sum_{i=1}^{n} X_{i}^2 - 2 n \bar{X} \sum_{i=1}^{n} X_i \\ &= 2 n \sum_{i=1}^{n} X_{i}^2 - 2 n^2 \bar{X}^2. \end{align*}

Plugging this simplified expression back into our formula for U_n, we obtain

    \begin{align*} U_n &= \frac{1}{n(n-1)} \left[n \sum_{i=1}^{n} X_{i}^2 - n^2 \bar{X}^2 \right] \\ &= \frac{1}{n-1} \left[ \sum_{i=1}^{n} X_{i}^2 - \frac{1}{n} \left( \sum_{i=1}^{n} X_i \right)^2 \right] \\ &= s_n^{2} \end{align*}

as desired.
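
A quick numerical sanity check of this identity, sketched in Python with made-up data: the pairwise U-statistic should match the usual sample variance (here `statistics.variance`, which divides by n - 1). The function name `u_variance` is illustrative.

```python
from itertools import combinations
from statistics import variance

def u_variance(xs):
    """U-statistic with kernel (x_i - x_j)^2 / 2, averaged over pairs i < j."""
    pairs = list(combinations(xs, 2))
    return sum((x - y) ** 2 / 2 for x, y in pairs) / len(pairs)

data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
print(u_variance(data), variance(data))
```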

It is well-known that the sample variance s_{n}^2 is an unbiased estimator of the population variance \sigma^2 such that,

    \[\mathbb{E}_{F} ~U_n = \mathbb{E}_{F} ~s_n^{2} = \sigma^{2}\]

but what about the variance of U_n? For a sample size of n and a = 2,

    \[ \text{Var}_{F}~U_n = \frac{1}{{n \choose a}} \sum_{c=1}^{a} {a \choose c} {n-a \choose a-c} \sigma^{2}_{c} = \frac{2}{n(n-1)} \left[2(n-2) \sigma_1^{2} + \sigma_2^{2} \right]. \]

To derive the first variance component \sigma^{2}_1, we start by taking the expectation of our kernel conditional on X_1,

    \begin{align*} \phi_1(x_1) &= \mathbb{E}_F \left[ \frac{(X_1 - X_2)^2}{2} \middle| X_1 = x_1 \right] \\ &= \mathbb{E}_F \left[ \frac{(X_2 - x_1)^2}{2} \right] \\ &= \mathbb{E}_F \left[ \frac{(X_2 - \mu + \mu - x_1)^2}{2} \right] \\ &= \mathbb{E}_F \left[ \frac{(X_2 - \mu)^2 + 2 (X_2 - \mu)(\mu - x_1) + (\mu - x_1)^2}{2}\right] \\ &= \frac{\sigma^2}{2} + \frac{(x_1 - \mu)^2}{2}. \end{align*}

Now, our first variance component \sigma^{2}_1 is just equal to the variance of \phi_1(X_1) and since \frac{\sigma^2}{2} is just a constant, we have

    \begin{align*} \sigma_{1}^{2} &= \text{Var}_F~\phi_1(X_1) \\ &= \frac{1}{4}\text{Var}_F \left[ (X_1 - \mu)^2\right] \\ &= \frac{1}{4} \left( \mathbb{E}_F \left[ (X_1 - \mu)^4 \right] - \mathbb{E}_{F} \left[ (X_1 - \mu)^2 \right]^2 \right) \\ &= \frac{\mu_4 - \sigma^4}{4} \end{align*}

where \mu_4 is the fourth central moment.

Next, recognizing that \phi_{a}(X_1, …, X_a) = \phi(x_1, …, x_a) and recycling our “add zero” trick yields an expression for our second variance component \sigma_{2}^2,

    \begin{align*} \sigma_{2}^{2} &= \text{Var}_{F}~ \phi(X_1, X_2) \\ &= \text{Var}_{F}~ \left[ \frac{(X_1 - X_2)^2}{2} \right] \\ &= \mathbb{E}_F \left[ \frac{(X_1 - \mu + \mu - X_2)^4}{4} \right] - \mathbb{E}_F \left[ \frac{(X_1 - X_2)^2}{2} \right]^2. \end{align*}

We know by definition that the kernel is an unbiased estimator of \sigma^{2} so that,

    \[ \sigma_{2}^{2} = \mathbb{E}_F \left[ \frac{(X_1 - \mu + \mu - X_2)^4}{4} \right] - \sigma^{4}. \]

To simplify the remaining expectation, recall that,

    \[ (a+b)^4 = a^4 + 4 a^3 b + 6 a^2 b^2 + 4 a b^3 + b^4\]

and let a = (X_1 - \mu) and b = (\mu - X_2). Then,

    \begin{align*} \mathbb{E}_{F} \left[ (a+b)^4 \right] &= \mathbb{E}_{F} \left[ (X_1 - \mu)^4 \right] + 6 \mathbb{E}_{F} \left[ (X_1 - \mu)^2(X_2 - \mu)^2 \right] + \mathbb{E}_{F} \left[ (X_2 - \mu)^4 \right] \\ &= 2 \mu_4 + 6 \sigma^{4}. \end{align*}

Substituting this back into our expression for \sigma_2^{2}, we have

    \[\sigma_{2}^{2} = \frac{2 \mu_4 + 6 \sigma^{4} - 4 \sigma^{4}}{4} = \frac{\mu_4 + \sigma^{4}}{2}.\]
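
Both variance components can be checked exactly for a concrete distribution. The Python sketch below assumes, purely for illustration, that F is the uniform distribution on {0, 1, 3}; the helper `expect` is a hypothetical name that computes exact expectations by enumeration with rational arithmetic.

```python
from fractions import Fraction
from itertools import product

support = [Fraction(0), Fraction(1), Fraction(3)]   # assumed F: uniform on {0, 1, 3}
p = Fraction(1, len(support))

def expect(f, k):
    """Exact expectation of f over k i.i.d. draws from the uniform support."""
    return sum(f(*xs) for xs in product(support, repeat=k)) * p ** k

phi = lambda x1, x2: (x1 - x2) ** 2 / 2

mu = expect(lambda x: x, 1)
sigma2 = expect(lambda x: (x - mu) ** 2, 1)
mu4 = expect(lambda x: (x - mu) ** 4, 1)

# sigma_1^2 = Var[phi_1(X_1)], where phi_1(x1) = E[phi(x1, X_2)]
phi1 = lambda x1: expect(lambda x2: phi(x1, x2), 1)
sigma1_sq = expect(lambda x: phi1(x) ** 2, 1) - expect(phi1, 1) ** 2

# sigma_2^2 = Var[phi(X_1, X_2)]
sigma2_sq = expect(lambda x1, x2: phi(x1, x2) ** 2, 2) - expect(phi, 2) ** 2

print(sigma1_sq, (mu4 - sigma2 ** 2) / 4)
print(sigma2_sq, (mu4 + sigma2 ** 2) / 2)
```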

Finally, plugging our two variance components into our expression for \text{Var}_{F}~U_n,

    \begin{align*} \text{Var}_{F}~U_n = \frac{2}{n(n-1)} \left[2(n-2) \left(\frac{\mu_4 - \sigma^{4}}{4}\right) + \frac{\mu_4 + \sigma^{4}}{2} \right] = \frac{\mu_4}{n} - \frac{\sigma^4(n-3)}{n(n-1)}. \end{align*}
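
This closed form can be verified by brute force for a small case. The Python sketch below again assumes a toy F (uniform on {0, 1, 3}) and n = 4, enumerates all 3^4 equally likely samples, and compares the exact variance of U_n with the formula. All names are illustrative.

```python
from fractions import Fraction
from itertools import combinations, product

support = [Fraction(0), Fraction(1), Fraction(3)]   # assumed F: uniform on {0, 1, 3}
n = 4

def u_variance(xs):
    """U-statistic with kernel (x_i - x_j)^2 / 2 over pairs i < j."""
    pairs = list(combinations(xs, 2))
    return sum((x - y) ** 2 / 2 for x, y in pairs) / len(pairs)

# Exact mean and variance of U_n over all equally likely samples of size n
values = [u_variance(s) for s in product(support, repeat=n)]
mean_u = sum(values) / len(values)
var_u = sum(v ** 2 for v in values) / len(values) - mean_u ** 2

# Moments of F
mu = sum(support) / len(support)
sigma2 = sum((x - mu) ** 2 for x in support) / len(support)
mu4 = sum((x - mu) ** 4 for x in support) / len(support)

formula = mu4 / n - sigma2 ** 2 * (n - 3) / (n * (n - 1))
print(var_u, formula)
```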

Since n \text{Var}_{F}~U_n = \mu_4 - \frac{\sigma^4(n-3)}{(n-1)} \rightarrow \mu_4 - \sigma^4 as n \rightarrow \infty, our asymptotic result for U_n tells us,

    \[\sqrt{n}(U_n - \sigma^{2}) \rightarrow N\left(0, \mu_4 - \sigma^4\right).\]

Kendall’s Tau

Consider n bivariate, continuous observations of the form

    \[ (X_i, Y_i) \>\>,\>\>i = 1, …, n.\]

A pair of observations, \{(X_i, Y_i), (X_j, Y_j)\} is considered “concordant” if

    \begin{align*} (X_i > X_j \> &\cap \> Y_i > Y_j) \> \cup \> (X_i < X_j \> \cap \> Y_i < Y_j) \end{align*}

and “discordant” otherwise.

The probability that two observations are concordant is then,

    \[ c = P(X_i < X_j, Y_i < Y_j) + P(X_i > X_j, Y_i > Y_j) \]

and the probability that two observations are discordant is then,

    \[ d = 1 - c.\]

Kendall’s Tau, denoted \tau, is the proportion of concordant pairs minus the proportion of discordant pairs, or the difference between c and d such that,

    \begin{align*} \tau &= c - (1-c) \\ &= 2c - 1 \\ &= 2 \left[ P(X_i < X_j, Y_i < Y_j) + P(X_i > X_j, Y_i > Y_j) \right] - 1. \end{align*}

\tau ranges between -1 and 1 and measures the strength of a monotone increasing or decreasing relationship. \tau = 1 corresponds to a perfect monotone increasing relationship between X and Y, \tau = -1 to a perfect monotone decreasing relationship, and \tau = 0 to no monotone association. If X and Y are independent then \tau = 0, although the converse need not hold.

Based on our definition of \tau, the form of the symmetric kernel is immediately obvious,

    \[\phi((x_i, y_i), (x_j, y_j)) = 2 \left[ \mathbb{I}(x_i < x_j, y_i < y_j) + \mathbb{I}(x_i > x_j, y_i > y_j) \right] - 1.\]

where \mathbb{I}(\cdot) is an indicator function taking the value 1 when its argument is true and 0 otherwise.

Note that

    \[ \mathbb{I}(x_i < x_j, y_i < y_j) \equiv \mathbb{I}(x_i < x_j) \mathbb{I}(y_i < y_j) \]

and

    \[ \mathbb{I}(x_i > x_j, y_i > y_j) \equiv [1-\mathbb{I}(x_i < x_j)][1- \mathbb{I}(y_i < y_j)] \]

so that our kernel may be re-expressed as,

    \begin{align*} \phi((x_i, y_i), (x_j, y_j)) &= 2 \mathbb{I}(x_i < x_j) \mathbb{I}(y_i < y_j) + 2[1-\mathbb{I}(x_i < x_j)][1- \mathbb{I}(y_i < y_j)] - 1 \\ &= 4 \mathbb{I}(x_i < x_j) \mathbb{I}(y_i < y_j) - 2 \mathbb{I}(x_i < x_j) - 2\mathbb{I}(y_i < y_j) + 1 \\ &= [2\mathbb{I}(x_i < x_j) - 1][2\mathbb{I}(y_i < y_j)-1] \\ &= [1-2\mathbb{I}(x_j < x_i)][1-2\mathbb{I}(y_j < y_i)]. \end{align*}

This will come in handy later.
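
As a small check of the algebra, this Python sketch compares the original concordance form of the kernel with the final product form over every tie-free ordering of two points. The function names are illustrative.

```python
def ind(cond):
    """Indicator function: 1 if cond is true, 0 otherwise."""
    return 1 if cond else 0

def kernel_concordance(xi, yi, xj, yj):
    """2 [I(xi < xj, yi < yj) + I(xi > xj, yi > yj)] - 1."""
    return 2 * (ind(xi < xj and yi < yj) + ind(xi > xj and yi > yj)) - 1

def kernel_product(xi, yi, xj, yj):
    """[1 - 2 I(xj < xi)] [1 - 2 I(yj < yi)]."""
    return (1 - 2 * ind(xj < xi)) * (1 - 2 * ind(yj < yi))

# All four tie-free orderings of the x's and the y's
cases = [
    (xi, yi, xj, yj)
    for xi, xj in [(0, 1), (1, 0)]
    for yi, yj in [(0, 1), (1, 0)]
]
for c in cases:
    print(c, kernel_concordance(*c), kernel_product(*c))
```

Both forms return +1 for a concordant pair and -1 for a discordant pair.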

Now that we have identified our kernel function, we can construct our U-statistic,

    \[U_n = \frac{1}{{n \choose 2}} \sum_{i < j} \Big\{ 2 \left[ \mathbb{I}(X_i < X_j, Y_i < Y_j) + \mathbb{I}(X_i > X_j, Y_i > Y_j) \right] - 1 \Big\}.\]

It is obvious that \mathbb{E}~U_n = \tau. Once again, a=2 and the variance of U_n is given by,

    \[ \text{Var}_{F}~U_n = \frac{1}{{n \choose a}} \sum_{c=1}^{a} {a \choose c} {n-a \choose a-c} \sigma^{2}_{c} = \frac{2}{n(n-1)} \left[2(n-2) \sigma_1^{2} + \sigma_2^{2} \right]. \]

For the purposes of demonstration and to simplify derivation of the variance components, suppose X and Y are independent. Independence implies \tau = 0, so we are working under a special case of the null hypothesis

    \[ H_0:~ \tau = 0 .\]

To find our first variance component \sigma^{2}_1, we must find the expectation of our kernel conditional on (X_1, Y_1),

    \begin{align*} \phi_1((X_1, Y_1)) &= \mathbb{E} \left[ \phi((X_1, Y_1), (X_2, Y_2)) \middle| (X_1, Y_1) \right] \\ &= \mathbb{E} \Big[ [1-2\mathbb{I}(X_2 < x_1)][1-2\mathbb{I}(Y_2 < y_1)] \Big]. \end{align*}

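Because X_2 and Y_2 are independent under the null hypothesis, the expectation factorizes. If X \sim F and Y \sim G, then \mathbb{E}[\mathbb{I}(X < x)] = F(x) and \mathbb{E}[\mathbb{I}(Y < y)] = G(y), so that

    \[\phi_1((x_1, y_1)) = (1 - 2 F(x_1))(1 - 2 G(y_1)).\]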

Then, the first variance component is given by,

    \[ \sigma_1^{2} = \text{Var}[\phi_1((X_1, Y_1))] = \text{Var} \Big[(1 - 2 F(X_1))(1 - 2 G(Y_1)) \Big]. \]

F(X_1) and G(Y_1) are independent random variables distributed according to \text{Unif}(0,1).

If W \sim \text{Unif}(0,1) then 1 - 2W \sim \text{Unif}(-1,1). Thus, if we let U = 1 - 2 F(X_1) and V = 1 - 2 G(Y_1), U and V are both distributed according to \text{Unif}(-1, 1). (Here U denotes a uniform random variable, not the U-statistic U_n.)

Since U and V are independent, applying the identity \text{Var}~Z = E[Z^2] - E[Z]^2 yields,

    \[ \sigma_1^{2} = \mathbb{E}[U^2]~ \mathbb{E}[V^2] - \mathbb{E}[U]^2~ \mathbb{E}[V]^2 \]

Recall that if Z \sim \text{Unif}(a, b),

    \[ \mathbb{E}~Z = \frac{a+b}{2} \> \>,\>\> \text{Var}~Z = \frac{(b-a)^2}{12}. \]

For a = -1 and b=1, we have

    \[ \mathbb{E}[U] = 0 \]

and

    \[ \mathbb{E}[U^2] = \text{Var}[U] = \frac{2^2}{12} = \frac{1}{3}.\]

The same is true for V.

Plugging our results back into our equation for \sigma_1^{2} yields,

    \[ \sigma_1^{2} = \left(\frac{1}{3}\right)^2 = \frac{1}{9} .\]

Next, \phi_2((X_1, Y_1), (X_2, Y_2)) = \phi((x_1, y_1), (x_2, y_2)) and,

    \[ \sigma_2^{2} = \text{Var} [\phi_2] = \mathbb{E}[\phi_2^{2}] - \mathbb{E}[\phi_2]^2 .\]

By definition, \mathbb{E}[\phi_2] = \mathbb{E}[\phi((X_1, Y_1), (X_2, Y_2))] = \tau so that,

    \[ \sigma_2^{2} = \text{Var} [\phi_2] = \mathbb{E}\Big[\Big(1-2\mathbb{I}(X_2 < X_1)\Big)^2\Big] \mathbb{E}\Big[\Big(1-2\mathbb{I}(Y_2 < Y_1)\Big)^2\Big] - \tau^2 .\]

Note that since X_1 and X_2 are independent, identically distributed, and continuous, either X_1 < X_2 or X_1 > X_2, each with probability \frac{1}{2}, so that

    \[\mathbb{I}(X_2 < X_1) \sim \text{Bernoulli}\left(p = \frac{1}{2}\right).\]

Then we can use the properties of the Bernoulli distribution to derive the properties of \mathbb{I}(X_2 < X_1) we need. That is,

    \[\mathbb{E}[\mathbb{I}(X_2 < X_1)] = \frac{1}{2},\]

    \[\text{Var}[\mathbb{I}(X_2 < X_1)] = \frac{1}{2}\left(1 - \frac{1}{2}\right) = \frac{1}{4},\]

and

    \[ \mathbb{E}[\mathbb{I}(X_2 < X_1)^2] = \frac{1}{4} + \left(\frac{1}{2}\right)^2 = \frac{1}{2}.\]

Finally, we have

    \[ \mathbb{E}\left[\Big(1-2\mathbb{I}(X_2 < X_1)\Big)^2\right] = 1 - 4 \left(\frac{1}{2}\right) + 4 \left(\frac{1}{2}\right) = 1.\]

The same arguments hold for \mathbb{I}(Y_2 < Y_1) and we obtain,

    \[ \sigma_2^{2} = 1 - \tau^2 .\]

However, since \tau = 0 under the null hypothesis, \sigma^{2}_2 = 1.

Now that we have determined the value of \sigma_1^{2} and \sigma_2^{2} under the null hypothesis that X and Y are independent, we can plug these components into our formula for \text{Var}~U_n, giving us

    \[ \text{Var}~U_n = \frac{2}{n(n-1)} \left[\frac{2(n-2)}{9} + 1 \right] = \frac{2}{n(n-1)}\left[\frac{2n + 5}{9}\right]. \]
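
This null variance can be checked by exhaustive enumeration for small n. The Python sketch below relies on the assumption that, for independent continuous X and Y, every ordering of the Y ranks relative to the sorted X values is equally likely, so it averages over all n! permutations. Function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations, permutations

def kendall_u(xs, ys):
    """U-statistic for tau: average of the sign-product kernel over pairs i < j."""
    pairs = list(combinations(range(len(xs)), 2))
    total = 0
    for i, j in pairs:
        sx = 1 if xs[i] < xs[j] else -1   # 2 I(x_i < x_j) - 1, no ties
        sy = 1 if ys[i] < ys[j] else -1   # 2 I(y_i < y_j) - 1, no ties
        total += sx * sy
    return Fraction(total, len(pairs))

def null_variance_exact(n):
    """Variance of U_n over all n! equally likely orderings of the Y ranks."""
    xs = list(range(n))
    values = [kendall_u(xs, perm) for perm in permutations(range(n))]
    mean = sum(values) / len(values)
    return sum(v ** 2 for v in values) / len(values) - mean ** 2

def null_variance_formula(n):
    """2 (2n + 5) / (9 n (n - 1))."""
    return Fraction(2 * (2 * n + 5), 9 * n * (n - 1))

for n in (3, 4, 5):
    print(n, null_variance_exact(n), null_variance_formula(n))
```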

Since n \text{Var}~U_n = \frac{2(2n+5)}{9(n-1)} \rightarrow \frac{4}{9} as n \rightarrow \infty, our asymptotic result for U_n tells us that under the null hypothesis,

    \[ \sqrt{n}(U_n - \tau) \rightarrow N\left(0, \frac{4}{9}\right) .\]

Examples of two-sample U-statistics

Mean comparison

Suppose we have two independent random samples of size m and size n,

    \[X_1, …, X_m \stackrel{i.i.d}{\sim} F\]

and

    \[Y_1, …, Y_n \stackrel{i.i.d}{\sim} G.\]

We wish to compare the means of the two groups. The obvious choice for our kernel is,

    \[ \phi(x_i; y_j) = x_i - y_j \]

so that a = b = 1 and our corresponding U-statistic is,

    \[ U= \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \phi(X_i; Y_j) = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} (X_i - Y_j) = \bar{X}_m - \bar{Y}_n. \]

Based on our previous derivation of the distribution of two-sample U-statistics, we have

    \[ \text{Var}~U = \frac{\sigma_{10}^2}{m} + \frac{\sigma_{01}^{2}}{n} .\]

For the first variance component, we need to take the expectation of \phi conditional on a single X_i such that,

    \[ \phi_{10}(X_1) = \mathbb{E} \left[X_1 - Y_1 | X_1 \right] = x_1 - \mathbb{E}[Y_1] = x_1 - \mu_{Y}.\]

Similarly, for the second variance component, we need to condition on a single Y_i such that,

    \[ \phi_{01}(Y_1) = \mathbb{E} \left[X_1 - Y_1 | Y_1 \right] = \mathbb{E}[X_1] - y_1 = \mu_{X} - y_1.\]

Since \mu_X and \mu_Y are just constants, it is easy to see that,

    \[\sigma_{10}^2 = \text{Var}~X_1 = \sigma^{2}_X\]

and,

    \[\sigma_{01}^2 = \text{Var}~Y_1 = \sigma^{2}_Y.\]

Finally, plugging these variance components into our formula for \text{Var}~U, we obtain the variance we would expect for a comparison of two means,

    \[ \text{Var}~U = \frac{\sigma^{2}_X}{m} + \frac{\sigma^{2}_Y}{n} .\]
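
A short Python sketch with made-up data confirms that the double sum collapses to the difference in sample means. The plug-in variance estimate at the end, which substitutes sample variances for \sigma^{2}_X and \sigma^{2}_Y, is my own illustrative addition and not something derived above.

```python
from itertools import product
from statistics import mean, variance

xs = [5.1, 4.8, 6.0, 5.5]
ys = [4.2, 4.9, 5.0]

# U-statistic with kernel phi(x; y) = x - y
u = sum(x - y for x, y in product(xs, ys)) / (len(xs) * len(ys))

# Plug-in estimate of Var U = sigma_X^2 / m + sigma_Y^2 / n (assumption: sample variances)
var_hat = variance(xs) / len(xs) + variance(ys) / len(ys)

print(u, mean(xs) - mean(ys), var_hat)
```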

Wilcoxon Mann-Whitney rank-sum test

Suppose we have two independent random samples of size m and size n,

    \[X_1, …, X_m \stackrel{i.i.d}{\sim} F\]

and

    \[Y_1, …, Y_n \stackrel{i.i.d}{\sim} G.\]

We assume that X and Y are continuous so that no tied values are possible. Let R_1, …, R_m represent the full-sample ranks of the X_i and S_1, …, S_n represent the full-sample ranks of the Y_j.

Then, the Wilcoxon Mann-Whitney (WMW) rank-sum statistic is,

    \[W_{XY} = \sum_{j=1}^{n} S_j - \frac{1}{2} n(n+1)\]

which can be shown to be equivalent to the number of pairs (X_i, Y_j) for which X_i < Y_j. That is, we can re-express the WMW statistic as,

    \[ W_{XY} = \sum_{i=1}^{m} \sum_{j=1}^{n} \mathbb{I}(X_i < Y_j) .\]
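
The equivalence of the rank form and the pair-counting form is easy to check numerically. This Python sketch uses made-up, tie-free data; the function names are illustrative.

```python
def ranks_of_second_sample(xs, ys):
    """Ranks of the y's within the pooled sample (assumes no ties)."""
    pooled = sorted(xs + ys)
    return [pooled.index(y) + 1 for y in ys]

def w_rank_form(xs, ys):
    """Sum of the Y ranks minus n (n + 1) / 2."""
    n = len(ys)
    return sum(ranks_of_second_sample(xs, ys)) - n * (n + 1) // 2

def w_count_form(xs, ys):
    """Number of pairs (x_i, y_j) with x_i < y_j."""
    return sum(1 for x in xs for y in ys if x < y)

xs = [1.2, 3.4, 0.7, 5.9]
ys = [2.5, 4.1, 6.3]
print(w_rank_form(xs, ys), w_count_form(xs, ys))
```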

If we divide W_{XY} by the total number of (X_i, Y_j) pairs, we obtain

    \[\frac{1}{mn} W_{XY} = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \mathbb{I}(X_i < Y_j) \]

which is exactly the form of a two-sample U-statistic with a = b = 1 and \phi(X_i; Y_j) = \mathbb{I}(X_i < Y_j),

    \[ U = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \phi(X_{i}; Y_{j}) \]

so that \mathbb{E}~U = P(X < Y) = \theta. \theta is commonly referred to as the probabilistic index.

For more information on the probabilistic index for two continuous outcomes, check out The probabilistic index for two normally distributed outcomes.

Our previous work tells us that

    \[ \text{Var}~U = \frac{\sigma^{2}_{10}}{m} + \frac{\sigma^{2}_{01}}{n}.\]

The first variance component \sigma_{10}^2 can be expressed as,

    \[\sigma_{10}^2 = \text{Cov}[\phi(X; Y), \phi(X; Y')] = \text{Cov}[\mathbb{I}(X < Y), \mathbb{I}(X < Y')].\]

Recall that covariance can be expressed in terms of expectation as,

    \[ \text{Cov}[X, Y] = \mathbb{E}[XY] - \mathbb{E}[X] \mathbb{E}[Y] \]

so that,

    \[ \sigma_{10}^2 = \mathbb{E} \left[ \mathbb{I}(X < Y) \mathbb{I}(X < Y') \right] - \mathbb{E}[\mathbb{I}(X < Y)] \mathbb{E}[ \mathbb{I}(X < Y')].\]

By definition,

    \[ \mathbb{E}[\mathbb{I}(X < Y)] = \mathbb{E}[ \mathbb{I}(X < Y')] = \theta .\]

Now, notice that

    \[ \mathbb{I}(X < Y) \mathbb{I}(X < Y') = \begin{cases} 1 & (X < Y) \cap (X < Y') \\ 0 & \text{o.w.} \end{cases} \]

so that,

    \[ \mathbb{E} \left[ \mathbb{I}(X < Y) \mathbb{I}(X < Y') \right] = P(X < Y \cap X < Y').\]

Following similar logic for \sigma_{01}^2, it should be clear that we have

    \[ \sigma_{10}^2 = P(X < Y \cap X < Y') - P(X < Y)^2\]

and

    \[ \sigma_{01}^2 = P(X < Y \cap X' < Y) - P(X < Y)^2.\]

Under the null hypothesis H_0: F = G, X and Y have the same (continuous) distribution so that either X < Y or X > Y, with both orderings equally likely, implying P(X < Y) = \frac{1}{2} under H_0.

Similarly, there are 6 equally likely orderings of X, X', and Y under H_0: (1) X < X' < Y, (2) X' < X < Y, (3) Y < X < X', (4) X < Y < X', (5) X' < Y < X, and (6) Y < X' < X. Then,

    \[P(X < Y \cap X' < Y) = P(X < X' < Y) + P(X' < X < Y) = \frac{2}{6} = \frac{1}{3} .\]

Noting that P(X < Y \cap X' < Y) = P(X < Y \cap X < Y'), plugging these values into our expressions for \sigma_{10}^2 and \sigma_{01}^2 gives us,

    \[\sigma_{10}^2 = \sigma_{01}^2 = \frac{1}{3} - \left(\frac{1}{2}\right)^2 = \frac{1}{3} - \frac{1}{4} = \frac{1}{12}.\]
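
The counting argument can be reproduced in a few lines of Python. The sketch assumes, as in the text, that under H_0 all 3! rank orderings of three independent draws are equally likely; variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

# Each tuple gives the ranks of (X, Y, Y') in one of the 3! equally likely orderings
orderings = list(permutations(range(3)))

p_joint = Fraction(
    sum(1 for x, y, y2 in orderings if x < y and x < y2), len(orderings)
)
p_single = Fraction(sum(1 for x, y, _ in orderings if x < y), len(orderings))

sigma10_sq = p_joint - p_single ** 2
print(p_joint, p_single, sigma10_sq)
```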

Finally,

    \[\text{Var}~U = \frac{1}{m} \left( \frac{1}{12} \right) + \frac{1}{n} \left( \frac{1}{12} \right) = \frac{n+m}{12mn} = \frac{N}{12mn}.\]

Consequently, since W_{XY} = mn U, we have

    \[\text{Var}~ W_{XY} = \frac{1}{12} nmN.\]

In summary, our multiple-sample U-statistic theory tells us that under the null hypothesis H_0: F = G,

    \[ \mathbb{E}~W_{XY} = mn~\mathbb{E}~U = \frac{mn}{2} \]

and

    \[ \text{Var}~ W_{XY} = \frac{1}{12} nmN.\]
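
For small samples, the null distribution of W_{XY} can be enumerated exactly, which gives a feel for how good the large-sample variance is. The Python sketch below assumes that under H_0 every choice of n ranks for the Y sample is equally likely. It is worth noting that the exact finite-sample null variance is mn(N+1)/12, which agrees with the asymptotic expression mnN/12 to leading order, and the exact null mean is mn/2 (that is, \mathbb{E}~U = 1/2). Names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

def w_null_moments(m, n):
    """Exact mean and variance of W_XY when all placements of the Y ranks are equally likely."""
    N = m + n
    values = [
        sum(y_ranks) - n * (n + 1) // 2
        for y_ranks in combinations(range(1, N + 1), n)
    ]
    mean = Fraction(sum(values), len(values))
    var = Fraction(sum(v * v for v in values), len(values)) - mean ** 2
    return mean, var

m, n = 4, 3
mean_w, var_w = w_null_moments(m, n)
asymptotic_var = Fraction(m * n * (m + n), 12)   # mnN / 12 from the U-statistic theory
print(mean_w, var_w, asymptotic_var)
```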



Published by

Emma Smith

Emma Smith is a young statistician who's on a mission to convince the masses statistics is as awesome as she *knows* it is! When she's not working on expanding her knowledge of machine learning and mathematical statistics, she's busy petting cats and unsuccessfully convincing her boyfriend to let her adopt them, hiking, concocting indie and folk rock playlists, and kicking butt in roller derby.
