Getting to know U: the asymptotic distribution of a single U-statistic

After my last grand slam title, U-, V-, and Dupree statistics I was really feeling the pressure to keep my title game strong. Thank you to my wonderful friend Steve Lee for suggesting this beautiful title.

Overview

A statistical functional is any real-valued function of a distribution function $F$ such that

$\theta = T(F)$

and represents characteristics of the distribution $F$ and include the mean, variance, and quantiles.

Often times $F$ is unknown but is assumed to belong to a broad class of distribution functions $\mathcal{F}$ subject only to mild restrictions such as continuity or existence of specific moments.

A random sample $X_1, …, X_n \stackrel{i.i.d}{\sim} F$ can be used to construct the empirical cumulative distribution function (ECDF) $\hat{F}_n$ ,

$\hat{F}_{n}(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{I}(X_i \leq x)$

which assigns mass $\frac{1}{n}$ to each $X_i$ .

$\hat{F}_{n}$ is a valid, discrete CDF which can be substituted for $F$ to obtain $\hat{\theta} = T(\hat{F}_n)$ . These estimators are referred to as plug-in estimators for obvious reasons.

For more details on statistical functionals and plug-in estimators, you can check out my blog post Plug-in estimators of statistical functionals!

Many statistical functionals take the form of an expectation of a real-valued function $\phi$ with respect to $F$ such that for $a \leq n$ ,

$\theta = T(F) = \mathbb{E}_{F}~ \phi(X_1, …, X_a) .$

When $\phi(x_1, …, x_a)$ is a function symmetric in its arguments such that, for e.g. $\phi(x_1, x_2) = \phi(x_2, x_1)$ , it is referred to as a symmetric kernel of degree $a$ . If $\phi$ is not symmetric, a symmetric equivalent $\phi^{*}$ can always be found,

$\phi^{*}(x_1, …, x_a) = \frac{1}{a!} \sum_{\pi ~\in~ \Pi} \phi(x_{\pi(1)}, …, x_{\pi(a)})$

where $\Pi$ represents the set of all permutations of the indices $1, …, a$ .

A statistical functional $\theta = T(F)$ belongs to a special family of expectation functionals when:

$T(F) = \mathbb{E}_F ~\phi(X_1, …, X_a)$ , and
$\phi(X_1, …, X_a)$ is a symmetric kernel of degree $a$ .

Plug-in estimators of expectation functionals are referred to as V-statistics and can be expressed explicitly as,

$V_n = \frac{1}{n^a} \sum_{i_1 = 1}^{n} … \sum_{i_a = 1}^{n} \phi(X_{i_1}, …, X_{i_a})$

so that $V_n$ is the average of $\phi$ evaluated at all possible permutations of size $a$ from $X_1, …, X_n$ . Since the $X_i$ can appear more than once within each summand, $V_n$ is generally biased.

By restricting the summands to distinct indices only an unbiased estimator known as a U-statistic arises. In fact, when the family of distributions $\mathcal{F}$ is large enough, it can be shown that a U-statistic can always be constructed for expectation functionals.

Since $\phi$ is symmetric, we can require that $1 \leq i_1 < ... < i_a \leq n$ , resulting in ${n \choose a}$ combinations of the subscripts $1, ..., a$ . The U-statistic is then the average of $\phi$ evaluated at all ${n \choose a}$ distinct combinations of $X_1, ..., X_n$ ,

$U_n = \frac{1}{{n \choose a}} \mathop{\sum … \sum} \limits_{1 \leq i_1 < ... < i_a \leq n} \phi(X_{i_1}, ..., X_{i_a}).$

While $i_j \neq i_k$ within each summand now, each $X_i$ still appears in multiple summands, suggesting that $U_n$ is the sum of correlated terms. As a result, the central limit theorem cannot be relied upon to determine the limiting distribution of $U_n$ .

For more details on expectation functionals and their estimators, you can check out my blog post U-, V-, and Dupree statistics!

This blog post provides a walk-through derivation of the limiting, or asymptotic, distribution of a single U-statistic $U_n$ .

Variance derivation

We must start by assuming the form of the family of distributions $\mathcal{F}$ to which $F$ can belong. It suffices that $\mathcal{F}$ is the family of all distributions for which,

$\text{Var}_{F}~ \phi(X_1, …, X_a) < \infty$

such that $\text{Var}_{F} ~ U_n$ exists. Then, since $\text{Var}_F~ [aX] = a^2 ~\text{Var}_F ~[X]$ , we have

$\text{Var}_{F}~ U_n = \frac{1}{{n \choose a}^2} \text{Var}_{F} \mathop{\sum … \sum} \limits_{1 \leq i_1 < ... < i_a \leq n} \phi(X_{i_1}, ..., X_{i_a}) .$

Equivalently, let $\mathcal{C}$ represent the set of all combinations of the subscripts $(1, ..., a)$ , then

$\text{Var}_{F}~ U_n = \frac{1}{{n \choose a}^2} \text{Var}_{F} \sum_{\mathbf{i} ~\in~ \mathcal{C}} \phi(X_{i_1}, ..., X_{i_a}) .$

Recalling that

$\text{Var}_{F} \sum_{i=1}^{n} a_i Z_i = \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j ~\text{Cov}[Z_i, Z_j]$

we can re-express $\text{Var}_{F} ~ U_n$ as,

$\text{Var}_{F}~ U_n = \frac{1}{{n \choose a}^2} \sum_{\mathbf{i} ~\in~ \mathcal{C}} \sum_{\mathbf{j} ~\in~ \mathcal{C}} \text{Cov}\left[\phi(X_{i_1}, …, X_{i_a}),~ \phi(X_{j_1}, …, X_{j_a})\right].$

Let’s focus attention on a single summand, $\text{Cov}\left[\phi(X_{i_1}, …, X_{i_a}),~ \phi(X_{j_1}, …, X_{j_a}) \right].$

Let $c$ represent the common elements between $(i_1, …, i_a)$ and $(j_1, …, j_a)$ . Then, $0 \leq c \leq a$ as $(i_1, …, i_a)$ and $(j_1, …, j_a)$ can be identical, such that they share all $a$ elements, or completely distinct, such that they share no elements, or anything in between.

Recall that

$\text{Cov}[X, Y] = \mathbb{E}_{F} \left[(X - \mathbb{E}_F~X) (Y - \mathbb{E}_F~Y)\right]$

and by definition,

$\mathbb{E}_{F}~ \phi(X_{i_1}, …, X_{i_a}) = \theta .$

Now, to simplify notation, let $(X_1, …, X_a)$ represent the elements $(X_{i_1}, …, X_{i_a})$ and $(X'_{1}, …, X'_{a})$ represent the elements $(X_{j_1}, …, X_{j_a})$ . With $c$ common elements,

$\text{Cov}\left[\phi(X_{i_1}, …, X_{i_a}),~ \phi(X_{j_1}, …, X_{j_a})\right] = \mathbb{E}_{F} \left[ \left\{\phi(X_1, …, X_c, X_{c+1}, …, X_a) - \theta \right\}\left\{\phi(X_1, …, X_c, X'_{c+1}, …, X'_a) - \theta \right\} \right]$

To simplify this statement, we need to combine some clever conditioning with the all powerful law of iterated expectation (a.k.a law of total expectation).
Conditioning on the $c$ common elements $X_1, …, X_c$ will make the two terms within the expectation independent as they share no other elements.

For $c = 0, 1, …, a$ , define

$\phi_{c}(X_1, …, X_c) = \mathbb{E}_F \left[ \phi(X_1, …, X_a) | X_1, …, X_c \right].$

Then, applying the law of iterated expectation,

$\mathbb{E}_F~ \phi_c(X_1, …, X_c) = \mathbb{E}_F \Big[ \mathbb{E}_F\left[\phi(X_1, …, X_a) | X_1, …, X_c\right]\Big] = \mathbb{E}_F~ \phi(X_1, …, X_a) = \theta.$

We also define $\sigma_{c}^2 = \text{Var}_{F}~ \phi_{c} (X_1, …, X_c)$ .

Note that when $c=0$ ,

$\phi_0 = \mathbb{E}_F~ \phi(X_1, …, X_a) = \theta$

such that $\sigma^{2}_{0} = 0$ and when $c=a$ ,

$\phi_a = \mathbb{E}_F \Big[ \mathbb{E}_F\left[\phi(X_1, …, X_a) | X_1, …, X_a \right]\Big] = \phi(X_1, …, X_a)$

such that $\sigma_{a}^2 = \text{Var}_{F}~ \phi(X_1, …, X_a)$ .

Let $S_1 = (X_1, …, X_c, X_{c+1}, …, X_a)$ and $S_2 = (X_1, …, X_c, X'_{c+1}, …, X'_{a})$ . Then, the law of iterated expectation tells us,

$\text{Cov}[\phi(S_1), \phi(S_2)] = \mathbb{E}_F \Big[ \mathbb{E}_F \big[ (\phi(S_1) - \theta)(\phi(S_2) - \theta) | X_1, …, X_c\big] \Big].$

Since $\phi(S_1)$ and $\phi(S_2)$ become independent when we condition on $X_1, …, X_c$ , we have

$\text{Cov}[\phi(S_1), \phi(S_2)] = \mathbb{E}_{F} \Big[ \mathbb{E}_F \left[ \phi(S_1) - \theta | X_1, …., X_c \right] \mathbb{E}_F \left[ \phi(S_2) - \theta | X_1, …., X_c \right] \Big]$

and since $\theta$ is just a constant,

$\text{Cov}[\phi(S_1), \phi(S_2)] = \mathbb{E}_{F} \Big[ \Big(\mathbb{E}_F \left[ \phi(S_1) | X_1, …., X_c \right] - \theta \Big) \Big(\mathbb{E}_F \left[ \phi(S_2) | X_1, …., X_c \right] - \theta \Big)\Big].$

Noting that $\mathbb{E}_F \left[ \phi(S_1) | X_1, …., X_c \right] = \mathbb{E}_F \left[ \phi(S_2) | X_1, …., X_c \right] = \phi_{c}(X_1, …, X_c)$ ,

$\text{Cov}[\phi(S_1), \phi(S_2)] = \mathbb{E}_{F} \Big[ \Big( \phi_{c}(X_1, …, X_c) - \theta \Big)^2\Big] = \text{Var}_F~ \phi_c (X_1, …, X_c) = \sigma^2_c.$

How many $(X_{i_1}, …, X_{i_a})$ and $(X_{j_1}, …, X_{j_a})$ have $c$ elements in common?

There are ${n \choose a}$ ways to select the indices $(i_{1}, …, i_{a})$ .
Then, there are ${a \choose c}$ ways of selecting the $c$ elements from $(i_{1}, …, i_{a})$ shared by $(j_{1}, …, j_{a})$ .
Finally, there are ${n-a \choose a-c}$ ways of selecting the remaining $a-c$ elements of $(j_{1}, …, j_{a})$ from the remaining $n-a$ elements available.

Combining all of these components together, we obtain

$\text{Var}_{F}~U_n = \frac{1}{{n \choose a}^2} \sum_{c=0}^{a} {n \choose a}{a \choose c}{n-a \choose a-c} \sigma^{2}_c$

but since $\sigma^{2}_0 = 0$ , simplifying yields,

$\text{Var}_{F}~U_n = \frac{1}{{n \choose a}} \sum_{c=1}^{a} {a \choose c}{n-a \choose a-c} \sigma^{2}_c.$

Note that for large $n$ ,

${n-a \choose k} = \frac{(n-a)!}{k! (n-a-k)!} = \frac{1}{k!} (n-a)(n-a-1)…(n-a-k+1) \sim \frac{n^k}{k!}.$

Then, we can rewrite $\text{Var}_{F}~U_n$ as,

$\text{Var}_F ~ U_n = \frac{a! (n-a)!}{n!} \sum_{c=1}^{a} \frac{a!}{c!(a-c)!} \frac{n^{a-c}}{(a-c)!} \sigma^{2}_c.$

Expanding out the first few terms,

$\text{Var}_F ~ U_n = \frac{a! (n-a)!}{n!} \left[ \frac{a!}{(a-1)!} \frac{n^{a-1}}{(a-1)!} \sigma^{2}_1 + \frac{a!}{2!(a-2)!} \frac{n^{a-2}}{(a-2)!} \sigma^{2}_2 + … \right].$

Since $a!/(a-1)! = a$ and $n! / (n-a)! \sim n^{a}$ , simplifying yields

$\text{Var}_F ~ U_n = \frac{a^2}{n} \sigma^{2}_1 + \frac{a^2 (a-1)^2}{2 n^2} \sigma^{2}_2 + … ~.$

The first term of the variance dominates and we have,

$\text{Var}_F \approx \frac{a^2}{n} \sigma^{2}_1$

where

$\sigma^{2}_1 = \text{Var}_F~ \phi_1(X_1) = \text{Var}_F \Big[ \mathbb{E}_F \left[ \phi(X_1, …, X_a) | X_1 \right]\Big].$

So, now we know that $\mathbb{E}_F~U_n = \theta$ and $\text{Var}_F~U_n = \frac{a^2}{n} \sigma^{2}_1$ , but we want to know if

$\sqrt{n} (U_n - \theta) \rightarrow N \left(0, a^2 \sigma^{2}_1 \right).$

Again, we cannot use the central limit theorem (CLT) to show that $U_n$ is asymptotically normal since it is the sum of dependent terms. However, we can:

Construct a surrogate $U^{*}_n$ for which the CLT does apply, and
Demonstrate that $U_n$ converges to the same distribution as $U^{*}_n$ .

Selection and distribution of a surrogate

What should we choose as the surrogate? Well, we want something that will have the same mean and variance as $U_n - \theta$ . The variance only involves $\sigma^{2}_1$ which suggests conditioning on a single $X_i$ . Thus, as a surrogate, lets consider

$U_n^{*} = \sum_{i=1}^{n} \mathbb{E}_F \left[ U_n - \theta | X_i \right]$

Let’s start by expanding the summand,

$\mathbb{E}_F \left[U_n - \theta | X_i \right] = \frac{1}{{n \choose a}} \mathop{\sum … \sum} \limits_{1 \leq i_1 < ... < i_a \leq n} \Big( \mathbb{E}_F \left[\phi(X_{i_1}, ..., X_{i_a})|X_i\right] - \theta \Big)$

If $X_i \in (X_{i1}, …, X_{i_a})$ then $c = 1$ and,

$\mathbb{E}_F [\phi(X_{i_1}, …, X_{i_a}|X_i] - \theta = \phi_1(X_i) - \theta$

else if $X_i \notin (X_{i1}, …, X_{i_a})$ then $c = 0$ and,

$\mathbb{E}_F [\phi(X_{i_1}, …, X_{i_a})|X_i] - \theta = \mathbb{E}_F [\phi(X_{i_1}, …, X_{i_a})] - \theta=0.$

Of the ${n \choose a}$ terms, how many include $X_i$ ? If $i \in (i_1, …, i_a)$ , there are $n-1$ possible choices for the remaining subscripts, of which we require $a-1$ . Thus we have,

$U_n^{*} = \frac{{n-1 \choose a-1}}{{n \choose a}} \sum_{i=1}^{n} \Big[ \phi_1(X_i) - \theta \Big]= \frac{a}{n} \sum_{i=1}^{n} \Big[ \phi_1(X_i) - \theta \Big].$

The $\phi_1(X_i)$ are independent and identically distributed with mean $\theta$ and variance $\sigma^{2}_1$ . The expectation of $U_n^{*}$ with respect to $F$ is,

$\mathbb{E}_F~U_n^{*} = a ~\mathbb{E}_F \Big[ \phi_1(X_i) - \theta \Big] = 0$

and the corresponding variance is,

$\text{Var}_F~ U_n^{*} = \frac{a^2}{n^2} \sum_{i=1}^{n} \text{Var}_F~ \phi_1(X_i) = \frac{a^2}{n} \sigma_1^{2}.$

Finally, $U_n^{*}$ is the sum of independent and identically distributed random variables and so the central limit theorem tells us,

$\sqrt{n} U_n^{*} \rightarrow N(0, a^2 \sigma_1^2).$

Convergence to surrogate

We can express our quantity of interest $\sqrt{n} (U_n - \theta)$ in terms of our surrogate $U^{*}_n$ as,

$\sqrt{n} (U_n - \theta) = \sqrt{n} U_{n}^{*} + \sqrt{n} (U_n - \theta - U_n^{*}).$

Thus if we can show that,

$\sqrt{n} (U_n - \theta - U_n^{*}) \stackrel{P}{\rightarrow} 0$

we have our desired result! To prove this, it is sufficient to demonstrate

$\mathbb{E}_F~ \left[ \sqrt{n} (U_n - \theta - U_n^{*}) \right]^2 \rightarrow 0.$

So expanding the quadratic mean, we have

$\mathbb{E}_F~ \left[ \sqrt{n} (U_n - \theta - U_n^{*}) \right]^2 = n \mathbb{E}_F~ {U_n^{*}}^{2} - 2n \mathbb{E}_F \Big[(U_n - \theta) U_n^{*} \Big] + n \mathbb{E}_F~ (U_n - \theta)^2.$

Since $\mathbb{E}_F~ U_n^{*} = \mathbb{E}_F~ (U_n - \theta)^2$ , $Var[X] = E[X^2] - E[X]^2$ , and $\theta$ is a constant, this simplifies to

$\mathbb{E}_F~ \left[ \sqrt{n} (U_n - \theta - U_n^{*}) \right]^2 = n \text{Var}_F ~{U_n^{*}} - 2n \text{Cov} \Big[U_n - \theta, U_n^{*} \Big] + n \text{Var}_F~ U_n.$

We know $\text{Var}_F ~U_n^{*}$ and $\text{Var}_F~ U_n$ from our earlier work, both of which equal $\frac{a^2}{n}\sigma^{2}_1$ . We can recycle our tricks from earlier to figure out $\text{Cov}[U_n-\theta, U_n^{*}]$ .

Expanding the covariance yields,

$\text{Cov}[U_n, U_n^{*}] = \mathbb{E}_F [(U_n - \theta) U_n] = \frac{a}{n} \sum_{i=1}^{n} \mathbb{E}_F \Big[ \Big(U_n - \theta \Big) \Big( \phi_1(X_i) - \theta\Big) \Big].$

Conditioning on $X_i$ and applying the law of iterated expectation,

$\text{Cov}[U_n, U_n^{*}] = \frac{a}{n} \sum_{i=1}^{n} \mathbb{E}_F \Big[ \mathbb{E}_F \left[\Big(\phi_1(X_i) - \theta\Big) \Big(U_n - \theta\Big)\Big| X_i \right]\Big] = \frac{a}{n} \sum_{i=1}^{n} \mathbb{E}_F \Big[ \Big(\phi_1(X_i) - \theta\Big)\mathbb{E}_F \left[ \Big(U_n - \theta\Big)\Big| X_i \right]\Big].$

Plugging in our previous results yields,

$\text{Cov}[U_n, U_n^{*}] = \frac{a^2}{n^2} \sum_{i=1}^{n} \mathbb{E}_F ~ \Big[\phi_1(X_i) - \theta)\Big]^2 = \frac{a^2}{n^2} \sum_{i=1}^{n} \text{Var}_F~ \phi_1(X_i) = \frac{a^2}{n} \sigma^{2}_1.$

Finally, for large $n$ , we have

$\mathbb{E}_F~ \left[ \sqrt{n} (U_n - \theta - U_n^{*}) \right]^2 = n \left( \frac{a^2}{n} \sigma^{2}_1 \right) - 2n \left( \frac{a^2}{n} \sigma^{2}_1 \right) + n \left( \frac{a^2}{n} \sigma^{2}_1 \right) = 0.$

Huzzah! Finally we have shown that a single U-statistic $U_n$ is asymptotically normally distributed with mean $\theta = T(F)$ and variance $\frac{a^2}{n} \sigma^{2}_1$ ,

$\sqrt{n} (U_n - \theta) \rightarrow N(0, a^2 \sigma_1^{2}).$

Click here to download this blog post as an RMarkdown (.Rmd) file!

Overview

Variance derivation

Selection and distribution of a surrogate

Convergence to surrogate

Published by

Emma Davies Smith

3 thoughts on “Getting to know U: the asymptotic distribution of a single U-statistic”

Leave a Reply Cancel reply