
Review
A statistical functional is any real-valued function of a distribution function $F$, $T(F)$. When $F$ is unknown, nonparametric estimation only requires that $F$ belong to a broad class of distribution functions $\mathcal{F}$, typically subject only to mild restrictions such as continuity or existence of specific moments.

For a single independent and identically distributed random sample of size $n$, $X_1, \ldots, X_n$, a statistical functional $T(F)$ is said to belong to the family of expectation functionals if:

- $T(F)$ takes the form of an expectation of a function $\phi$ with respect to $F$,
\[T(F) = \mathbb{E}_F~ \phi(X_1, \ldots, X_a)\]
- $\phi(X_1, \ldots, X_a)$ is a symmetric kernel of degree $a$.

A kernel is symmetric if its arguments can be permuted without changing its value. For example, if the degree $a = 2$, $\phi$ is symmetric if $\phi(x_1, x_2) = \phi(x_2, x_1)$.
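As a quick illustration, symmetry is easy to check numerically for a candidate kernel, say $\phi(x_1, x_2) = |x_1 - x_2|$ (a purely illustrative choice):

```r
# A degree-2 kernel is symmetric if swapping its arguments leaves it unchanged,
# e.g. phi(x1, x2) = |x1 - x2|:
phi <- function(x1, x2) abs(x1 - x2)
phi(1.5, -0.3) == phi(-0.3, 1.5)   # TRUE
```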
If $T(F)$ is an expectation functional and the class of distribution functions $\mathcal{F}$ is broad enough, an unbiased estimator of $T(F)$ can always be constructed. This estimator is known as a U-statistic and takes the form,

\[U_n = \frac{1}{{n \choose a}} \sum\limits_{1 \leq i_1 < \cdots < i_a \leq n} \phi(X_{i_1}, \ldots, X_{i_a})\]

such that $U_n$ is the average of $\phi$ evaluated at all ${n \choose a}$ distinct combinations of size $a$ from $X_1, \ldots, X_n$.
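To make this concrete, here is a minimal, brute-force R sketch of $U_n$ (the helper name `U_stat` and the variance kernel are purely illustrative): it simply averages a symmetric kernel over all ${n \choose a}$ index combinations.

```r
# Brute-force one-sample U-statistic: average a symmetric kernel of degree a
# over all choose(n, a) distinct combinations of the sample.
U_stat <- function(x, kernel, a) {
  combos <- combn(length(x), a)                      # all index combinations
  mean(apply(combos, 2, function(idx) kernel(x[idx])))
}

# Example: phi(x1, x2) = (x1 - x2)^2 / 2 is a symmetric kernel of degree 2
# whose expectation is the population variance, so U_n is the usual unbiased
# sample variance.
phi_var <- function(x) (x[1] - x[2])^2 / 2

set.seed(1)
x <- rnorm(20)
U_stat(x, phi_var, a = 2)   # equals var(x)
var(x)
```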
For more detail on expectation functionals and their estimators, check out my blog post U-, V-, and Dupree statistics.
Since each $X_i$ appears in more than one summand of $U_n$, the central limit theorem cannot be used to derive the limiting distribution of $U_n$ as it is the sum of dependent terms. However, clever conditioning arguments can be used to show that $U_n$ is in fact asymptotically normal with mean

\[\mathbb{E}_F~ U_n = T(F) = \theta\]

and variance

\[\text{Var}_F~ U_n \sim \frac{a^2}{n} \sigma^{2}_1\]

where

\[\sigma^{2}_1 = \text{Var}_F~ \phi_1(X_1) = \text{Var}_F \Big( \mathbb{E}_F \big[\phi(X_1, \ldots, X_a) \mid X_1 \big] \Big).\]
The sketch of the proof is as follows:
- Express the variance of $U_n$ in terms of the covariance of its summands,
\[\text{Var}_{F}~ U_n = \frac{1}{{n \choose a}^2} \mathop{\sum \sum} \limits_{\substack{1 \leq i_1 < \cdots < i_{a} \leq n \\ 1 \leq j_1 < \cdots < j_{a} \leq n}} \text{Cov}\left[\phi(X_{i_1}, \ldots, X_{i_a}),~ \phi(X_{j_1}, \ldots, X_{j_a})\right].\]
- Recognize that if two terms share $c$ common elements, as in
\[\text{Cov} \left[\phi(X_1, \ldots, X_c, X_{c+1}, \ldots, X_a),~ \phi(X_1, \ldots, X_c, X'_{c+1}, \ldots, X'_a)\right],\]
conditioning on their $c$ shared elements will make the two terms conditionally independent.
- For $1 \leq c \leq a$, define
\[\phi_c(X_1, \ldots, X_c) = \mathbb{E}_F \Big[\phi(X_1, \ldots, X_a) \mid X_1, \ldots, X_c \Big]\]
such that
\[\mathbb{E}_F~ \phi_c(X_1, \ldots, X_c) = \theta = T(F)\]
and
\[\sigma_{c}^2 = \text{Var}_{F}~ \phi_c(X_1, \ldots, X_c).\]
Note that when $c = a$, $\phi_a = \phi$ and $\sigma^{2}_a = \text{Var}_F~ \phi(X_1, \ldots, X_a)$, and when $c = 0$, $\phi_0 = \theta$ and $\sigma^{2}_0 = 0$.
- Use the law of iterated expectation to demonstrate that
\[\sigma^{2}_c = \text{Cov} \left[\phi(X_1, \ldots, X_c, X_{c+1}, \ldots, X_a),~ \phi(X_1, \ldots, X_c, X'_{c+1}, \ldots, X'_a)\right]\]
and re-express $\text{Var}_F~ U_n$ as the sum of the $\sigma^{2}_c$,
\[\text{Var}_F~U_n = \frac{1}{{n \choose a}} \sum_{c=1}^{a} {a \choose c}{n-a \choose a-c} \sigma^{2}_c.\]
Recognizing that the first variance term dominates for large $n$, approximate $\text{Var}_F~U_n$ as
\[\text{Var}_F~U_n \sim \frac{a^2}{n} \sigma^{2}_1\]
(a quick simulation check of this approximation follows after the proof sketch).
- Identify a surrogate $U^{*}_n$ that has the same mean and asymptotic variance as $U_n - \theta$ but is the sum of independent terms,
\[U_n^{*} = \sum_{i=1}^{n} \mathbb{E}_F [U_n - \theta \mid X_i]\]
so that the central limit theorem may be used to show
\[\sqrt{n}~ U_n^{*} \rightarrow N(0, a^2 \sigma_1^2).\]
- Demonstrate that $\sqrt{n}(U_n - \theta)$ and $\sqrt{n}~U_n^{*}$ converge in probability,
\[\sqrt{n} \Big((U_n - \theta) - U_n^{*}\Big) \stackrel{P}{\rightarrow} 0,\]
and thus have the same limiting distribution so that
\[\sqrt{n} (U_n - \theta) \rightarrow N(0, a^2 \sigma_1^2).\]
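As a rough numerical sanity check (a simulation sketch assuming standard normal data and the illustrative variance kernel $\phi(x_1, x_2) = (x_1 - x_2)^2/2$, for which $\phi_1(x_1) = (x_1^2 + 1)/2$ and hence $\sigma^2_1 = \text{Var}(X_1^2)/4 = 1/2$), we can compare the empirical variance of $U_n$ across repeated samples to $a^2 \sigma^2_1 / n$:

```r
# Monte Carlo sketch: empirical variance of U_n vs. the a^2 * sigma_1^2 / n
# approximation for the variance kernel phi(x1, x2) = (x1 - x2)^2 / 2,
# for which U_n is simply the unbiased sample variance.
set.seed(2)
n    <- 50
reps <- 5000

U_reps <- replicate(reps, var(rnorm(n)))   # U_n = var() for this kernel

var(U_reps)                 # empirical variance of U_n
a       <- 2
sigma12 <- 1 / 2            # Var((X^2 + 1)/2) = Var(X^2)/4 = 2/4 for N(0, 1)
a^2 * sigma12 / n           # asymptotic approximation
```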
For a walkthrough derivation of the limiting distribution of $U_n$ for a single sample, check out my blog post Getting to know U: the asymptotic distribution of a single U-statistic.
This blog post aims to provide an overview of the extension of kernels, expectation functionals, and the definition and distribution of U-statistics to multiple independent samples, with particular focus on the common two-sample scenario.
Definitions for multiple independent samples
Consider $s$ independent random samples denoted

\[X_{i1}, \ldots, X_{i n_i}, \quad i = 1, \ldots, s,\]

where $n_i$ is the size of the $i^{th}$ sample and $N = \sum_{i=1}^{s} n_i$ is the total sample size. Let $X_{i1}, \ldots, X_{i n_i} \sim F_i$. Then, for multiple independent samples, a statistical functional $T(F_1, \ldots, F_s)$ is an expectation functional if

\[T(F_1, \ldots, F_s) = \mathbb{E}~ \phi(X_{11}, \ldots, X_{1a_1};~ \ldots;~ X_{s1}, \ldots, X_{sa_s})\]

where the kernel $\phi$ is a function symmetric within each of the $s$ blocks of arguments $(X_{11}, \ldots, X_{1a_1}), \ldots, (X_{s1}, \ldots, X_{sa_s})$. That is, each block of arguments within $\phi$ can be permuted independently without changing the value of $\phi$. By this definition, $\phi$ is an unbiased estimator of $\theta = T(F_1, \ldots, F_s)$.
In the single sample case, we were able to permute all $a$ arguments of $\phi$ without impacting $\phi$'s value as each $X_i$ was identically distributed and thus "exchangeable". Here, all $a_1 + \cdots + a_s$ arguments are not identically distributed. The first sample may be distributed according to $F_1$, the second according to $F_2$, and so on. We have not required that $F_1 = F_2 = \cdots = F_s$ and so we cannot assume all $a_1 + \cdots + a_s$ arguments are exchangeable in the multiple sample scenario. Instead, we are restricted to permuting arguments within each sample.

Since we have assumed that the $s$ samples are independent, permuting one sample's, or block's, arguments should not impact the other blocks. Hence, our new block-based definition of a symmetric kernel.
Since $\phi$ is symmetric within each block of arguments, we can require that the indices within the $k^{th}$ block be ordered,

\[1 \leq i_1 < \cdots < i_{a_k} \leq n_k, \quad k = 1, \ldots, s.\]

Then, from each of the $s$ independent samples, we require $a_i$ of the possible $n_i$ arguments, so that there are a total of

\[{n_1 \choose a_1} {n_2 \choose a_2} \cdots {n_s \choose a_s}\]

possible combinations of the $s$ blocks' arguments.
The U-statistic for $s$ independent samples is then defined analogously to that of a single sample,

\[U = \frac{1}{{n_1 \choose a_1} \cdots {n_s \choose a_s}} \mathop{\sum \cdots \sum} \limits_{\substack{1 \leq i_1 < \cdots < i_{a_1} \leq n_1 \\ \vdots \\ 1 \leq r_1 < \cdots < r_{a_s} \leq n_s}} \phi(X_{1 i_1}, \ldots, X_{1 i_{a_1}};~ \ldots;~ X_{s r_1}, \ldots, X_{s r_{a_s}})\]

so that $U$ is the average of $\phi$ evaluated at all ${n_1 \choose a_1} \cdots {n_s \choose a_s}$ independent combinations of the $s$ blocks' arguments.
This notation got real ugly real fast but I promise that we will pretend that the two-sample case is the only important case soon enough.
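To make the notation a little more concrete, here is a brute-force R sketch (the helper `multi_U` and the example kernel are purely illustrative): it enumerates all within-block index combinations and averages the kernel over their Cartesian product.

```r
# Brute-force multi-sample U-statistic: `samples` is a list of numeric
# vectors (one per sample), `sizes` the block sizes (a_1, ..., a_s), and
# `kernel` a function taking one vector argument per block.
multi_U <- function(samples, sizes, kernel) {
  stopifnot(length(samples) == length(sizes))
  # All within-block index combinations
  combos <- Map(function(x, a) combn(seq_along(x), a, simplify = FALSE),
                samples, sizes)
  # Cartesian product over which combination is used in each block
  grid <- expand.grid(lapply(combos, seq_along))
  vals <- apply(grid, 1, function(choice) {
    args <- Map(function(x, cmb, k) x[cmb[[k]]], samples, combos, choice)
    do.call(kernel, unname(args))
  })
  mean(vals)
}

# Example with s = 2, a_1 = a_2 = 1 and kernel 1(x <= y), estimating P(X <= Y):
set.seed(3)
multi_U(list(rnorm(15), rnorm(20, mean = 1)),
        sizes  = c(1, 1),
        kernel = function(x, y) as.numeric(x <= y))
```

With `sizes = c(1, 1)` and the kernel $\mathbb{1}(x \leq y)$, this is the Mann–Whitney-type estimator of $P(X \leq Y)$ that we will meet again in the two-sample section.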
Asymptotic distribution
As you can imagine, the derivation of the asymptotic distribution of $U$ for multiple samples is a notational nightmare. As a result, I will provide a sketch of the required elements. The remainder of the proof follows the same logic as the single-sample case.
The variance of $U$ can be expressed in terms of the covariance of its summands. We start, once again, by focusing on a single covariance term. To "simplify notation", we consider

\[\text{Cov}\left[\phi(X_{1 i_1}, \ldots, X_{1 i_{a_1}};~ \ldots;~ X_{s r_1}, \ldots, X_{s r_{a_s}}),~ \phi(X_{1 i'_1}, \ldots, X_{1 i'_{a_1}};~ \ldots;~ X_{s r'_1}, \ldots, X_{s r'_{a_s}})\right].\]

Let $j_k$ represent the number of elements common to the $k^{th}$ block of the two terms such that,

\[\text{Cov}\big[\phi(X_{11}, \ldots, X_{1j_1}, X_{1(j_1+1)}, \ldots, X_{1a_1};~ \ldots;~ X_{s1}, \ldots, X_{sj_s}, X_{s(j_s+1)}, \ldots, X_{sa_s}),~ \phi(X_{11}, \ldots, X_{1j_1}, X'_{1(j_1+1)}, \ldots, X'_{1a_1};~ \ldots;~ X_{s1}, \ldots, X_{sj_s}, X'_{s(j_s+1)}, \ldots, X'_{sa_s})\big].\]

Conditioning on all $j_1 + \cdots + j_s$ common elements will make the two terms conditionally independent. Thus, we can define the multiple sample analogue to the single sample $\phi_c$. For $0 \leq j_k \leq a_k$, $k = 1, \ldots, s$,
\begin{align*} \phi_{j_1, \ldots, j_s} &(X_{11}, \ldots, X_{1j_1};~ \ldots;~ X_{s1}, \ldots, X_{sj_s}) \\ &= \mathbb{E}_F \Big[ \phi(X_{11}, \ldots, X_{1a_1};~ \ldots;~ X_{s1}, \ldots, X_{sa_s}) \mid X_{11}, \ldots, X_{1j_1};~ \ldots;~ X_{s1}, \ldots, X_{sj_s}\Big] \end{align*}
and its variance as,
\begin{align*} \sigma^{2}_{j_1, \ldots, j_s} = \text{Var}_F~ &\phi_{j_1, \ldots, j_s} (X_{11}, \ldots, X_{1j_1};~ \ldots;~ X_{s1}, \ldots, X_{sj_s}) \\ = \text{Cov}\big[&\phi(X_{11}, \ldots, X_{1j_1}, X_{1(j_1 + 1)}, \ldots, X_{1a_1};~ \ldots;~ X_{s1}, \ldots, X_{sj_s}, X_{s(j_s + 1)}, \ldots, X_{sa_s}), \\ &\phi(X_{11}, \ldots, X_{1j_1}, X'_{1(j_1 + 1)}, \ldots, X'_{1a_1};~ \ldots;~ X_{s1}, \ldots, X_{sj_s}, X'_{s(j_s + 1)}, \ldots, X'_{sa_s})\big]. \end{align*}
Expressing $\text{Var}_F~ U$ in terms of the $\sigma^{2}_{j_1, \ldots, j_s}$, it can be shown that

\[\text{Var}_F~ U \sim \frac{a_1^2}{n_1} \sigma^{2}_{1, 0, \ldots, 0} + \frac{a_2^2}{n_2} \sigma^{2}_{0, 1, 0, \ldots, 0} + \cdots + \frac{a_s^2}{n_s} \sigma^{2}_{0, \ldots, 0, 1}\]

when all the variance components are finite and

\[\frac{n_k}{N} \rightarrow \lambda_k \in (0, 1), \quad k = 1, \ldots, s, \quad \text{as } N \rightarrow \infty.\]

Thus, the variance of $U$ is essentially the sum of the variance components obtained by conditioning on a single element within each of the $s$ samples, or in each of the $s$ blocks.

Finally, it can be shown that

\[\sqrt{N}\,(U - \theta) \rightarrow N\left(0,~ \frac{a_1^2}{\lambda_1} \sigma^{2}_{1, 0, \ldots, 0} + \cdots + \frac{a_s^2}{\lambda_s} \sigma^{2}_{0, \ldots, 0, 1}\right).\]
Two-sample scenario
Consider two independent samples denoted $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$. The two-sample U-statistic for block sizes $a$ and $b$ is,

\[U = \frac{1}{{m \choose a}{n \choose b}} \mathop{\sum \sum} \limits_{\substack{1 \leq i_1 < \cdots < i_{a} \leq m \\ 1 \leq j_1 < \cdots < j_b \leq n}} \phi(X_{i_1}, \ldots, X_{i_a};~ Y_{j_1}, \ldots, Y_{j_b}).\]
If $0 \leq i \leq a$ and $0 \leq j \leq b$, conditioning on $i$ elements of the $X$ sample and $j$ elements of the $Y$ sample,

\begin{align*} \phi_{ij}(X_1, \ldots, X_i&;~ Y_1, \ldots, Y_j) = \\ & \mathbb{E}_P \Big[\phi(X_1, \ldots, X_a;~ Y_1, \ldots, Y_b) \mid X_1, \ldots, X_i;~ Y_1, \ldots, Y_j \Big] \end{align*}

and

\begin{align*} \sigma^{2}_{ij} = \text{Var}_{P}~& \phi_{ij} (X_1, \ldots, X_i;~ Y_1, \ldots, Y_j) \\ = \text{Cov} \big[&\phi(X_1, \ldots, X_i, X_{i+1}, \ldots, X_a;~ Y_1, \ldots, Y_j, Y_{j+1}, \ldots, Y_b), \\ &\phi(X_1, \ldots, X_i, X'_{i+1}, \ldots, X'_a;~ Y_1, \ldots, Y_j, Y'_{j+1}, \ldots, Y'_b)\big]. \end{align*}
Then for $i = 1$ and $j = 0$, we obtain

\[\sigma^{2}_{10} = \text{Var}_P~ \phi_{10}(X_1),\]

and similarly for $i = 0$ and $j = 1$,

\[\sigma^{2}_{01} = \text{Var}_P~ \phi_{01}(Y_1).\]
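To make these components concrete, take the illustrative kernel $\phi(x; y) = \mathbb{1}(x \leq y)$ with $a = b = 1$ and, say, both samples standard normal. Then $\phi_{10}(x) = P(Y \geq x)$ and $\phi_{01}(y) = P(X \leq y)$, and each component is easy to compute or simulate:

```r
# Projection components for phi(x; y) = 1(x <= y) with X, Y ~ N(0, 1):
# phi_10(x) = P(Y >= x) = 1 - pnorm(x),  phi_01(y) = P(X <= y) = pnorm(y).
set.seed(5)
x <- rnorm(1e5)
y <- rnorm(1e5)

var(1 - pnorm(x))   # sigma^2_10: approx 1/12 since pnorm(X) ~ Uniform(0, 1)
var(pnorm(y))       # sigma^2_01: approx 1/12
```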
If $\sigma^{2}_{10}$ and $\sigma^{2}_{01}$ are finite and

\[\frac{m}{N} \rightarrow \lambda, \quad 0 < \lambda < 1, \quad \text{as } N = m + n \rightarrow \infty,\]

then we can approximate the limiting variance of $U$ as

\[\text{Var}_P~ U \sim \frac{a^2}{m} \sigma^{2}_{10} + \frac{b^2}{n} \sigma^{2}_{01}\]

such that

\[\sqrt{N}\,(U - \theta) \rightarrow N\left(0,~ \frac{a^2}{\lambda} \sigma^{2}_{10} + \frac{b^2}{1 - \lambda} \sigma^{2}_{01}\right).\]
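As a final sanity check (a simulation sketch, again using the illustrative kernel $\phi(x; y) = \mathbb{1}(x \leq y)$ with $a = b = 1$, so that $U$ estimates $\theta = P(X \leq Y)$): when both samples share the same continuous distribution, $\theta = 1/2$ and $\sigma^{2}_{10} = \sigma^{2}_{01} = 1/12$, so the approximation above gives $\text{Var}~U \approx \frac{1}{12m} + \frac{1}{12n}$.

```r
# Monte Carlo sketch: two-sample U-statistic with kernel 1(x <= y), a = b = 1.
# Under F = G (both N(0, 1) here), theta = 1/2 and sigma^2_10 = sigma^2_01 = 1/12.
set.seed(4)
m <- 40; n <- 60
reps <- 5000

U_reps <- replicate(reps, {
  x <- rnorm(m); y <- rnorm(n)
  mean(outer(x, y, "<="))        # average of 1(x_i <= y_j) over all m * n pairs
})

var(U_reps)                       # empirical variance of U
1 / (12 * m) + 1 / (12 * n)       # a^2 sigma^2_10 / m + b^2 sigma^2_01 / n

# The standardized statistic is approximately standard normal:
N <- m + n; lambda <- m / N
sd_limit <- sqrt((1 / lambda + 1 / (1 - lambda)) / 12)
qqnorm(sqrt(N) * (U_reps - 0.5) / sd_limit)
```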
At this point, you may be wondering what the heck these variance components are and how you are supposed to estimate them. In my next blog post, I promise that I will provide some examples of common one- and two-sample U-statistics and their limiting distributions using our results!
For examples of common one- and two-sample U-statistics and their limiting distribution, check out One, Two, U: Examples of common one- and two-sample U-statistics.
Click here to download this blog post as an RMarkdown (.Rmd) file!
