Much Two U About Nothing: Extension of U-statistics to multiple independent samples

Thank you very much to the lovely Feben Alemu for pointing me in the direction of https://pungenerator.org/ as a means of ensuring we never have to go without a brilliant title! With great power comes great responsibility.


Review

A statistical functional is any real-valued function of a distribution function F, \theta = T(F). When F is unknown, nonparametric estimation only requires that F belong to a broad class of distribution functions \mathcal{F}, typically subject only to mild restrictions such as continuity or existence of specific moments.

For a single independent and identically distributed random sample of size n, X_1, …, X_n \stackrel{i.i.d}{\sim} F, a statistical functional \theta = T(F) is said to belong to the family of expectation functionals if:

  1. T(F) takes the form of an expectation of a function \phi with respect to F,

        \[T(F) = \mathbb{E}_F~ \phi(X_1, …, X_a) \]

  2. \phi(X_1, …, X_a) is a symmetric kernel of degree a \leq n.

A kernel is symmetric if its arguments can be permuted without changing its value. For example, if the degree a = 2, \phi is symmetric if \phi(x_1, x_2) = \phi(x_2, x_1).

If \theta = T(F) is an expectation functional and the class of distribution functions \mathcal{F} is broad enough, an unbiased estimator of \theta = T(F) can always be constructed. This estimator is known as a U-statistic and takes the form,

    \[ U_n = \frac{1}{{n \choose a}} \mathop{\sum … \sum} \limits_{1 \leq i_1 < ... < i_a \leq n} \phi(X_{i_1}, ..., X_{i_a})\]

such that U_n is the average of \phi evaluated at all {n \choose a} distinct combinations of size a from X_1, …, X_n.
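To make this definition concrete, here is a brute-force sketch (the helper names are my own, just for illustration) that averages a kernel over all {n \choose a} distinct combinations. With the symmetric degree-2 kernel \phi(x_1, x_2) = (x_1 - x_2)^2/2, whose expectation is \text{Var}_F(X), the resulting U-statistic coincides with the familiar unbiased sample variance:

```python
from itertools import combinations
from statistics import variance

def u_statistic(sample, kernel, degree):
    """Average the kernel phi over all C(n, degree) distinct
    combinations of the sample, per the definition of U_n."""
    combos = list(combinations(sample, degree))
    return sum(kernel(*args) for args in combos) / len(combos)

# Symmetric degree-2 kernel whose expectation is Var_F(X).
def var_kernel(x1, x2):
    return (x1 - x2) ** 2 / 2

x = [2.1, -0.3, 1.7, 4.0, 0.6]
u = u_statistic(x, var_kernel, 2)  # equals the unbiased sample variance of x
```

Note the cost: the number of kernel evaluations grows like n^a, so this direct computation is only practical for small samples or small degrees.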

For more detail on expectation functionals and their estimators, check out my blog post U-, V-, and Dupree statistics.

Since each X_i appears in more than one summand of U_n, the central limit theorem cannot be used to derive the limiting distribution of U_n as it is the sum of dependent terms. However, clever conditioning arguments can be used to show that U_n is in fact asymptotically normal with mean

    \[\mathbb{E}_F~ U_n = \theta = T(F)\]

and variance

    \[\text{Var}_F~U_n = \frac{a^2}{n} \sigma_1^{2}\]

where

    \[\sigma_1^{2} = \text{Var}_F \Big[ \mathbb{E}_F [\phi(X_1, …, X_a)|X_1] \Big].\]

The sketch of the proof is as follows:

  1. Express the variance of U_n in terms of the covariance of its summands,

    \[\text{Var}_{F}~ U_n = \frac{1}{{n \choose a}^2} \mathop{\sum \sum} \limits_{\substack{1 \leq i_1 < ... < i_{a} \leq n \\ 1 \leq j_1 < ... < j_{a} \leq n}} \text{Cov}\left[\phi(X_{i_1}, ..., X_{i_a}),~ \phi(X_{j_1}, ..., X_{j_a})\right].\]

  2. Recognize that if two terms share c common elements, as in

        \[ \text{Cov} [\phi(X_1, …, X_c, X_{c+1}, …, X_a), \phi(X_1, …, X_c, X'_{c+1}, …, X'_a)] \]

    conditioning on the c shared elements renders the two terms conditionally independent.

  3. For 0 \leq c \leq a, define

        \[\phi_c(X_1, …, X_c) = \mathbb{E}_F \Big[\phi(X_1, …, X_a) | X_1, …, X_c \Big] \]

    such that

        \[\mathbb{E}_F~ \phi_c(X_1, …, X_c) = \theta = T(F)\]

    and

        \[\sigma_{c}^2 = \text{Var}_{F}~ \phi_c(X_1, …, X_c).\]

    Note that when c = 0, \phi_0 = \theta and \sigma_0^2 = 0, and when c=a, \phi_a = \phi(X_1, …, X_a) and \sigma_a^2 = \text{Var}_F~\phi(X_1, …, X_a).

  4. Use the law of iterated expectation to demonstrate that

        \[ \sigma^{2}_c = \text{Cov} [\phi(X_1, …, X_c, X_{c+1}, …, X_a), \phi(X_1, …, X_c, X'_{c+1}, …, X'_a)] \]

    and re-express \text{Var}_{F}~U_n as the sum of the \sigma_{c}^2,

        \[ \text{Var}_F~U_n = \frac{1}{{n \choose a}} \sum_{c=1}^{a} {a \choose c}{n-a \choose a-c} \sigma^{2}_c.\]

    Recognizing that the first variance term dominates for large n, approximate \text{Var}_F~ U_n as

        \[\text{Var}_F~U_n \sim \frac{a^2}{n} \sigma^{2}_1.\]

  5. Identify a surrogate U^{*}_n that has the same mean and variance as U_n-\theta but is the sum of independent terms,

        \[ U_n^{*} = \sum_{i=1}^{n} \mathbb{E}_F [U_n - \theta|X_i] \]

    so that the central limit theorem may be used to show

        \[ \sqrt{n} U_n^{*} \rightarrow N(0, a^2 \sigma_1^2).\]

  6. Demonstrate that U_n - \theta and U_n^{*} converge in probability to one another,

        \[ \sqrt{n} \Big((U_n - \theta) - U_n^{*}\Big) \stackrel{P}{\rightarrow} 0 \]

    and thus have the same limiting distribution so that

        \[\sqrt{n} (U_n - \theta) \rightarrow N(0, a^2 \sigma_1^2).\]
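As a quick simulation sanity check on this result (a sketch, not part of the proof): for the degree-2 kernel \phi(x_1, x_2) = (x_1 - x_2)^2/2, U_n coincides with the unbiased sample variance. With F = N(0, 1) we have \theta = 1, \phi_1(x) = (x^2 + 1)/2, and \sigma_1^2 = \text{Var}(X^2)/4 = 1/2, so the variance of \sqrt{n}(U_n - \theta) should be near a^2 \sigma_1^2 = 4 \times 1/2 = 2:

```python
import random
from statistics import fmean, variance

# Simulation sketch: for phi(x1, x2) = (x1 - x2)^2 / 2, U_n equals the
# unbiased sample variance, so we can simulate it cheaply. With F = N(0, 1):
# theta = 1 and a^2 * sigma_1^2 = 4 * (1/2) = 2.
random.seed(1)
n, reps = 100, 2000
z = [n ** 0.5 * (variance([random.gauss(0, 1) for _ in range(n)]) - 1)
     for _ in range(reps)]
print(fmean(z))     # should be near 0
print(variance(z))  # should be near 2
```

The agreement improves as n grows, consistent with the asymptotic nature of the result.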

For a walkthrough derivation of the limiting distribution of U_n for a single sample, check out my blog post Getting to know U: the asymptotic distribution of a single U-statistic.

This blog post aims to provide an overview of the extension of kernels, expectation functionals, and the definition and distribution of U-statistics to multiple independent samples, with particular focus on the common two-sample scenario.

Definitions for multiple independent samples

Consider s independent random samples denoted

    \[X_{i1}, …, X_{i{n_i}} \stackrel{i.i.d}{\sim} F_i ~,~ i = 1, …, s\]

where n_i is the size of the i^{th} sample and N = n_1 + … + n_s is the total sample size. Let F = (F_1, …, F_s). Then, for multiple independent samples, a statistical functional \theta = T(F) is an expectation functional if

    \[ \theta = T(F) = \mathbb{E}_F~ \phi(X_{11}, …, X_{1{a_1}}; … ; X_{s1}, …, X_{s{a_s}}) \]

where the kernel \phi is a function symmetric within each of the s blocks of arguments \{(X_{11}, …, X_{1{a_1}}), …, (X_{s1}, …, X_{s{a_s}})\}. That is, each block of arguments within \phi can be permuted independently without changing the value of \phi. By this definition, \phi is an unbiased estimator of \theta.

In the single sample case, we were able to permute all a arguments of \phi(X_1, …, X_a) without impacting \phi's value as each X_i was identically distributed and thus “exchangeable”. Here, the a arguments are not all identically distributed.

The first sample may be distributed according to F_1, the second according to F_2, and so on. We have not required that F_1 = F_2 = … = F_s and so we cannot assume all a arguments are exchangeable in the multiple sample scenario. Instead, we are restricted to permuting arguments within each sample.

Since we have assumed that the s samples are independent, permuting one sample’s, or block’s, arguments should not impact the other blocks. Hence, our new block-based definition of a symmetric kernel.

Since \phi is symmetric within each block of arguments, we can require

    \begin{align*} 1 \leq i_1 < \dots &< i_{a_1} \leq n_1 \\ &\vdots \\ 1 \leq r_1 < \dots &< r_{a_s} \leq n_s. \end{align*}

Then, from each of the s independent samples, we select a_i of the n_i possible arguments, so that there are a total of

    \[\prod_{i=1}^{s} {n_i \choose a_i}\]

possible combinations of the a = a_1 + … + a_s arguments.

The U-statistic for s independent samples is then defined analogously to that of a single sample,

    \[U = \frac{1}{{n_1 \choose a_1} … {n_s \choose a_s}} \mathop{\sum … \sum} \limits_{\substack{1 \leq i_1 < ... < i_{a_1} \leq n_1 \\ \vdots \\ 1 \leq r_1 < ... < r_{a_s} \leq n_s}} \phi(X_{1{i_1}}, ..., X_{1i_{a_1}}; ... ; X_{s{r_1}}, ..., X_{s{r_{a_s}}})\]

so that U is the average of \phi evaluated at all \prod_{i=1}^{s} {n_i \choose a_i} distinct combinations of the s blocks’ arguments.

This notation got real ugly real fast but I promise that we will pretend that the two-sample case is the only important case soon enough.
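Despite the ugly notation, the computation itself is mechanical. Here is a brute-force sketch (helper names are hypothetical) that averages a kernel over all \prod_{i=1}^{s} {n_i \choose a_i} combinations, drawing a_i arguments from the i^{th} sample; with s = 2, a_1 = a_2 = 1, and the kernel \phi(x; y) = x - y, it estimates \mathbb{E} X - \mathbb{E} Y:

```python
from itertools import combinations, product
from math import comb, prod

def multi_sample_u(samples, kernel, degrees):
    """Average the kernel over all prod_i C(n_i, a_i) combinations:
    one block of a_i arguments per sample, passed as a tuple."""
    blocks = [combinations(s, a) for s, a in zip(samples, degrees)]
    total = sum(kernel(*combo) for combo in product(*blocks))
    return total / prod(comb(len(s), a) for s, a in zip(samples, degrees))

# Degree (1, 1) kernel estimating E[X] - E[Y]; each block is a 1-tuple.
def mean_diff(x_block, y_block):
    return x_block[0] - y_block[0]

u = multi_sample_u([[1, 3], [2, 4]], mean_diff, (1, 1))  # mean(X) - mean(Y)
```

Here the blocks being separate tuples mirrors the block-wise symmetry of the kernel: arguments may be permuted within a block, but not across blocks.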

Asymptotic distribution

As you can imagine, the derivation of the asymptotic distribution of U for multiple samples is a notational nightmare. As a result, I will provide a sketch of the required elements. The remainder of the proof follows the logic of that of a single sample.

The variance of U can be expressed in terms of the covariance of its summands. We start, once again, by focusing on a single covariance term. To “simplify notation”, we consider

    \[\text{Cov}[\phi(X_{11}, …, X_{1{a_1}};…;X_{s1}, …, X_{s{a_s}}), \phi(X'_{11}, …, X'_{1{a_1}};…;X'_{s1}, …, X'_{s{a_s}})].\]

Let 0 \leq j_i \leq a_i ~,~ i = 1, …, s represent the number of elements common to the i^{th} block of the two terms such that,

    \begin{align*} \text{Cov}[&\phi(X_{11}, …, X_{1{j_1}}, X_{1{(j_1 + 1)}}, …, X_{1{a_1}};…;X_{s1}, …, X_{s{j_s}}, X_{s{(j_s + 1)}}, …, X_{s{a_s}}), \\ &\phi(X_{11}, …, X_{1{j_1}}, X'_{1{(j_1 + 1)}}, …, X'_{1{a_1}};…;X_{s1}, …, X_{s{j_s}}, X'_{s{(j_s + 1)}}, …, X'_{s{a_s}})]. \end{align*}

Conditioning on all j_1, j_2, …, j_s common elements will make the two terms conditionally independent. Thus, we can define the multiple sample analogue to the single sample \phi_c. For 0 \leq j_i \leq a_i ~,~ i = 1, …, s,

    \begin{align*} \phi_{j_1, …, j_s} &(X_{11}, …, X_{1j_1}; …; X_{s1}, …, X_{sj_s}) \\ &= \mathbb{E}_F \Big[ \phi(X_{11}, …, X_{1a_1}; …; X_{s1}, …, X_{sa_s}) | X_{11}, …, X_{1j_1}; …; X_{s1}, …, X_{sj_s}\Big] \end{align*}

and its variance as,

    \begin{align*} \sigma^{2}_{j_1, …, j_s} = \text{Var}_F~ &\phi_{j_1, …, j_s} (X_{11}, …, X_{1j_1}; …; X_{s1}, …, X_{sj_s}) \\ =\text{Cov}[&\phi(X_{11}, …, X_{1{j_1}}, X_{1{(j_1 + 1)}}, …, X_{1{a_1}};…;X_{s1}, …, X_{s{j_s}}, X_{s{(j_s + 1)}}, …, X_{s{a_s}}), \\ &\phi(X_{11}, …, X_{1{j_1}}, X'_{1{(j_1 + 1)}}, …, X'_{1{a_1}};…;X_{s1}, …, X_{s{j_s}}, X'_{s{(j_s + 1)}}, …, X'_{s{a_s}})]. \end{align*}

Expressing \text{Var}_{F}~U in terms of \sigma^2_{j_1, …, j_s}, it can be shown that

    \[\sigma^2 = \text{Var}_F~\sqrt{N}U \sim \frac{a_1^2}{\rho_1} \sigma^2_{100…0} + \frac{a_2^2}{\rho_2} \sigma^2_{010…0} + … + \frac{a_s^2}{\rho_s} \sigma^2_{000…1}\]

when all the variance components are finite and

    \[\frac{n_i}{N} \rightarrow \rho_i ~,~ 0 < \rho_i < 1.\]

Thus, the variance of U is essentially the sum of the variance components obtained by conditioning on a single element within each of the s samples, or in each of the s blocks.

Finally, it can be shown

    \[ \sqrt{N}(U - \theta) \rightarrow N(0, \sigma^2).\]

Two-sample scenario

Consider two independent samples denoted X_1, …, X_m \stackrel{i.i.d}{\sim} F and Y_1, …, Y_n \stackrel{i.i.d}{\sim} G. The two-sample U-statistic with kernel \phi of degree (a, b), a \leq m and b \leq n, is

    \[ U = \frac{1}{{m \choose a}{n \choose b}} \mathop{\sum \sum} \limits_{\substack{1 \leq i_1 < ... < i_{a} \leq m \\ 1 \leq j_1 < ... < j_b \leq n}} \phi(X_{i_1}, ..., X_{i_a}; Y_{j_1}, ..., Y_{j_b}). \]
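For instance (a sketch, with the kernel chosen purely for illustration), taking a = b = 1 and \phi(x; y) = \mathbf{1}(x \leq y) makes this U-statistic the Mann-Whitney estimator of \theta = P(X \leq Y):

```python
from itertools import combinations, product

def two_sample_u(xs, ys, kernel, a, b):
    """Average the kernel over all C(m, a) * C(n, b) combinations of
    a arguments from the X-sample and b arguments from the Y-sample."""
    combos = list(product(combinations(xs, a), combinations(ys, b)))
    return sum(kernel(x_args, y_args) for x_args, y_args in combos) / len(combos)

# a = b = 1, phi(x; y) = 1(x <= y): the Mann-Whitney estimator of P(X <= Y).
def mann_whitney_kernel(x_args, y_args):
    return 1.0 if x_args[0] <= y_args[0] else 0.0

u = two_sample_u([1, 3], [2, 4], mann_whitney_kernel, 1, 1)  # 3 of 4 pairs have x <= y
```

With a = b = 1 the double sum runs over all mn pairs (X_i, Y_j), so U is just the proportion of pairs with X_i \leq Y_j.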

If P = (F, G), conditioning on i elements of X and j elements of Y,

    \begin{align*} \phi_{ij}(X_1, …, X_i&; Y_1, …, Y_j) = \\ & \mathbb{E}_P \Big[\phi(X_1, …, X_a; Y_1, …, Y_b) | X_1, …, X_i; Y_1, …, Y_j \Big] \end{align*}

and

    \begin{align*} \sigma^{2}_{ij} = \text{Var}_{P}~& \phi_{ij} (X_1, …, X_i; Y_1, …, Y_j) \\ = \text{Cov} [&\phi(X_1, …, X_i, X_{i+1}, …, X_a; Y_1, …, Y_j, Y_{j+1}, …, Y_b), \\ &\phi(X_1, …, X_i, X'_{i+1}, …, X'_a; Y_1, …, Y_j, Y'_{j+1}, …, Y'_b)]. \end{align*}

Then for i=1 and j=0, we obtain

    \begin{align*} \sigma^{2}_{10} = \text{Cov} [&\phi(X_1, X_2, …, X_a; Y_1, Y_2, …, Y_b), \\ &\phi(X_1, X'_2, …, X'_a; Y'_1, Y'_2, …, Y'_b) ] \end{align*}

and similarly for i=0 and j=1,

    \begin{align*} \sigma^{2}_{01} = \text{Cov} [&\phi(X_1, X_2, …, X_a; Y_1, Y_2, …, Y_b), \\ &\phi(X'_1, X'_2, …, X'_a; Y_1, Y'_2, …, Y'_b) ]. \end{align*}

If \sigma^{2}_{01} and \sigma^{2}_{10} are finite and

    \[\frac{m}{N} \rightarrow \rho \>,\> \frac{n}{N} \rightarrow 1-\rho \>,\> 0 < \rho < 1\]

then we can approximate the limiting variance of \sqrt{N}U as

    \[ \sigma^2 = \frac{a^2}{\rho} \sigma^2_{10} + \frac{b^2}{1-\rho} \sigma^2_{01}\]

such that

    \[\sqrt{N}(U - \theta) \rightarrow N\left(0, \frac{a^2}{\rho} \sigma^{2}_{10} + \frac{b^2}{1-\rho} \sigma^{2}_{01}\right).\]
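As a small preview of what these variance components look like in practice (assuming F = G with F continuous, and the Mann-Whitney kernel \phi(x; y) = \mathbf{1}(x \leq y) with a = b = 1, so that \theta = P(X \leq Y) = 1/2), they can be computed directly using the fact that G(X) \sim \text{Unif}(0,1):

```latex
% Mann-Whitney kernel, a = b = 1, under F = G (continuous)
\phi_{10}(x) = \Pr(x \leq Y) = 1 - G(x)
  \quad\Rightarrow\quad
  \sigma^{2}_{10} = \mathrm{Var}_F[1 - G(X)] = \mathrm{Var}[\mathrm{Unif}(0,1)] = \tfrac{1}{12},

\phi_{01}(y) = \Pr(X \leq y) = F(y)
  \quad\Rightarrow\quad
  \sigma^{2}_{01} = \mathrm{Var}_G[F(Y)] = \tfrac{1}{12},

\sigma^{2} = \frac{1}{\rho}\cdot\frac{1}{12} + \frac{1}{1-\rho}\cdot\frac{1}{12}
           = \frac{1}{12\,\rho\,(1-\rho)}.
```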

At this point, you may be wondering what the heck these variance components are and how you're supposed to estimate them. In my next blog post, I promise that I will provide some examples of common one- and two-sample U-statistics and their limiting distributions using our results!

For examples of common one- and two-sample U-statistics and their limiting distribution, check out One, Two, U: Examples of common one- and two-sample U-statistics.


Click here to download this blog post as an RMarkdown (.Rmd) file!

Published by

Emma Smith

Emma Smith is a young statistician who's on a mission to convince the masses statistics is as awesome as she *knows* it is! When she's not working on expanding her knowledge of machine learning and mathematical statistics, she's busy petting cats and unsuccessfully convincing her boyfriend to let her adopt them, hiking, concocting indie and folk rock playlists, and kicking butt in roller derby.
