One, Two, U: Examples of common one- and two-sample U-statistics

My previous two blog posts revolved around derivation of the limiting distribution of U-statistics for one sample and multiple independent samples.

For derivation of the limiting distribution of a U-statistic for a single sample, check out Getting to know U: the asymptotic distribution of a single U-statistic.

For derivation of the limiting distribution of a U-statistic for multiple independent samples, check out Much Two U About Nothing: Extension of U-statistics to multiple independent samples.

The notation within these derivations can get quite complicated and it may be a bit unclear as to how to actually derive components of the limiting distribution.

In this blog post, I provide two examples of both common one-sample U-statistics (Variance, Kendall’s Tau) and two-sample U-statistics (Difference of two means, Wilcoxon Mann-Whitney rank-sum statistic) and derive their limiting distribution using our previously developed theory.

Asymptotic distribution of U-statistics

One sample

For a single sample, X_1, …, X_n \stackrel{i.i.d}{\sim} F, the U-statistic is given by

    \[U_n = \frac{1}{{n \choose a}} \sum_{1 \leq i_1 < ... < i_a \leq n} \phi(X_{i_1}, ..., X_{i_a})\]

where \phi is a symmetric kernel of degree a.

For a review of what it means for \phi to be symmetric, check out U-, V-, and Dupree Statistics.

In the examples covered by this blog post, a = 2, so we can re-write U_n as,

    \[U_n = \frac{1}{{n \choose 2}} \sum_{1 \leq i < j \leq n} \phi(X_{i}, X_{j}).\]

Alternatively, this is equivalent to,

    \[U_n = \frac{1}{n(n-1)} \sum_{i \neq j} \phi(X_{i}, X_{j}).\]

The limiting variance of U_n is given by,

    \[ Var ~U_n = \frac{1}{{n \choose a}} \sum_{c=1}^{a} {a \choose c}{n-a \choose a-c} \sigma_{c}^2 \]

where

    \[ \sigma_{c}^2 = \text{Var}_F \Big[ \mathbb{E}_{F}~\Big( \phi(X_1, …, X_a) | X_1, …, X_c \Big)\Big] = \text{Var}_{F}~ \phi_c(X_1, …, X_c) \]

or equivalently,

    \[ \sigma_{c}^2 = \text{Cov} \Big[ \phi(X_1, …, X_c, X_{c+1}, …, X_a), \phi(X_1, …, X_c, X'_{c+1}, …, X'_a)\Big] .\]

Note that when a=c, \phi_c(X_1, …, X_c) = \phi(x_1, …, x_a).

For a=2, these expressions reduce to

    \[\text{Var}_F~U_n = \frac{2}{n(n-1)} \left[2(n-2) \sigma_1^{2} + \sigma_2^{2} \right] \]

where

    \[ \sigma_{1}^{2} = \text{Var}_{F}~\phi_1(X_1) = \text{Var}_{F} \Big[ \mathbb{E}_{F} \Big( \phi(X_1, X_2) | X_1 \Big)\Big] \]

and

    \[ \sigma_{2}^{2} = \text{Var}_{F}~\phi(X_1, X_2). \]

The limiting distribution of U_n for a=2 is then,

    \[\sqrt{n}(U_n - \theta) \rightarrow N\left(0, \frac{2}{n-1} \left[2(n-2) \sigma_1^{2} + \sigma_2^{2} \right]\right).\]

For derivation of the limiting distribution of a U-statistic for a single sample, check out Getting to know U: the asymptotic distribution of a single U-statistic.

Two independent samples

For two independent samples denoted X_1, …, X_m \stackrel{i.i.d}{\sim} F and Y_1, …, Y_n \stackrel{i.i.d}{\sim} G, the two-sample U-statistic is given by

    \[ U = \frac{1}{{m \choose a}{n \choose b}} \mathop{\sum \sum} \limits_{\substack{1 \leq i_1 < ... < i_{a} \leq m \\ 1 \leq j_1 < ... < j_b \leq n}} \phi(X_{i_1}, ..., X_{i_a}; Y_{j_1}, ..., Y_{j_b}). \]

where \phi is a kernel that is independently symmetric within the two blocks (X_{i_1}, ..., X_{i_a}) and (Y_{j_1}, ..., Y_{j_b}).

In the examples covered by this blog post, a = b = 1, reducing the U-statistic to,

    \[ U = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \phi(X_i; Y_j) .\]

The limiting variance of U is given by,

    \[ \text{Var}~U = \frac{a^2}{m} \sigma_{10}^2 + \frac{b^2}{n} \sigma_{01}^{2} \]

where

    \[ \sigma^{2}_{10} = \text{Cov} [\phi(X_1, X_2, …, X_a; Y_1, Y_2, …, Y_b), \phi(X_1, X'_2, …, X'_a; Y'_1, Y'_2…, Y'_b) ] \]

and

    \[ \sigma^{2}_{01} = \text{Cov} [\phi(X_1, X_2, …, X_a; Y_1, Y_2, …, Y_b),\phi(X'_1, X'_2, …, X'_a; Y_1, Y'_2…, Y'_b) ].\]

Equivalently,

    \[ \sigma^{2}_{10} = \text{Var} \Big[ \mathbb{E} \Big(\phi(X_1, …, X_a; Y_1, …., Y_b) | X_1 \Big) \Big] = \text{Var}~\phi_{10}(X_1)\]

and

    \[ \sigma^{2}_{01} = \text{Var} \Big[ \mathbb{E} \Big(\phi(X_1, …, X_a; Y_1, …, Y_b) | Y_1 \Big) \Big] = \text{Var}~\phi_{01}(Y_1).\]

For a=b=1, these expressions reduce to

    \[ \text{Var}~U = \frac{\sigma_{10}^2}{m} + \frac{ \sigma_{01}^{2}}{n} \]

where

    \[ \sigma^{2}_{10} = \text{Cov} [\phi(X_1; Y_1), \phi(X_1; Y'_1) ] =\text{Var} \Big[ \mathbb{E} \Big(\phi(X_1; Y_1) | X_1 \Big) \Big] \]

and

    \[ \sigma^{2}_{01} = \text{Cov} [\phi(X_1; Y_1), \phi(X'_1; Y_1) ] =\text{Var} \Big[ \mathbb{E} \Big(\phi(X_1; Y_1) | Y_1 \Big) \Big] .\]

The limiting distribution of U_n for a=b=1 and N=n+m is then,

    \[\sqrt{N}(U_n - \theta) \rightarrow N\left(0, \frac{N}{m} \sigma_{10}^2 + \frac{N}{n} \sigma_{01}^{2} \right).\]

For derivation of the limiting distribution of a U-statistic for multiple independent samples, check out Much Two U About Nothing: Extension of U-statistics to multiple independent samples.

Continue reading One, Two, U: Examples of common one- and two-sample U-statistics

Much Two U About Nothing: Extension of U-statistics to multiple independent samples

Thank you very much to the lovely Feben Alemu for pointing me in the direction of https://pungenerator.org/ as a means of ensuring we never have to go without a brilliant title! With great power comes great responsibility.

Season 2 Crying GIF by Pose FX

Review

Statistical functionals are any real-valued function of a distribution function F, \theta = T(F). When F is unknown, nonparametric estimation only requires that F belong to a broad class of distribution functions \mathcal{F}, typically subject only to mild restrictions such as continuity or existence of specific moments.

For a single independent and identically distributed random sample of size n, X_1, …, X_n \stackrel{i.i.d}{\sim} F, a statistical functional \theta = T(F) is said to belong to the family of expectation functionals if:

  1. T(F) takes the form of an expectation of a function \phi with respect to F,

        \[T(F) = \mathbb{E}_F~ \phi(X_1, …, X_a) \]

  2. \phi(X_1, …, X_a) is a symmetric kernel of degree a \leq n.

A kernel is symmetric if its arguments can be permuted without changing its value. For example, if the degree a = 2, \phi is symmetric if \phi(x_1, x_2) = \phi(x_2, x_1).

If \theta = T(F) is an expecation functional and the class of distribution functions \mathcal{F} is broad enough, an unbiased estimator of \theta = T(F) can always be constructed. This estimator is known as a U-statistic and takes the form,

    \[ U_n = \frac{1}{{n \choose a}} \mathop{\sum … \sum} \limits_{1 \leq i_1 < ... < i_a \leq n} \phi(X_{i_1}, ..., X_{i_a})\]

such that U_n is the average of \phi evaluated at all {n \choose a} distinct combinations of size a from X_1, …, X_n.

For more detail on expectation functionals and their estimators, check out my blog post U-, V-, and Dupree statistics.

Since each X_i appears in more than one summand of U_n, the central limit theorem cannot be used to derive the limiting distribution of U_n as it is the sum of dependent terms. However, clever conditioning arguments can be used to show that U_n is in fact asymptotically normal with mean

    \[\mathbb{E}_F~ U_n = \theta = T(F)\]

and variance

    \[\text{Var}_F~U_n = \frac{a^2}{n} \sigma_1^{2}\]

where

    \[\sigma_1^{2} = \text{Var}_F \Big[ \mathbb{E}_F [\phi(X_1, …, X_a)|X_1] \Big].\]

The sketch of the proof is as follows:

  1. Express the variance of U_n in terms of the covariance of its summands,

    \[\text{Var}_{F}~ U_n = \frac{1}{{n \choose a}^2} \mathop{\sum \sum} \limits_{\substack{1 \leq i_1 < ... < i_{a} \leq n \\ 1 \leq j_1 < ... < j_{a} \leq n}} \text{Cov}\left[\phi(X_{i_1}, ..., X_{i_a}),~ \phi(X_{j_1}, ..., X_{j_a})\right].\]

  1. Recognize that if two terms share c common elements such that,

        \[ \text{Cov} [\phi(X_1, …, X_c, X_{c+1}, …, X_a), \phi(X_1, …, X_c, X'_{c+1}, …, X'_a)] \]

    conditioning on their c shared elements will make the two terms independent.

  2. For 0 \leq c \leq n, define

        \[\phi_c(X_1, …, X_c) = \mathbb{E}_F \Big[\phi(X_1, …, X_a) | X_1, …, X_c \Big] \]

    such that

        \[\mathbb{E}_F~ \phi_c(X_1, …, X_c) = \theta = T(F)\]

    and

        \[\sigma_{c}^2 = \text{Var}_{F}~ \phi_c(X_1, …, X_c).\]

    Note that when c = 0, \phi_0 = \theta and \sigma_0^2 = 0, and when c=a, \phi_a = \phi(X_1, …, X_a) and \sigma_a^2 = \text{Var}_F~\phi(X_1, …, X_a).

  3. Use the law of iterated expecation to demonstrate that

        \[ \sigma^{2}_c = \text{Cov} [\phi(X_1, …, X_c, X_{c+1}, …, X_a), \phi(X_1, …, X_c, X'_{c+1}, …, X'_a)] \]

    and re-express \text{Var}_{F}~U_n as the sum of the \sigma_{c}^2,

        \[ \text{Var}_F~U_n = \frac{1}{{n \choose a}} \sum_{c=1}^{a} {a \choose c}{n-a \choose a-c} \sigma^{2}_c.\]

    Recognizing that the first variance term dominates for large n, approximate \text{Var}_F~ U_n as

        \[\text{Var}_F~U_n \sim \frac{a^2}{n} \sigma^{2}_1.\]

  4. Identify a surrogate U^{*}_n that has the same mean and variance as U_n-\theta but is the sum of independent terms,

        \[ U_n^{*} = \sum_{i=1}^{n} \mathbb{E}_F [U_n - \theta|X_i] \]

    so that the central limit may be used to show

        \[ \sqrt{n} U_n^{*} \rightarrow N(0, a^2 \sigma_1^2).\]

  5. Demonstrate that U_n - \theta and U_n^{*} converge in probability,

        \[ \sqrt{n} \Big((U_n - \theta) - U_n^{*}\Big) \stackrel{P}{\rightarrow} 0 \]

    and thus have the same limiting distribution so that

        \[\sqrt{n} (U_n - \theta) \rightarrow N(0, a^2 \sigma_1^2).\]

For a walkthrough derivation of the limiting distribution of U_n for a single sample, check out my blog post Getting to know U: the asymptotic distribution of a single U-statistic.

This blog post aims to provide an overview of the extension of kernels, expectation functionals, and the definition and distribution of U-statistics to multiple independent samples, with particular focus on the common two-sample scenario.

Continue reading Much Two U About Nothing: Extension of U-statistics to multiple independent samples

Getting to know U: the asymptotic distribution of a single U-statistic

After my last grand slam title, U-, V-, and Dupree statistics I was really feeling the pressure to keep my title game strong. Thank you to my wonderful friend Steve Lee for suggesting this beautiful title.

Overview

A statistical functional is any real-valued function of a distribution function F such that

    \[ \theta = T(F) \]

and represents characteristics of the distribution F and include the mean, variance, and quantiles.

Often times F is unknown but is assumed to belong to a broad class of distribution functions \mathcal{F} subject only to mild restrictions such as continuity or existence of specific moments.

A random sample X_1, …, X_n \stackrel{i.i.d}{\sim} F can be used to construct the empirical cumulative distribution function (ECDF) \hat{F}_n,

    \[ \hat{F}_{n}(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{I}(X_i \leq x) \]

which assigns mass \frac{1}{n} to each X_i.

\hat{F}_{n} is a valid, discrete CDF which can be substituted for F to obtain \hat{\theta} = T(\hat{F}_n). These estimators are referred to as plug-in estimators for obvious reasons.

For more details on statistical functionals and plug-in estimators, you can check out my blog post Plug-in estimators of statistical functionals!

Many statistical functionals take the form of an expectation of a real-valued function \phi with respect to F such that for a \leq n,

    \[ \theta = T(F) = \mathbb{E}_{F}~ \phi(X_1, …, X_a) .\]

When \phi(x_1, …, x_a) is a function symmetric in its arguments such that, for e.g. \phi(x_1, x_2) = \phi(x_2, x_1), it is referred to as a symmetric kernel of degree a. If \phi is not symmetric, a symmetric equivalent \phi^{*} can always be found,

    \[\phi^{*}(x_1, …, x_a) = \frac{1}{a!} \sum_{\pi ~\in~ \Pi} \phi(x_{\pi(1)}, …, x_{\pi(a)})\]

where \Pi represents the set of all permutations of the indices 1, …, a.

A statistical functional \theta = T(F) belongs to a special family of expectation functionals when:

  1. T(F) = \mathbb{E}_F ~\phi(X_1, …, X_a), and
  2. \phi(X_1, …, X_a) is a symmetric kernel of degree a.

Plug-in estimators of expectation functionals are referred to as V-statistics and can be expressed explicitly as,

    \[V_n = \frac{1}{n^a} \sum_{i_1 = 1}^{n} … \sum_{i_a = 1}^{n} \phi(X_{i_1}, …, X_{i_a}) \]

so that V_n is the average of \phi evaluated at all possible permutations of size a from X_1, …, X_n. Since the X_i can appear more than once within each summand, V_n is generally biased.

By restricting the summands to distinct indices only an unbiased estimator known as a U-statistic arises. In fact, when the family of distributions \mathcal{F} is large enough, it can be shown that a U-statistic can always be constructed for expectation functionals.

Since \phi is symmetric, we can require that 1 \leq i_1 < ... < i_a \leq n, resulting in {n \choose a} combinations of the subscripts 1, ..., a. The U-statistic is then the average of \phi evaluated at all {n \choose a} distinct combinations of X_1, ..., X_n,

    \[U_n = \frac{1}{{n \choose a}} \mathop{\sum … \sum} \limits_{1 \leq i_1 < ... < i_a \leq n} \phi(X_{i_1}, ..., X_{i_a}).\]

While i_j \neq i_k within each summand now, each X_i still appears in multiple summands, suggesting that U_n is the sum of correlated terms. As a result, the central limit theorem cannot be relied upon to determine the limiting distribution of U_n.

For more details on expectation functionals and their estimators, you can check out my blog post U-, V-, and Dupree statistics!

This blog post provides a walk-through derivation of the limiting, or asymptotic, distribution of a single U-statistic U_n.

Continue reading Getting to know U: the asymptotic distribution of a single U-statistic