One, Two, U: Examples of common one- and two-sample U-statistics

My previous two blog posts revolved around derivation of the limiting distribution of U-statistics for one sample and multiple independent samples.

For derivation of the limiting distribution of a U-statistic for a single sample, check out Getting to know U: the asymptotic distribution of a single U-statistic.

For derivation of the limiting distribution of a U-statistic for multiple independent samples, check out Much Two U About Nothing: Extension of U-statistics to multiple independent samples.

The notation within these derivations can get quite complicated and it may be a bit unclear as to how to actually derive components of the limiting distribution.

In this blog post, I provide two examples each of common one-sample U-statistics (variance, Kendall’s tau) and two-sample U-statistics (difference of two means, Wilcoxon-Mann-Whitney rank-sum statistic) and derive their limiting distributions using our previously developed theory.

Asymptotic distribution of U-statistics

One sample

For a single sample, X_1, …, X_n \stackrel{i.i.d}{\sim} F, the U-statistic is given by

    \[U_n = \frac{1}{{n \choose a}} \sum_{1 \leq i_1 < ... < i_a \leq n} \phi(X_{i_1}, ..., X_{i_a})\]

where \phi is a symmetric kernel of degree a.

For a review of what it means for \phi to be symmetric, check out U-, V-, and Dupree Statistics.

In the examples covered by this blog post, a = 2, so we can re-write U_n as,

    \[U_n = \frac{1}{{n \choose 2}} \sum_{1 \leq i < j \leq n} \phi(X_{i}, X_{j}).\]

Alternatively, this is equivalent to,

    \[U_n = \frac{1}{n(n-1)} \sum_{i \neq j} \phi(X_{i}, X_{j}).\]
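As a minimal sketch of the a = 2 case, the R snippet below evaluates U_n by brute force over all pairs i < j. The kernel \phi(x_1, x_2) = (x_1 - x_2)^2/2, the helper name u_stat2, and the simulated data are assumptions chosen for illustration; with this kernel, U_n should coincide with the usual sample variance.

```r
# Brute-force one-sample U-statistic of degree 2.
# `u_stat2` is a hypothetical helper name, not from any package;
# `phi` is assumed to be vectorized in both arguments.
u_stat2 <- function(x, phi) {
  pairs <- combn(length(x), 2)                # 2 x choose(n, 2) matrix of index pairs i < j
  vals  <- phi(x[pairs[1, ]], x[pairs[2, ]])  # kernel evaluated at every pair
  mean(vals)                                  # average over the choose(n, 2) pairs
}

# Variance kernel (assumed for illustration)
phi_var <- function(x1, x2) (x1 - x2)^2 / 2

set.seed(1)
x <- rnorm(25)
u_var <- u_stat2(x, phi_var)
all.equal(u_var, var(x))
```

Enumerating pairs with combn() is only practical for small n, but it mirrors the definition directly.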

The variance of U_n is given by,

    \[ Var ~U_n = \frac{1}{{n \choose a}} \sum_{c=1}^{a} {a \choose c}{n-a \choose a-c} \sigma_{c}^2 \]

where

    \[ \sigma_{c}^2 = \text{Var}_F \Big[ \mathbb{E}_{F}~\Big( \phi(X_1, …, X_a) | X_1, …, X_c \Big)\Big] = \text{Var}_{F}~ \phi_c(X_1, …, X_c) \]

or equivalently,

    \[ \sigma_{c}^2 = \text{Cov} \Big[ \phi(X_1, …, X_c, X_{c+1}, …, X_a), \phi(X_1, …, X_c, X'_{c+1}, …, X'_a)\Big] .\]

Note that when a=c, \phi_c(X_1, …, X_c) = \phi(X_1, …, X_a).

For a=2, these expressions reduce to

    \[\text{Var}_F~U_n = \frac{2}{n(n-1)} \left[2(n-2) \sigma_1^{2} + \sigma_2^{2} \right] \]

where

    \[ \sigma_{1}^{2} = \text{Var}_{F}~\phi_1(X_1) = \text{Var}_{F} \Big[ \mathbb{E}_{F} \Big( \phi(X_1, X_2) | X_1 \Big)\Big] \]

and

    \[ \sigma_{2}^{2} = \text{Var}_{F}~\phi(X_1, X_2). \]
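Spelling out the reduction: with a = 2, only the c = 1 and c = 2 terms of the general sum appear, and the binomial coefficients evaluate directly,

    \[ \text{Var}_F~U_n = \frac{1}{{n \choose 2}} \left[ {2 \choose 1}{n-2 \choose 1} \sigma_1^{2} + {2 \choose 2}{n-2 \choose 0} \sigma_2^{2} \right] = \frac{2}{n(n-1)} \left[ 2(n-2) \sigma_1^{2} + \sigma_2^{2} \right]. \]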

The limiting distribution of U_n for a=2 is then,

    \[\sqrt{n}(U_n - \theta) \rightarrow N\left(0, 4 \sigma_1^{2}\right),\]

since n \text{Var}_F~U_n = \frac{2}{n-1} \left[2(n-2) \sigma_1^{2} + \sigma_2^{2} \right] \rightarrow 4 \sigma_1^{2} as n \rightarrow \infty.
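As a rough numerical check of the a = 2 variance expression, the sketch below uses the variance kernel \phi(x_1, x_2) = (x_1 - x_2)^2/2 with standard normal data, both of which are assumptions made for illustration. Under those assumptions \sigma_1^2 = 1/2 and \sigma_2^2 = 2, so the expression collapses to 2/(n-1), the familiar variance of the sample variance for normal data, and n \text{Var}_F~U_n approaches 4\sigma_1^2 = 2.

```r
# Variance of U_n for a = 2 (formula from the text)
var_u2 <- function(n, sigma1_sq, sigma2_sq) {
  2 / (n * (n - 1)) * (2 * (n - 2) * sigma1_sq + sigma2_sq)
}

# Assumed setting: variance kernel with N(0, 1) data
sigma1_sq <- 1 / 2
sigma2_sq <- 2

n <- 20
exact <- var_u2(n, sigma1_sq, sigma2_sq)

# Monte Carlo: with this kernel U_n is the sample variance, so var() suffices
set.seed(2)
sims <- replicate(20000, var(rnorm(n)))

c(formula = exact, closed_form = 2 / (n - 1), simulated = var(sims))

# n * Var(U_n) approaches 4 * sigma1_sq as n grows
n_big <- c(10, 100, 1000, 10000)
n_big * var_u2(n_big, sigma1_sq, sigma2_sq)
```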

For derivation of the limiting distribution of a U-statistic for a single sample, check out Getting to know U: the asymptotic distribution of a single U-statistic.

Two independent samples

For two independent samples denoted X_1, …, X_m \stackrel{i.i.d}{\sim} F and Y_1, …, Y_n \stackrel{i.i.d}{\sim} G, the two-sample U-statistic is given by

    \[ U = \frac{1}{{m \choose a}{n \choose b}} \mathop{\sum \sum} \limits_{\substack{1 \leq i_1 < ... < i_{a} \leq m \\ 1 \leq j_1 < ... < j_b \leq n}} \phi(X_{i_1}, ..., X_{i_a}; Y_{j_1}, ..., Y_{j_b}). \]

where \phi is a kernel that is independently symmetric within the two blocks (X_{i_1}, ..., X_{i_a}) and (Y_{j_1}, ..., Y_{j_b}).

In the examples covered by this blog post, a = b = 1, reducing the U-statistic to,

    \[ U = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \phi(X_i; Y_j) .\]
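A minimal sketch of the a = b = 1 case in R: the statistic is the average of the kernel over all mn pairs, which outer() computes directly. The helper name u_stat_two and the simulated samples are assumptions for illustration. The two kernels shown correspond to the difference of two means and the Mann-Whitney statistic.

```r
# Two-sample U-statistic with a = b = 1: average the kernel over all m * n pairs.
# `u_stat_two` is a hypothetical helper name; `phi` must be vectorized.
u_stat_two <- function(x, y, phi) mean(outer(x, y, phi))

set.seed(3)
x <- rnorm(15)            # sample from F (assumed for illustration)
y <- rnorm(20, mean = 1)  # sample from G (assumed for illustration)

# Kernel phi(x; y) = x - y gives the difference of sample means
u_diff <- u_stat_two(x, y, function(x, y) x - y)
all.equal(u_diff, mean(x) - mean(y))

# Kernel phi(x; y) = I(x < y) gives the Mann-Whitney estimate of P(X < Y)
u_mw <- u_stat_two(x, y, function(x, y) as.numeric(x < y))

# With no ties, wilcox.test(y, x) reports the number of pairs with y > x
w <- unname(wilcox.test(y, x)$statistic)
all.equal(u_mw, w / (length(x) * length(y)))
```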

The limiting variance of U is given by,

    \[ \text{Var}~U = \frac{a^2}{m} \sigma_{10}^2 + \frac{b^2}{n} \sigma_{01}^{2} \]

where

    \[ \sigma^{2}_{10} = \text{Cov} [\phi(X_1, X_2, …, X_a; Y_1, Y_2, …, Y_b), \phi(X_1, X'_2, …, X'_a; Y'_1, Y'_2, …, Y'_b) ] \]

and

    \[ \sigma^{2}_{01} = \text{Cov} [\phi(X_1, X_2, …, X_a; Y_1, Y_2, …, Y_b),\phi(X'_1, X'_2, …, X'_a; Y_1, Y'_2, …, Y'_b) ].\]

Equivalently,

    \[ \sigma^{2}_{10} = \text{Var} \Big[ \mathbb{E} \Big(\phi(X_1, …, X_a; Y_1, …, Y_b) | X_1 \Big) \Big] = \text{Var}~\phi_{10}(X_1)\]

and

    \[ \sigma^{2}_{01} = \text{Var} \Big[ \mathbb{E} \Big(\phi(X_1, …, X_a; Y_1, …, Y_b) | Y_1 \Big) \Big] = \text{Var}~\phi_{01}(Y_1).\]

For a=b=1, these expressions reduce to

    \[ \text{Var}~U = \frac{\sigma_{10}^2}{m} + \frac{ \sigma_{01}^{2}}{n} \]

where

    \[ \sigma^{2}_{10} = \text{Cov} [\phi(X_1; Y_1), \phi(X_1; Y'_1) ] =\text{Var} \Big[ \mathbb{E} \Big(\phi(X_1; Y_1) | X_1 \Big) \Big] \]

and

    \[ \sigma^{2}_{01} = \text{Cov} [\phi(X_1; Y_1), \phi(X'_1; Y_1) ] =\text{Var} \Big[ \mathbb{E} \Big(\phi(X_1; Y_1) | Y_1 \Big) \Big] .\]

The limiting distribution of U for a=b=1 and N=n+m is then,

    \[\sqrt{N}(U - \theta) \rightarrow N\left(0, \frac{N}{m} \sigma_{10}^2 + \frac{N}{n} \sigma_{01}^{2} \right).\]
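To make the a = b = 1 variance concrete, the sketch below uses the Mann-Whitney kernel \phi(x; y) = \mathbb{I}(x < y) and assumes F = G is continuous; both choices are illustrative assumptions. In that case \mathbb{E}(\phi | X_1) = 1 - G(X_1) and \mathbb{E}(\phi | Y_1) = F(Y_1) are each uniform on (0, 1), so \sigma_{10}^2 = \sigma_{01}^2 = 1/12. A small simulation compares the resulting approximation with the Monte Carlo variance of U.

```r
m <- 30
n <- 40

# Approximate variance from the text with sigma_10^2 = sigma_01^2 = 1/12
approx_var <- (1 / 12) / m + (1 / 12) / n

# Monte Carlo variance of U = (1 / (m n)) * sum_i sum_j I(X_i < Y_j) when F = G
set.seed(4)
u_sims <- replicate(5000, {
  x <- rnorm(m)
  y <- rnorm(n)
  mean(outer(x, y, "<"))
})

c(approximation = approx_var, simulated = var(u_sims))
```

The two values should be close but not identical, since the approximation keeps only the leading terms of the variance.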

For derivation of the limiting distribution of a U-statistic for multiple independent samples, check out Much Two U About Nothing: Extension of U-statistics to multiple independent samples.


Much Two U About Nothing: Extension of U-statistics to multiple independent samples

Thank you very much to the lovely Feben Alemu for pointing me in the direction of https://pungenerator.org/ as a means of ensuring we never have to go without a brilliant title! With great power comes great responsibility.


Review

A statistical functional is any real-valued function of a distribution function F, \theta = T(F). When F is unknown, nonparametric estimation only requires that F belong to a broad class of distribution functions \mathcal{F}, typically subject only to mild restrictions such as continuity or existence of specific moments.

For a single independent and identically distributed random sample of size n, X_1, …, X_n \stackrel{i.i.d}{\sim} F, a statistical functional \theta = T(F) is said to belong to the family of expectation functionals if:

  1. T(F) takes the form of an expectation of a function \phi with respect to F,

        \[T(F) = \mathbb{E}_F~ \phi(X_1, …, X_a) \]

  2. \phi(X_1, …, X_a) is a symmetric kernel of degree a \leq n.

A kernel is symmetric if its arguments can be permuted without changing its value. For example, if the degree a = 2, \phi is symmetric if \phi(x_1, x_2) = \phi(x_2, x_1).

If \theta = T(F) is an expectation functional and the class of distribution functions \mathcal{F} is broad enough, an unbiased estimator of \theta = T(F) can always be constructed. This estimator is known as a U-statistic and takes the form,

    \[ U_n = \frac{1}{{n \choose a}} \mathop{\sum … \sum} \limits_{1 \leq i_1 < ... < i_a \leq n} \phi(X_{i_1}, ..., X_{i_a})\]

such that U_n is the average of \phi evaluated at all {n \choose a} distinct combinations of size a from X_1, …, X_n.

For more detail on expectation functionals and their estimators, check out my blog post U-, V-, and Dupree statistics.

Since each X_i appears in more than one summand of U_n, the central limit theorem cannot be used to derive the limiting distribution of U_n as it is the sum of dependent terms. However, clever conditioning arguments can be used to show that U_n is in fact asymptotically normal with mean

    \[\mathbb{E}_F~ U_n = \theta = T(F)\]

and variance

    \[\text{Var}_F~U_n = \frac{a^2}{n} \sigma_1^{2}\]

where

    \[\sigma_1^{2} = \text{Var}_F \Big[ \mathbb{E}_F [\phi(X_1, …, X_a)|X_1] \Big].\]
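A brief worked step, assuming a is fixed while n grows, shows where the factor a^2/n comes from. In the expansion \text{Var}_F~U_n = \frac{1}{{n \choose a}} \sum_{c=1}^{a} {a \choose c}{n-a \choose a-c} \sigma_c^{2}, the coefficient of \sigma_c^{2} is of order n^{-c}, so the c = 1 term dominates and its coefficient behaves as

    \[ \frac{{a \choose 1}{n-a \choose a-1}}{{n \choose a}} \sim a \cdot \frac{n^{a-1}/(a-1)!}{n^{a}/a!} = \frac{a^2}{n}. \]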

The sketch of the proof is as follows:

  1. Express the variance of U_n in terms of the covariance of its summands,

        \[\text{Var}_{F}~ U_n = \frac{1}{{n \choose a}^2} \mathop{\sum \sum} \limits_{\substack{1 \leq i_1 < ... < i_{a} \leq n \\ 1 \leq j_1 < ... < j_{a} \leq n}} \text{Cov}\left[\phi(X_{i_1}, ..., X_{i_a}),~ \phi(X_{j_1}, ..., X_{j_a})\right].\]

  2. Recognize that if two terms share c common elements such that,

        \[ \text{Cov} [\phi(X_1, …, X_c, X_{c+1}, …, X_a), \phi(X_1, …, X_c, X'_{c+1}, …, X'_a)] \]

    conditioning on their c shared elements will make the two terms independent.

  3. For 0 \leq c \leq a, define

        \[\phi_c(X_1, …, X_c) = \mathbb{E}_F \Big[\phi(X_1, …, X_a) | X_1, …, X_c \Big] \]

    such that

        \[\mathbb{E}_F~ \phi_c(X_1, …, X_c) = \theta = T(F)\]

    and

        \[\sigma_{c}^2 = \text{Var}_{F}~ \phi_c(X_1, …, X_c).\]

    Note that when c = 0, \phi_0 = \theta and \sigma_0^2 = 0, and when c=a, \phi_a = \phi(X_1, …, X_a) and \sigma_a^2 = \text{Var}_F~\phi(X_1, …, X_a).

  4. Use the law of iterated expectation to demonstrate that

        \[ \sigma^{2}_c = \text{Cov} [\phi(X_1, …, X_c, X_{c+1}, …, X_a), \phi(X_1, …, X_c, X'_{c+1}, …, X'_a)] \]

    and re-express \text{Var}_{F}~U_n as a weighted sum of the \sigma_{c}^2,

        \[ \text{Var}_F~U_n = \frac{1}{{n \choose a}} \sum_{c=1}^{a} {a \choose c}{n-a \choose a-c} \sigma^{2}_c.\]

    Recognizing that the c = 1 term dominates for large n, approximate \text{Var}_F~ U_n as

        \[\text{Var}_F~U_n \sim \frac{a^2}{n} \sigma^{2}_1.\]

  5. Identify a surrogate U^{*}_n that has the same mean and asymptotic variance as U_n-\theta but is the sum of independent terms,

        \[ U_n^{*} = \sum_{i=1}^{n} \mathbb{E}_F [U_n - \theta|X_i] \]

    so that the central limit theorem may be used to show

        \[ \sqrt{n} U_n^{*} \rightarrow N(0, a^2 \sigma_1^2).\]

  6. Demonstrate that the scaled difference between U_n - \theta and U_n^{*} converges in probability to zero,

        \[ \sqrt{n} \Big((U_n - \theta) - U_n^{*}\Big) \stackrel{P}{\rightarrow} 0 \]

    and thus have the same limiting distribution so that

        \[\sqrt{n} (U_n - \theta) \rightarrow N(0, a^2 \sigma_1^2).\]
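The last two steps can be illustrated numerically. The sketch below assumes the variance kernel \phi(x_1, x_2) = (x_1 - x_2)^2/2 and standard normal data, so that \theta = 1 and \phi_1(x) = (x^2 + 1)/2; these choices are illustrative assumptions. It uses the fact that \mathbb{E}_F[U_n - \theta | X_i] = \frac{a}{n}(\phi_1(X_i) - \theta), which here gives U_n^{*} = \frac{1}{n} \sum_{i=1}^{n} (X_i^2 - 1).

```r
theta <- 1  # true variance of N(0, 1)

# sqrt(n) * ((U_n - theta) - U_n^*) for one simulated sample of size n
scaled_gap <- function(n) {
  x      <- rnorm(n)
  u_n    <- var(x)         # U_n for the variance kernel
  u_star <- mean(x^2 - 1)  # sum of independent terms (projection of U_n - theta)
  sqrt(n) * ((u_n - theta) - u_star)
}

set.seed(5)
sizes <- c(10, 100, 1000, 10000)

# Typical magnitude of the scaled gap at each sample size
gaps <- sapply(sizes, function(n) median(abs(replicate(500, scaled_gap(n)))))
data.frame(n = sizes, median_abs_gap = gaps)
```

The scaled gap should shrink toward zero as n grows, consistent with the convergence in probability used in the final step.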

For a walkthrough derivation of the limiting distribution of U_n for a single sample, check out my blog post Getting to know U: the asymptotic distribution of a single U-statistic.

This blog post aims to provide an overview of the extension of kernels, expectation functionals, and the definition and distribution of U-statistics to multiple independent samples, with particular focus on the common two-sample scenario.


Getting to know U: the asymptotic distribution of a single U-statistic

After my last grand slam title, U-, V-, and Dupree statistics, I was really feeling the pressure to keep my title game strong. Thank you to my wonderful friend Steve Lee for suggesting this beautiful title.

Overview

A statistical functional is any real-valued function of a distribution function F such that

    \[ \theta = T(F) \]

Statistical functionals represent characteristics of the distribution F; examples include the mean, variance, and quantiles.

Often, F is unknown but is assumed to belong to a broad class of distribution functions \mathcal{F} subject only to mild restrictions such as continuity or existence of specific moments.

A random sample X_1, …, X_n \stackrel{i.i.d}{\sim} F can be used to construct the empirical cumulative distribution function (ECDF) \hat{F}_n,

    \[ \hat{F}_{n}(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{I}(X_i \leq x) \]

which assigns mass \frac{1}{n} to each X_i.

\hat{F}_{n} is a valid, discrete CDF which can be substituted for F to obtain \hat{\theta} = T(\hat{F}_n). These estimators are referred to as plug-in estimators for obvious reasons.
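A short R sketch of the ECDF and two plug-in estimators follows. The simulated exponential sample and the helper name F_hat are assumptions for illustration; in practice F would be unknown.

```r
set.seed(6)
x <- rexp(50, rate = 2)  # illustrative sample; F would be unknown in practice

# ECDF written directly from its definition
F_hat <- function(t) sapply(t, function(t0) mean(x <= t0))

# Agrees with base R's ecdf() on a grid of evaluation points
grid <- seq(0, 3, by = 0.25)
all.equal(F_hat(grid), ecdf(x)(grid))

# Plug-in estimators: each observation carries mass 1/n under F_hat
mean_hat <- sum(x * (1 / length(x)))  # T(F) = E_F X
p_hat    <- F_hat(0.5)                # T(F) = F(0.5) = P(X <= 0.5)
c(mean_hat = mean_hat, p_hat = p_hat)
```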

For more details on statistical functionals and plug-in estimators, you can check out my blog post Plug-in estimators of statistical functionals!

Many statistical functionals take the form of an expectation of a real-valued function \phi with respect to F such that for a \leq n,

    \[ \theta = T(F) = \mathbb{E}_{F}~ \phi(X_1, …, X_a) .\]

When \phi(x_1, …, x_a) is a function symmetric in its arguments such that, for example, \phi(x_1, x_2) = \phi(x_2, x_1), it is referred to as a symmetric kernel of degree a. If \phi is not symmetric, a symmetric equivalent \phi^{*} can always be found,

    \[\phi^{*}(x_1, …, x_a) = \frac{1}{a!} \sum_{\pi ~\in~ \Pi} \phi(x_{\pi(1)}, …, x_{\pi(a)})\]

where \Pi represents the set of all permutations of the indices 1, …, a.
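The symmetrization can be sketched in a few lines of R. The helper names perms and symmetrize, and the example kernel \phi(x_1, x_2) = x_1^2 - x_1 x_2 (which is not symmetric but has expectation equal to the variance), are assumptions for illustration. Its symmetrized version works out to (x_1 - x_2)^2/2.

```r
# All orderings of the elements of v, returned as a list of vectors
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (k in seq_along(v)) {
    for (p in perms(v[-k])) out[[length(out) + 1]] <- c(v[k], p)
  }
  out
}

# Symmetrized kernel: average of phi over all a! permutations of its arguments
# (written for scalar arguments)
symmetrize <- function(phi, a) {
  orderings <- perms(seq_len(a))
  function(...) {
    args <- list(...)
    mean(sapply(orderings, function(p) do.call(phi, args[p])))
  }
}

# Non-symmetric kernel with E[phi(X1, X2)] = Var(X) (assumed example)
phi <- function(x1, x2) x1^2 - x1 * x2
phi_star <- symmetrize(phi, 2)

phi(3, 1)       # 6
phi(1, 3)       # -2
phi_star(3, 1)  # 2, equal to (3 - 1)^2 / 2
phi_star(1, 3)  # 2
```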

A statistical functional \theta = T(F) belongs to a special family of expectation functionals when:

  1. T(F) = \mathbb{E}_F ~\phi(X_1, …, X_a), and
  2. \phi(X_1, …, X_a) is a symmetric kernel of degree a.

Plug-in estimators of expectation functionals are referred to as V-statistics and can be expressed explicitly as,

    \[V_n = \frac{1}{n^a} \sum_{i_1 = 1}^{n} … \sum_{i_a = 1}^{n} \phi(X_{i_1}, …, X_{i_a}) \]

so that V_n is the average of \phi evaluated at all n^a ordered selections of size a, drawn with replacement, from X_1, …, X_n. Since the same X_i can appear more than once within a summand, V_n is generally biased.

By restricting the summands to distinct indices only, an unbiased estimator known as a U-statistic arises. In fact, when the family of distributions \mathcal{F} is large enough, it can be shown that a U-statistic can always be constructed for expectation functionals.

Since \phi is symmetric, we can require that 1 \leq i_1 < ... < i_a \leq n, resulting in {n \choose a} combinations of size a from the subscripts 1, ..., n. The U-statistic is then the average of \phi evaluated at all {n \choose a} distinct combinations of size a from X_1, ..., X_n,

    \[U_n = \frac{1}{{n \choose a}} \mathop{\sum … \sum} \limits_{1 \leq i_1 < ... < i_a \leq n} \phi(X_{i_1}, ..., X_{i_a}).\]
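The difference between V_n and U_n can be seen in a small R sketch. It assumes the variance kernel \phi(x_1, x_2) = (x_1 - x_2)^2/2 and simulated normal data, purely for illustration. With this kernel U_n is the usual unbiased sample variance, while V_n is smaller by the factor (n-1)/n.

```r
phi <- function(x1, x2) (x1 - x2)^2 / 2  # variance kernel (assumed example)

set.seed(7)
x <- rnorm(12)
n <- length(x)

K <- outer(x, x, phi)  # n x n matrix of phi(X_i, X_j)

v_n <- sum(K) / n^2                         # V-statistic: all n^2 index pairs, including i = j
u_n <- sum(K[upper.tri(K)]) / choose(n, 2)  # U-statistic: only pairs with i < j

c(v_n = v_n, u_n = u_n, sample_var = var(x))
all.equal(u_n, var(x))                # unbiased sample variance
all.equal(v_n, (n - 1) / n * var(x))  # biased by the factor (n - 1) / n
```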

While i_j \neq i_k within each summand now, each X_i still appears in multiple summands, suggesting that U_n is the sum of correlated terms. As a result, the central limit theorem cannot be relied upon to determine the limiting distribution of U_n.

For more details on expectation functionals and their estimators, you can check out my blog post U-, V-, and Dupree statistics!

This blog post provides a walk-through derivation of the limiting, or asymptotic, distribution of a single U-statistic U_n.


Parametric vs. Nonparametric Approach to Estimations

Parametric statistics assume that the unknown CDF F belongs to a family of CDFs characterized by a parameter (vector) \theta. As the form of F is assumed, the target of estimation is its parameters \theta. Thus, all uncertainty about F consists of uncertainty about its parameters. Parameters are estimated by \hat{\theta}, and the estimates are substituted into the assumed distribution to conduct inference for the quantities of interest. If the assumed distribution F is incorrect, inference may also be inaccurate, or trends in the data may be missed.

To demonstrate the parametric approach, consider n = 100 independent and identically distributed random variables X_1, …, X_n generated from an exponential distribution with rate \lambda = 2. Investigators wish to estimate the 75^{th} percentile and erroneously assume that their data is normally distributed. Thus, F is assumed to be the normal CDF but \mu and \sigma^2 are unknown. The parameters \mu and \sigma are estimated in the typical way by \bar{x} and s, respectively. Since the normal distribution belongs to the location-scale family, an estimate of the p^{th} percentile is provided by,

    \[x_p = \bar{x} + s\Phi^{-1}(p)\]

where \Phi^{-1} is the standard normal quantile function, also known as the probit.

set.seed(12345)
library(tidyverse, quietly = TRUE)

# Generate data from Exp(2)
x <- rexp(n = 100, rate = 2)

# True value of 75th percentile with rate = 2
true <- qexp(p = 0.75, rate = 2)
true
## [1] 0.6931472

# Estimate mu and sigma
xbar <- mean(x)
s    <- sd(x)

# Estimate 75th percentile assuming mu = xbar and sigma = s
param_est <- xbar + s * qnorm(p = 0.75)
param_est
## [1] 0.8792925

The true value of the 75^{th} percentile of \text{Exp}(2) is 0.69 while the parametric estimate is 0.88.

Nonparametric statistics make fewer assumptions about the unknown distribution F, requiring only mild assumptions such as continuity or the existence of specific moments. Instead of estimating parameters of F, F itself is the target of estimation. F is commonly estimated by the empirical cumulative distribution function (ECDF) \hat{F},

    \[\hat{F}(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{I}(X_i \leq x).\]

Any statistic that can be expressed as a function of the CDF, known as a statistical functional and denoted \theta = T(F), can be estimated by substituting \hat{F} for F. That is, plug-in estimators can be obtained as \hat{\theta} = T(\hat{F}).
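Continuing the exponential example, a minimal sketch of the plug-in estimate of the 75^{th} percentile inverts the ECDF. The block regenerates the simulated sample with the same seed and call so that it runs on its own, and it uses quantile() with type = 1, which is the inverse of the empirical distribution function. The variable name np_est is an assumption for illustration.

```r
set.seed(12345)
x <- rexp(n = 100, rate = 2)  # same data-generating call as the parametric example

true <- qexp(p = 0.75, rate = 2)

# Plug-in estimate: smallest observed value t with F_hat(t) >= 0.75
np_est <- unname(quantile(x, probs = 0.75, type = 1))

# Parametric (normal-theory) estimate for comparison
param_est <- mean(x) + sd(x) * qnorm(p = 0.75)

c(true = true, nonparametric = np_est, parametric = param_est)
```

Because type = 1 returns an order statistic, the nonparametric estimate is always one of the observed data points.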
