Nonparametric Statistics Archives • Page 2 of 3 • Statisticelle

Much Two U About Nothing: Extension of U-statistics to multiple independent samples

Thank you very much to the lovely Feben Alemu for pointing me in the direction of https://pungenerator.org/ as a means of ensuring we never have to go without a brilliant title! With great power comes great responsibility.


Review

A statistical functional is any real-valued function of a distribution function $F$, $\theta = T(F)$. When $F$ is unknown, nonparametric estimation only requires that $F$ belong to a broad class of distribution functions $\mathcal{F}$, typically subject only to mild restrictions such as continuity or existence of specific moments.

For a single independent and identically distributed random sample of size $n$ , $X_1, …, X_n \stackrel{i.i.d}{\sim} F$ , a statistical functional $\theta = T(F)$ is said to belong to the family of expectation functionals if:

1) $T(F)$ takes the form of an expectation of a function $\phi$ with respect to $F$,

$T(F) = \mathbb{E}_F~ \phi(X_1, …, X_a)$; and

2) $\phi(X_1, …, X_a)$ is a symmetric kernel of degree $a \leq n$.

A kernel is symmetric if its arguments can be permuted without changing its value. For example, if the degree $a = 2$ , $\phi$ is symmetric if $\phi(x_1, x_2) = \phi(x_2, x_1)$ .

If $\theta = T(F)$ is an expectation functional and the class of distribution functions $\mathcal{F}$ is broad enough, an unbiased estimator of $\theta = T(F)$ can always be constructed. This estimator is known as a U-statistic and takes the form,

$U_n = \frac{1}{{n \choose a}} \mathop{\sum … \sum} \limits_{1 \leq i_1 < ... < i_a \leq n} \phi(X_{i_1}, ..., X_{i_a})$

such that $U_n$ is the average of $\phi$ evaluated at all ${n \choose a}$ distinct combinations of size $a$ from $X_1, …, X_n$ .
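The definition translates almost directly into code. The following is a minimal sketch, assuming a symmetric kernel passed in as a plain Python function; the names `u_statistic` and `variance_kernel` and the sample values are illustrative choices, not from the original post. It uses the degree-2 kernel $\phi(x_1, x_2) = \frac{1}{2}(x_1 - x_2)^2$, for which $U_n$ coincides with the usual unbiased sample variance.

```python
from itertools import combinations
from math import comb
from statistics import variance


def u_statistic(sample, kernel, degree):
    """Average a symmetric kernel over all size-`degree` subsets of the sample."""
    n = len(sample)
    if degree > n:
        raise ValueError("kernel degree cannot exceed the sample size")
    # combinations() enumerates index sets with i_1 < ... < i_a exactly once each.
    total = sum(kernel(*subset) for subset in combinations(sample, degree))
    return total / comb(n, degree)


def variance_kernel(x1, x2):
    """Symmetric kernel of degree 2 whose expectation is Var(X)."""
    return (x1 - x2) ** 2 / 2


# Illustrative data (hypothetical values).
sample = [2.0, 4.0, 4.0, 5.0, 7.0]
u_n = u_statistic(sample, variance_kernel, degree=2)

print(u_n)               # U-statistic with the variance kernel
print(variance(sample))  # unbiased sample variance, for comparison
```

Enumerating all ${n \choose a}$ subsets is fine for small samples but grows quickly with $n$ and $a$, so this is meant as an illustration of the definition rather than an efficient implementation.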

For more detail on expectation functionals and their estimators, check out my blog post U-, V-, and Dupree statistics.

Since each $X_i$ appears in more than one summand of $U_n$, the summands are dependent and the classical central limit theorem for sums of independent terms cannot be applied directly to derive the limiting distribution of $U_n$. However, clever conditioning arguments can be used to show that $U_n$ is in fact asymptotically normal with mean

$\mathbb{E}_F~ U_n = \theta = T(F)$

and asymptotic variance

$\text{Var}_F~U_n \sim \frac{a^2}{n} \sigma_1^{2}$

where

$\sigma_1^{2} = \text{Var}_F \Big[ \mathbb{E}_F [\phi(X_1, …, X_a)|X_1] \Big].$
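
As a worked illustration (an added example, not from the original post), take the degree $a = 2$ kernel $\phi(x_1, x_2) = \frac{1}{2}(x_1 - x_2)^2$, whose expectation is the variance of $F$, and assume $F$ has mean $\mu$, variance $\sigma^2$, and finite fourth central moment $\mu_4$. Conditioning on $X_1$ gives

$\mathbb{E}_F [\phi(X_1, X_2)|X_1] = \frac{1}{2}\Big[(X_1 - \mu)^2 + \sigma^2\Big]$

so that

$\sigma_1^{2} = \frac{1}{4} \text{Var}_F \Big[(X_1 - \mu)^2\Big] = \frac{\mu_4 - \sigma^4}{4}$

and

$\frac{a^2}{n} \sigma_1^{2} = \frac{\mu_4 - \sigma^4}{n},$

which is the familiar large-sample variance of the sample variance.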

The sketch of the proof is as follows:

Express the variance of $U_n$ in terms of the covariance of its summands,

$\text{Var}_{F}~ U_n = \frac{1}{{n \choose a}^2} \mathop{\sum \sum} \limits_{\substack{1 \leq i_1 < ... < i_{a} \leq n \\ 1 \leq j_1 < ... < j_{a} \leq n}} \text{Cov}\left[\phi(X_{i_1}, ..., X_{i_a}),~ \phi(X_{j_1}, ..., X_{j_a})\right].$

Recognize that if two terms share $c$ common elements such that,
$\text{Cov} [\phi(X_1, …, X_c, X_{c+1}, …, X_a), \phi(X_1, …, X_c, X'_{c+1}, …, X'_a)]$

conditioning on their $c$ shared elements makes the two terms conditionally independent.
For $0 \leq c \leq a$, define
$\phi_c(X_1, …, X_c) = \mathbb{E}_F \Big[\phi(X_1, …, X_a) | X_1, …, X_c \Big]$

such that

$\mathbb{E}_F~ \phi_c(X_1, …, X_c) = \theta = T(F)$

and

$\sigma_{c}^2 = \text{Var}_{F}~ \phi_c(X_1, …, X_c).$

Note that when $c = 0$ , $\phi_0 = \theta$ and $\sigma_0^2 = 0$ , and when $c=a$ , $\phi_a = \phi(X_1, …, X_a)$ and $\sigma_a^2 = \text{Var}_F~\phi(X_1, …, X_a)$ .
Use the law of iterated expectation to demonstrate that
$\sigma^{2}_c = \text{Cov} [\phi(X_1, …, X_c, X_{c+1}, …, X_a), \phi(X_1, …, X_c, X'_{c+1}, …, X'_a)]$

and re-express $\text{Var}_{F}~U_n$ as the sum of the $\sigma_{c}^2$ ,

$\text{Var}_F~U_n = \frac{1}{{n \choose a}} \sum_{c=1}^{a} {a \choose c}{n-a \choose a-c} \sigma^{2}_c.$

Recognizing that the first variance term dominates for large $n$ , approximate $\text{Var}_F~ U_n$ as

$\text{Var}_F~U_n \sim \frac{a^2}{n} \sigma^{2}_1.$
Identify a surrogate $U^{*}_n$ that has the same mean and the same asymptotic variance as $U_n-\theta$ but is a sum of independent terms,
$U_n^{*} = \sum_{i=1}^{n} \mathbb{E}_F [U_n - \theta|X_i]$

so that the central limit theorem may be used to show

$\sqrt{n} U_n^{*} \rightarrow N(0, a^2 \sigma_1^2).$
Demonstrate that the scaled difference between $U_n - \theta$ and $U_n^{*}$ converges in probability to zero,
$\sqrt{n} \Big((U_n - \theta) - U_n^{*}\Big) \stackrel{P}{\rightarrow} 0$

and thus have the same limiting distribution so that

$\sqrt{n} (U_n - \theta) \rightarrow N(0, a^2 \sigma_1^2).$
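
To see how quickly the first term of the variance expansion takes over, the exact finite-sample variance can be compared against the approximation $\frac{a^2}{n}\sigma_1^2$. The sketch below is illustrative only: the function names are hypothetical, and the inputs assume the variance kernel $\phi(x_1, x_2) = \frac{1}{2}(x_1 - x_2)^2$ with standard normal data, for which $\sigma_1^2 = 1/2$ and $\sigma_2^2 = 2$.

```python
from math import comb


def exact_u_variance(n, a, sigma_sq):
    """Finite-sample variance of U_n from the sum over c = 1, ..., a.

    sigma_sq[c - 1] holds sigma_c^2, the variance of the conditional
    expectation of the kernel given c of its arguments.
    """
    total = sum(
        comb(a, c) * comb(n - a, a - c) * sigma_sq[c - 1]
        for c in range(1, a + 1)
    )
    return total / comb(n, a)


def asymptotic_u_variance(n, a, sigma_sq):
    """Leading-order approximation a^2 * sigma_1^2 / n."""
    return a ** 2 * sigma_sq[0] / n


# Assumed inputs: variance kernel with X ~ N(0, 1).
sigma_sq = [0.5, 2.0]

for n in (10, 100, 1000):
    exact = exact_u_variance(n, 2, sigma_sq)
    approx = asymptotic_u_variance(n, 2, sigma_sq)
    print(n, exact, approx, approx / exact)
```

Under these assumptions the exact expression simplifies to $2/(n-1)$ and the approximation to $2/n$, so their ratio is $(n-1)/n$ and tends to one as $n$ grows.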

For a walkthrough derivation of the limiting distribution of $U_n$ for a single sample, check out my blog post Getting to know U: the asymptotic distribution of a single U-statistic.

This blog post aims to provide an overview of the extension of kernels, expectation functionals, and the definition and distribution of U-statistics to multiple independent samples, with particular focus on the common two-sample scenario.


Getting to know U: the asymptotic distribution of a single U-statistic

After my last grand slam title, U-, V-, and Dupree statistics, I was really feeling the pressure to keep my title game strong. Thank you to my wonderful friend Steve Lee for suggesting this beautiful title.

Overview

A statistical functional is any real-valued function of a distribution function $F$ such that

$\theta = T(F)$

and represents a characteristic of the distribution $F$. Examples include the mean, variance, and quantiles.

Often $F$ is unknown but is assumed to belong to a broad class of distribution functions $\mathcal{F}$ subject only to mild restrictions such as continuity or existence of specific moments.

A random sample $X_1, …, X_n \stackrel{i.i.d}{\sim} F$ can be used to construct the empirical cumulative distribution function (ECDF) $\hat{F}_n$ ,

$\hat{F}_{n}(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{I}(X_i \leq x)$

which assigns mass $\frac{1}{n}$ to each $X_i$ .

$\hat{F}_{n}$ is a valid, discrete CDF which can be substituted for $F$ to obtain $\hat{\theta} = T(\hat{F}_n)$ . These estimators are referred to as plug-in estimators for obvious reasons.
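
A small sketch may help make the ECDF and the plug-in idea concrete. The helper names `ecdf` and `plug_in_mean` and the sample values are assumptions made for illustration; the code simply counts observations at or below $x$ and weights each $X_i$ by $\frac{1}{n}$.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return F_hat_n, the empirical CDF of the sample, as a callable."""
    ordered = sorted(sample)
    n = len(ordered)

    def f_hat(x):
        # bisect_right counts the observations that are <= x.
        return bisect_right(ordered, x) / n

    return f_hat


def plug_in_mean(sample):
    """T(F_hat_n) for the mean: each X_i carries mass 1/n."""
    n = len(sample)
    return sum(x / n for x in sample)


# Illustrative data (hypothetical values).
sample = [3.0, 1.0, 4.0, 1.0, 5.0]
f_hat = ecdf(sample)

print(f_hat(0.0), f_hat(1.0), f_hat(4.5), f_hat(5.0))
print(plug_in_mean(sample))
```

Ties are handled naturally: the two observations equal to 1.0 together contribute mass $\frac{2}{n}$ at that point.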

For more details on statistical functionals and plug-in estimators, you can check out my blog post Plug-in estimators of statistical functionals!

Many statistical functionals take the form of an expectation of a real-valued function $\phi$ with respect to $F$ such that for $a \leq n$ ,

$\theta = T(F) = \mathbb{E}_{F}~ \phi(X_1, …, X_a) .$

When $\phi(x_1, …, x_a)$ is symmetric in its arguments, e.g. $\phi(x_1, x_2) = \phi(x_2, x_1)$ when $a = 2$, it is referred to as a symmetric kernel of degree $a$. If $\phi$ is not symmetric, a symmetric equivalent $\phi^{*}$ can always be found,

$\phi^{*}(x_1, …, x_a) = \frac{1}{a!} \sum_{\pi ~\in~ \Pi} \phi(x_{\pi(1)}, …, x_{\pi(a)})$

where $\Pi$ represents the set of all permutations of the indices $1, …, a$ .
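
The symmetrization formula can be sketched in a few lines. Here `symmetrize` and `asymmetric_kernel` are hypothetical names chosen for illustration. The example kernel $x_1^2 - x_1 x_2$ has expectation equal to the variance of $F$ but is not symmetric, and averaging it over both orderings of its arguments recovers $\frac{1}{2}(x_1 - x_2)^2$.

```python
from itertools import permutations
from math import factorial


def symmetrize(kernel, degree):
    """Return phi*, the average of `kernel` over all orderings of its arguments."""

    def symmetric_kernel(*args):
        if len(args) != degree:
            raise ValueError("expected exactly `degree` arguments")
        total = sum(
            kernel(*(args[i] for i in perm))
            for perm in permutations(range(degree))
        )
        return total / factorial(degree)

    return symmetric_kernel


def asymmetric_kernel(x1, x2):
    # E[X1^2 - X1 * X2] = E[X^2] - (E[X])^2 = Var(X) for independent X1, X2,
    # but swapping the arguments changes the value.
    return x1 ** 2 - x1 * x2


phi_star = symmetrize(asymmetric_kernel, degree=2)

print(asymmetric_kernel(3.0, 1.0), asymmetric_kernel(1.0, 3.0))
print(phi_star(3.0, 1.0), phi_star(1.0, 3.0))
```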

A statistical functional $\theta = T(F)$ belongs to a special family of expectation functionals when:

$T(F) = \mathbb{E}_F ~\phi(X_1, …, X_a)$ , and
$\phi(X_1, …, X_a)$ is a symmetric kernel of degree $a$ .

Plug-in estimators of expectation functionals are referred to as V-statistics and can be expressed explicitly as,

$V_n = \frac{1}{n^a} \sum_{i_1 = 1}^{n} … \sum_{i_a = 1}^{n} \phi(X_{i_1}, …, X_{i_a})$

so that $V_n$ is the average of $\phi$ evaluated at all $n^a$ ordered $a$-tuples drawn with replacement from $X_1, …, X_n$. Since the same $X_i$ can appear more than once within a summand, $V_n$ is generally biased.
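
The bias is easy to see numerically. The sketch below uses hypothetical function names and sample values. It evaluates $V_n$ for the variance kernel $\phi(x_1, x_2) = \frac{1}{2}(x_1 - x_2)^2$ and compares it with the divide-by-$n$ and divide-by-$(n-1)$ variance estimates from Python's `statistics` module.

```python
from itertools import product
from statistics import pvariance, variance


def v_statistic(sample, kernel, degree):
    """Average the kernel over all n^degree ordered tuples, repeats allowed."""
    n = len(sample)
    total = sum(kernel(*args) for args in product(sample, repeat=degree))
    return total / n ** degree


def variance_kernel(x1, x2):
    """Symmetric kernel of degree 2 whose expectation is Var(X)."""
    return (x1 - x2) ** 2 / 2


# Illustrative data (hypothetical values).
sample = [2.0, 4.0, 4.0, 5.0, 7.0]
v_n = v_statistic(sample, variance_kernel, degree=2)

print(v_n)                # V-statistic
print(pvariance(sample))  # divide-by-n variance, matches V_n
print(variance(sample))   # divide-by-(n - 1) variance, larger than V_n
```

The tuples with $i_1 = i_2$ contribute zero to the sum but are still counted in the $n^2$ denominator, which is where the downward bias comes from for this kernel.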

By restricting the summands to distinct indices only, an unbiased estimator known as a U-statistic arises. In fact, when the family of distributions $\mathcal{F}$ is large enough, it can be shown that a U-statistic can always be constructed for expectation functionals.

Since $\phi$ is symmetric, we can require that $1 \leq i_1 < ... < i_a \leq n$, resulting in ${n \choose a}$ combinations of size $a$ of the subscripts $1, ..., n$. The U-statistic is then the average of $\phi$ evaluated at all ${n \choose a}$ distinct combinations of size $a$ from $X_1, ..., X_n$,

$U_n = \frac{1}{{n \choose a}} \mathop{\sum … \sum} \limits_{1 \leq i_1 < ... < i_a \leq n} \phi(X_{i_1}, ..., X_{i_a}).$

While $i_j \neq i_k$ within each summand now, each $X_i$ still appears in multiple summands, suggesting that $U_n$ is the sum of correlated terms. As a result, the central limit theorem cannot be relied upon to determine the limiting distribution of $U_n$ .

For more details on expectation functionals and their estimators, you can check out my blog post U-, V-, and Dupree statistics!

This blog post provides a walk-through derivation of the limiting, or asymptotic, distribution of a single U-statistic $U_n$ .


U-, V-, and Dupree statistics

To start, I apologize for this blog’s title but I couldn’t resist referencing the Owen Wilson classic You, Me, and Dupree – wow! The other gold-plated candidate was U-statistics and You. Please, please, hold your applause.

My previous blog post defined statistical functionals as any real-valued function of an unknown CDF, $T(F)$ , and explained how plug-in estimators could be constructed by substituting the empirical cumulative distribution function (ECDF) $\hat{F}_{n}$ for the unknown CDF $F$ . Plug-in estimators of the mean and variance were provided and used to demonstrate plug-in estimators’ potential to be biased.

$\hat{\mu} = \mathbb{E}_{\hat{F}_n}[X] = \sum_{i=1}^{n} X_i P(X = X_i) = \frac{1}{n} \sum_{i=1}^{n} X_i = \bar{X}_{n}$

$\hat{\sigma}^{2} = \mathbb{E}_{\hat{F}_{n}}[(X- \mathbb{E}_{\hat{F}_n}[X])^2] = \mathbb{E}_{\hat{F}_n}[(X - \bar{X}_{n})^2] = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}_{n})^2.$
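
The bias of the plug-in variance can be made explicit with a short calculation. As a sketch, assume $F$ has mean $\mu$ and finite variance $\sigma^2$ (symbols introduced here for illustration). Using $\sum_{i=1}^{n} (X_i - \bar{X}_n)^2 = \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X}_n - \mu)^2$ and $\text{Var}_F~\bar{X}_n = \sigma^2/n$,

$\mathbb{E}_F~\hat{\sigma}^{2} = \frac{1}{n}\Big[n\sigma^2 - n \cdot \frac{\sigma^2}{n}\Big] = \frac{n-1}{n}\sigma^2$

so $\hat{\sigma}^2$ underestimates $\sigma^2$ on average, while $\mathbb{E}_F~\hat{\mu} = \mu$ shows the plug-in mean is unbiased.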

Statistical functionals that meet the following two criteria represent a special family of functionals known as expectation functionals:

1) $T(F)$ is the expectation of a function $g$ with respect to the distribution function $F$,

$T(F) = \mathbb{E}_{F} ~g(X)$; and

2) the function $g(\cdot)$ takes the form of a symmetric kernel.

Expectation functionals encompass many common parameters and are well-behaved. Plug-in estimators of expectation functionals, named V-statistics after von Mises, can be obtained but may be biased. It is, however, always possible to construct an unbiased estimator of expectation functionals regardless of the underlying distribution function $F$ . These estimators are named U-statistics, with the “U” standing for unbiased.

This blog post provides 1) the definitions of symmetric kernels and expectation functionals; 2) an overview of plug-in estimators of expectation functionals or V-statistics; 3) an overview of unbiased estimators for expectation functionals or U-statistics.


Plug-in estimators of statistical functionals

Consider a sequence of $n$ independent and identically distributed random variables $X_1, X_2, …, X_n \sim F$. The distribution function $F$ is unknown but belongs to a known set of distribution functions $\mathcal{F}$. In parametric estimation, $\mathcal{F}$ may represent a family of distributions specified by a vector of parameters, such as $(\mu, \sigma)$ in the case of the location-scale family. In nonparametric estimation, $\mathcal{F}$ is much broader and is subject to milder restrictions, such as the existence of moments or continuity. For example, we may define $\mathcal{F}$ as the family of distributions for which the mean exists or all distributions defined on the real line $\mathbb{R}$.

As mentioned in my previous blog post comparing nonparametric and parametric estimation, a statistical functional is any real-valued function of the cumulative distribution function $F$ , denoted $\theta = T(F)$ . Statistical functionals can be thought of as characteristics of $F$ , and include moments

$T(F) = \mathbb{E}_{F}[X^{k}]$

and quantiles

$T(F) = F^{-1}(p)$

as examples.
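
To make these two functionals concrete, here is a sketch that evaluates them on the empirical distribution of a sample, which places mass $\frac{1}{n}$ on each observation. The function names and sample values are hypothetical. The quantile is computed as the smallest observation $x$ whose empirical CDF value is at least $p$, which is one common convention for $F^{-1}$ and is an assumption here, since the definition of the inverse is not spelled out above.

```python
from math import ceil


def plug_in_moment(sample, k):
    """k-th raw moment under the empirical distribution (mass 1/n per point)."""
    return sum(x ** k for x in sample) / len(sample)


def plug_in_quantile(sample, p):
    """Smallest observation x with empirical CDF >= p, for 0 < p <= 1.

    Assumes the generalized inverse F^{-1}(p) = inf{x : F(x) >= p}.
    """
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    ordered = sorted(sample)
    # The j-th order statistic has empirical CDF value j / n, so we need
    # the smallest j with j / n >= p, i.e. j = ceil(n * p).
    index = ceil(p * len(ordered)) - 1
    return ordered[index]


# Illustrative data (hypothetical values).
sample = [3.0, 1.0, 4.0, 1.0, 5.0]

print(plug_in_moment(sample, 1))    # first moment (the mean)
print(plug_in_moment(sample, 2))    # second raw moment
print(plug_in_quantile(sample, 0.5))  # median under this convention
```

Other quantile conventions interpolate between order statistics; those would give different values for small samples and are not what this sketch implements.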

An infinite population may be considered as completely determined by its distribution function, and any numerical characteristic of an infinite population with distribution function $F$ that is used in statistics is a [statistical] functional of $F$ .

Wassily Hoeffding. “A Class of Statistics with Asymptotically Normal Distribution.” Ann. Math. Statist. 19 (3) 293 – 325, September, 1948.

This blog post aims to provide insight into estimators of statistical functionals based on a sample of $n$ independent and identically distributed random variables, known as plug-in estimators or empirical functionals.
