To start, I apologize for this blog’s title but I couldn’t resist referencing to the Owen Wilson classic You, Me, and Dupree – wow! The other gold-plated candidate was U-statistics and You. Please, please, hold your applause.

My previous blog post defined statistical functionals as any real-valued function of an unknown CDF, $T(F)$ , and explained how plug-in estimators could be constructed by substituting the empirical cumulative distribution function (ECDF) $\hat{F}_{n}$ for the unknown CDF $F$ . Plug-in estimators of the mean and variance were provided and used to demonstrate plug-in estimators’ potential to be biased.

$\hat{\mu} = \mathbb{E}_{\hat{F}_n}[X] = \sum_{i=1}^{n} X_i P(X = X_i) = \frac{1}{n} \sum_{i=1}^{n} X_i = \bar{X}_{n}$

$\hat{\sigma}^{2} = \mathbb{E}_{\hat{F}_{n}}[(X- \mathbb{E}_{\hat{F}_n}[X])^2] = \mathbb{E}_{\hat{F}_n}[(X - \bar{X}_{n})^2] = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}_{n})^2.$

Statistical functionals that meet the following two criteria represent a special family of functionals known as expectation functionals:

1) $T(F)$ is the expectation of a function $g$ with respect to the distribution function $F$ ; and

$T(F) = \mathbb{E}_{F} ~g(X)$

2) the function $g(\cdot)$ takes the form of a symmetric kernel.

Expectation functionals encompass many common parameters and are well-behaved. Plug-in estimators of expectation functionals, named V-statistics after von Mises, can be obtained but may be biased. It is, however, always possible to construct an unbiased estimator of expectation functionals regardless of the underlying distribution function $F$ . These estimators are named U-statistics, with the “U” standing for unbiased.

This blog post provides 1) the definitions of symmetric kernels and expectation functionals; 2) an overview of plug-in estimators of expectation functionals or V-statistics; 3) an overview of unbiased estimators for expectation functionals or U-statistics.

Kernels, degree, and symmetry

Consider a real-valued function $\phi(x_1, …, x_a)$ such that for random variables $X_1, …, X_n \stackrel{i.i.d}{\sim} F$ ,

$\theta = T(F) = \mathbb{E}_{F} ~\phi(X_1, …, X_a) ~,~ 1 \leq a \leq n .$

$\phi(x_1, …, x_a)$ is referred to as a kernel of degree $a$ and is by definition, an unbiased estimator of $\theta$ . $\phi$ is said to be symmetric in its arguments, or symmetric, if it is invariant to permutation of its arguments $x_1, …, x_a$ . For example, a kernel of degree 2 is symmetric if $\phi(x_1, x_2) = \phi(x_2, x_1)$ .

If $\phi$ is not symmetric, a function $\phi^{*}$ can always be found that is symmetric. As a result of the $X_i$ ‘s being identically distributed, they may be considered “exchangeable” such that for any permutation $\pi = \{\pi(1), …, \pi(a)\}$ of the $a$ random variables,

$\mathbb{E}_{F}~\phi(X_1, …, X_a) = \mathbb{E}_{F}~\phi\left(X_{\pi(1)}, …. X_{\pi(a)}\right) .$

There are $a!$ possible permutations of $\phi$ ‘s $a$ arguments. Then, since $\phi$ is an unbiased estimator of $\theta$ , the average of $\phi$ across all $a!$ permutations of its arguments,

$\phi^{*}(x_1, …, x_a) = \frac{1}{a!} \sum_{\pi \in \Pi} \phi\left(x_{\pi(1)}, …, x_{\pi(a)}\right)$

is both an unbiased estimator for $\theta$ and symmetric in its arguments such that

$\mathbb{E}_{F} ~\phi(X_1, …, X_a) = \mathbb{E}_{F} ~\phi^{*}(X_1, …, X_a) = \theta.$

Thus, without loss of generality, the kernel $\phi$ may always be assumed to be symmetric.

As an example, $\phi(x_1, x_2) = x_1 x_2$ is a symmetric kernel of degree 2 since $\phi(x_2, x_1) = x_2 x_1 = x_1 x_2 = \phi(x_1, x_2)$ and $\mathbb{E}_{F} (X_1X_2) = \mu^2$ .

On the other hand, $\phi(x_1, x_2) = x_{1}^{2} - x_1 x_2$ is an unbiased estimator of $\sigma^2$ such that

$\mathbb{E}_{F}(X_1^2 - X_1 X_2) = \mathbb{E}_{F}(X_1^2) - \mu^2 = \sigma^2$

but it is not symmetric as $x_1^2 - x_1 x_2 \neq x_2^2 - x_2 x_1$ . The corresponding symmetric kernel of degree 2 $\phi^{*}(x_1, x_2)$ is then

$\phi^{*}(x_1, x_2) = \frac{1}{2} \left[\phi (x_1, x_2) + \phi(x_2, x_1)\right] = \frac{x_1^2 - 2x_1 x_2 + x_2^2}{2} = \frac{(x_1 - x_2)^2}{2}.$

Expectation functionals

Any statistical functionals that can be expressed as the expectation of a symmetric kernel of degree $a$ with respect to $F$

$T(F) = \mathbb{E}_{F} ~\phi(X_1, …, X_a)$

represent a special family known as expectation functionals or regular functionals. For a refresher on what $\mathbb{E}_F$ means, refer to my previous blog post on plug-in estimators!

Common examples of expectation functionals include moments,

$\mu_{r} = \mathbb{E}_{F}~X^{r} = \int x^{r} ~dF(x)$

variance,

$\sigma^2 = \int \int \frac{(x_1 - x_2)^2}{2} ~dF(x_1)~dF(x_2)$

and covariance,

$\sigma_{XY} = \int \int \frac{(x_1 - x_2)(y_1 - y_2)}{2} ~dF(x_1, y_1) ~dF(x_2, y_2).$

When $a = 1$ , expectation functionals may also be referred to as linear functionals since for some mixture $F_\alpha = \alpha F_1 + (1-\alpha) F_2, ~ \alpha \in (0,1)$ ,

$T (F_\alpha) = \int g(x) ~d\{\alpha F_1(x) + (1-\alpha) F_2(x) \} = \alpha \int g(x) ~dF_1(x) + (1-\alpha) \int g(x) ~dF_2(x)$

$T(F_\alpha) = \alpha~ T(F_1) + (1-\alpha)~ T(F_2).$

V-statistics

V-statistics are plug-in estimators of expectation functionals such that for $X_1, …, X_n \stackrel{i.i.d}{\sim} F$ ,

$V_n = T(\hat{F}_n) = \mathbb{E}_{\hat{F}_{n}} \phi(X_1, …, X_a) ~,~a \leq n.$

As noted previously, the empirical cumulative distribution function $\hat{F}_n(x)$ defined as

$\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{I}(X_i \leq x)$

is a valid distribution function which assigns mass $\frac{1}{n}$ to each $X_i$ . That is, for a random variable $Y \sim \hat{F}_{n}$ ,

$P(Y = X_1) = P(Y = X_2) = … = P(Y = X_n) = \frac{1}{n}.$

Now, consider $a$ such random variables $Y_1, …, Y_a \stackrel{i.i.d}{\sim} \hat{F}_{n}$ . Then, there are a total of $n^{a}$ equally-likely realizations of $(Y_1, …, Y_a)$ . For example, all $Y_i$ could equal $X_1$ , all $Y_i$ could equal $X_n$ , or anything in between such as $(Y_1 = X_3, Y_2 = X_n, Y_3 = X_n, …, Y_a = X_4)$ .

The $Y_i$ are independent and thus the plug-in estimator of an expectation functional is equal to

$V_n = \mathbb{E}_{\hat{F}_{n}} ~\phi(Y_1, …, Y_a) = \int … \int \phi(Y_1, … Y_a) ~d\hat{F}_{n}(y_1)~… ~ d\hat{F}_{n}(y_n).$

Since $\hat{F}_n$ is discrete and the support of each $Y_i$ is the random sample $X_1, …, X_n$ , the plug-in estimator $V_n$ can be explicitly expressed as,

$V_n = \frac{1}{n^a} \sum_{i_1 = 1}^{n} \sum_{i_2 = 1}^{n} … \sum_{i_a = 1}^{n} \phi(X_{i_1}, X_{i_2}, …, X_{i_a})$

so that $V_n$ is the sample average of $\phi$ evaluated at all $n^a$ possible realizations of $(Y_1, …, Y_a)$ , or equivalently, the sample average of $\phi$ evaluated at all $n^{a}$ permutations of size $a$ from $X_1, …, X_n$ .

Bias of V-statistics

When $a = 1$ , the V-statistic takes the form of a traditional sample mean,

$V_n = \frac{1}{n} \sum_{i_1 = 1}^{n} \phi(X_{i_1})$

which is unbiased and its asymptotic distribution is provided by the central limit theorem,

$\sqrt{n} (V_n - T(F)) \rightarrow N(0, \sigma^2).$

Now, consider the form of $V_n$ when $a = 2$ ,

$V_n = \frac{1}{n^2} \sum_{i_1 = 1}^{n} \sum_{i_2 = 1}^{n} \phi(X_{i_1}, X_{i_2}).$

Notice that the sum contains terms for which $i_1 = i_2$ . We can expand the sum to make these terms explicit,

$V_n = \frac{1}{n^2} \left\{ \mathop{\sum\sum}\limits_{i_1 \neq i_2} \phi(X_{i_1}, X_{i_2}) + \sum_{i_1 = 1}^{n} \phi(X_{i_1}, X_{i_1})\right\}.$

There are $n(n-1)$ terms for which $i_1 \neq i_2$ and $n$ terms for which $i_1 = i_2$ . Taking the expectation of $V_n$ with respect to $F$ yields,

$\mathbb{E}_{F} V_n = \frac{n-1}{n} \mathbb{E}_{F} ~\phi(X_{i_1}, X_{i_2}) + \frac{1}{n} \mathbb{E}_{F} ~\phi(X_{i_1}, X_{i_1}).$

Since $X_{i_1}$ and $X_{i_2}$ are independent when $i_1 \neq i_2$ , $\mathbb{E}_{F} ~\phi(X_{i_1}, X_{i_2}) = T(F)$ by definition such that,

$\mathbb{E}_{F}~ V_n = \frac{n-1}{n} T(F) + \frac{1}{n} \mathbb{E}_{F}~ \phi(X_{i_1}, X_{i_1}).$

Clearly $\mathbb{E}_{F}~ V_n \neq T(F)$ and thus $V_n$ is a biased estimator of $T(F)$ . $V_n$ is generally biased for $a > 1$ as subscript duplication ( $i_{j} = i_{k}$ ) results in dependence within its summands. Note however, the bias of $V_n$ approaches 0 as $n \rightarrow \infty$ .

As an example, consider the plug-in estimator for the variance using the symmetric kernel we defined above,

$V_n = \frac{1}{n^2} \sum_{i_1 = 1}^{n} \sum_{i_2 = 1}^{n} \frac{(X_{i_1} - X_{i_2})^2}{2}.$

Expanding the square and splitting the sum into $i_1 \neq i_2$ and $i_1 = i_2$ ,

$V_n = \frac{1}{n^2} \left\{ \mathop{\sum\sum}\limits_{i_1 \neq i_2} \frac{X_{i_1}^2 - 2 X_{i_1} X_{i_2} + X_{i_2}^2}{2} + \sum_{i_1 = 1}^{n} \frac{2 X_{i_1}^2 - 2 X_{i_1}^2}{2}\right\} .$

The second sum is clearly zero and we are left with,

$V_n = \frac{1}{n^2} \mathop{\sum\sum}\limits_{i_1 \neq i_2} \frac{X_{i_1}^2 - 2 X_{i_1} X_{i_2} + X_{i_2}^2}{2}.$

Now, taking the expectation of $V_n$ with respect to $F$ ,

$\mathbb{E}_{F}~ V_n = \frac{1}{n^2} \mathop{\sum\sum}\limits_{i_1 \neq i_2} \frac{ \mathbb{E}_F [X_{i_1}^2] - 2 \mathbb{E}_F [X_{i_1}]~ \mathbb{E}_F[X_{i_2}] + \mathbb{E}_F [X_{i_2}^2]}{2}.$

Since the $X_i$ are identically distributed, let’s drop the subscripts and aggregate terms recalling that there are $n(n-1)$ terms for which $i_1 \neq i_2$ ,

$\mathbb{E}_{F}~ V_n = \frac{n(n-1)}{n^2} \frac{ 2\mathbb{E}_F [X^2] - 2 \mathbb{E}_F [X]^2}{2}.$

Recalling that $\sigma^2 = E[X^2] - E[X]^2$ , simplifying leaves us with

$\mathbb{E}_F ~V_n = \frac{(n-1)}{n} \sigma^2$

which takes the general form of $\mathbb{E}_F~ V_n$ we derived previously.

U-statistics

Since $V_n$ is biased as a result of subscript duplication, a possible solution would be to restrict the sums to distinct combinations of subscripts $(i_1, …, i_a)$ . We can require that $1 \leq i_1 < ... < i_a \leq n$ as a result of the kernel $\phi$ ‘s symmetry. Thus, there are ${n \choose a}$ such subscript combinations to consider. Let $\mathcal{C}$ represent the set of all ${n \choose a}$ subscript combinations. Then, the resulting estimator of $T(F)$ is the U-statistic

$U_n = \frac{1}{{n \choose a}} \sum_{c ~\in~ \mathcal{C}} \phi (X_{i_1}, ..., X_{i_a})$

or equivalently,

$U_n = \frac{1}{{n \choose a}} \mathop{\sum...\sum}\limits_{1 \leq i_1 < ... < i_a \leq n} \phi (X_{i_1}, ..., X_{i_a}).$

Now that all subscripts within the summands are distinct, we have

$\mathbb{E}_F~ U_n = \mathbb{E}_F~ \phi (X_{i_1}, ..., X_{i_a}) = T(F)$

so that $U_n$ is unbiased for $T(F)$ , hence the name!

Returning to the variance example, the corresponding U-statistic is

$U_n = \frac{1}{{n \choose 2}} \sum_{c ~\in~ \mathcal{C}} \frac{(X_{i_1} - X_{i_2})^2}{2} = \frac{1}{{n \choose 2}} \sum_{c ~\in~ \mathcal{C}} \frac{X_{i_1}^2 - 2 X_{i_1} X_{i_2} + X_{i_2}^2}{2}.$

Taking the expectation of $U_n$ with respect to $F$ yields,

$\mathbb{E}_F ~U_n = \frac{1}{{n \choose 2}} \sum_{c ~\in~ \mathcal{C}} \frac{ \mathbb{E}_F [X_{i_1}^2] - 2 \mathbb{E}_F [X_{i_1}]~ \mathbb{E}_F[X_{i_2}] + \mathbb{E}_F [X_{i_2}^2]}{2}.$

Since the $X_i$ are identically distributed, dropping the subscripts and aggregating terms gives,

$\mathbb{E}_F ~U_n = \mathbb{E}_F [X^2] - \mathbb{E}_F [X]^2 = \sigma^2$

so that the U-statistic is unbiased for the population variance as desired.

It can be shown that U-statistics are asymptotically normal. However, the central limit cannot be used to prove this result. Each $X_i$ appears in more than one summand of $U_n$ , making $U_n$ the sum of dependent terms. As a result, a clever technique known as H-projection, named after Wassily Hoeffding, is required.

Dupree statistics

$D_n =$

Download this blogpost as an RMarkdown (.Rmd) file!

U-, V-, and Dupree statistics

Kernels, degree, and symmetry

Expectation functionals

V-statistics

Bias of V-statistics

U-statistics

Dupree statistics

Published by

Emma Davies Smith

3 thoughts on “U-, V-, and Dupree statistics”

Leave a Reply Cancel reply