# U-, V-, and Dupree statistics

To start, I apologize for this blog's title, but I couldn't resist referencing the Owen Wilson classic You, Me and Dupree – wow! The other gold-plated candidate was U-statistics and You. Please, please, hold your applause.

My previous blog post defined statistical functionals as any real-valued function of an unknown CDF, $\theta = T(F)$, and explained how plug-in estimators can be constructed by substituting the empirical cumulative distribution function (ECDF) $\hat{F}_n$ for the unknown CDF $F$. Plug-in estimators of the mean and variance were provided and used to demonstrate plug-in estimators' potential to be biased.

Statistical functionals that meet the following two criteria represent a special family of functionals known as expectation functionals:

1) $\theta$ is the expectation of a function $\phi$ with respect to the distribution function $F$; and

2) the function $\phi$ takes the form of a symmetric kernel.

Expectation functionals encompass many common parameters and are well-behaved. Plug-in estimators of expectation functionals, named V-statistics after von Mises, can be obtained but may be biased. It is, however, always possible to construct an unbiased estimator of an expectation functional regardless of the underlying distribution function $F$. These estimators are named U-statistics, with the "U" standing for unbiased.

This blog post provides 1) the definitions of symmetric kernels and expectation functionals; 2) an overview of plug-in estimators of expectation functionals, or V-statistics; and 3) an overview of unbiased estimators of expectation functionals, or U-statistics.

## Kernels, degree, and symmetry

Consider a real-valued function $\phi(x_1, \ldots, x_r)$ such that for $r$ independent and identically distributed random variables $X_1, \ldots, X_r \sim F$,

$$\theta = \mathbb{E}_F\left[\phi(X_1, \ldots, X_r)\right].$$

$\phi$ is referred to as a kernel of degree $r$ and is, by definition, an unbiased estimator of $\theta$. $\phi$ is said to be symmetric in its arguments, or simply symmetric, if it is invariant to permutation of its arguments $x_1, \ldots, x_r$. For example, a kernel of degree 2 is symmetric if $\phi(x_1, x_2) = \phi(x_2, x_1)$.

If $\phi$ is not symmetric, a function $\phi^*$ can always be found that is symmetric. As a result of the $X_i$'s being independent and identically distributed, they may be considered "exchangeable" such that for any permutation $\pi$ of the random variables,

$$\mathbb{E}_F\left[\phi(X_{\pi(1)}, \ldots, X_{\pi(r)})\right] = \mathbb{E}_F\left[\phi(X_1, \ldots, X_r)\right] = \theta.$$

There are $r!$ possible permutations of $\phi$'s arguments. Then, since each permuted $\phi$ is an unbiased estimator of $\theta$, the average of $\phi$ across all permutations of its arguments,

$$\phi^*(x_1, \ldots, x_r) = \frac{1}{r!} \sum_{\pi} \phi\left(x_{\pi(1)}, \ldots, x_{\pi(r)}\right),$$

is both an unbiased estimator of $\theta$ and symmetric in its arguments such that

$$\phi^*(x_1, \ldots, x_r) = \phi^*\left(x_{\pi(1)}, \ldots, x_{\pi(r)}\right) \text{ for any permutation } \pi.$$

Thus, without loss of generality, the kernel may always be assumed to be symmetric.

As an example, $\phi(x_1, x_2) = x_1 x_2$ is a symmetric kernel of degree 2 for $\mu^2$ since $\mathbb{E}_F[X_1 X_2] = \mathbb{E}_F[X_1]\, \mathbb{E}_F[X_2] = \mu^2$ and $\phi(x_1, x_2) = \phi(x_2, x_1)$.

On the other hand, $\phi(x_1, x_2) = x_1^2 - x_1 x_2$ is an unbiased estimator of the variance $\sigma^2$ such that

$$\mathbb{E}_F\left[X_1^2 - X_1 X_2\right] = \mathbb{E}_F[X_1^2] - \mathbb{E}_F[X_1]\, \mathbb{E}_F[X_2] = \left(\sigma^2 + \mu^2\right) - \mu^2 = \sigma^2,$$

but it is not symmetric as $\phi(x_1, x_2) \neq \phi(x_2, x_1)$. The corresponding symmetric kernel of degree 2 is then

$$\phi^*(x_1, x_2) = \frac{1}{2!}\left[\phi(x_1, x_2) + \phi(x_2, x_1)\right] = \frac{x_1^2 - x_1 x_2 + x_2^2 - x_2 x_1}{2} = \frac{(x_1 - x_2)^2}{2}.$$
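To make the symmetrization step concrete, here is a small Python sketch (my own illustration, not from the original post; the helper name `symmetrize` is mine) that averages a kernel over all $r!$ permutations of its arguments, applied to the asymmetric variance kernel above:

```python
import itertools
import math

def symmetrize(phi, r):
    """Return the symmetric version of a degree-r kernel by averaging
    phi over all r! permutations of its arguments."""
    def phi_star(*xs):
        return sum(phi(*p) for p in itertools.permutations(xs)) / math.factorial(r)
    return phi_star

# Asymmetric kernel for the variance: phi(x1, x2) = x1^2 - x1*x2
phi = lambda x1, x2: x1**2 - x1 * x2
phi_star = symmetrize(phi, 2)

print(phi(1.0, 3.0), phi(3.0, 1.0))            # order matters: -2.0 vs 6.0
print(phi_star(1.0, 3.0), phi_star(3.0, 1.0))  # both 2.0 = (1 - 3)^2 / 2
```

Note that `phi_star` agrees with the symmetric kernel $(x_1 - x_2)^2/2$ at every input, as the algebra above predicts.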

## Expectation functionals

Any statistical functional that can be expressed as the expectation of a symmetric kernel $\phi$ of degree $r$ with respect to $F$,

$$\theta = T(F) = \mathbb{E}_F\left[\phi(X_1, \ldots, X_r)\right] = \int \cdots \int \phi(x_1, \ldots, x_r)\, dF(x_1) \cdots dF(x_r),$$

represents a special family known as expectation functionals or regular functionals. For a refresher on what $dF(x)$ means, refer to my previous blog post on plug-in estimators!

Common examples of expectation functionals include moments,

$$\theta = \mathbb{E}_F\left[X^k\right] = \int x^k \, dF(x);$$

variance,

$$\theta = \sigma^2 = \mathbb{E}_F\left[\frac{(X_1 - X_2)^2}{2}\right];$$

and covariance,

$$\theta = \sigma_{XY} = \mathbb{E}_F\left[\frac{(X_1 - X_2)(Y_1 - Y_2)}{2}\right].$$

When $r = 1$, expectation functionals may also be referred to as linear functionals since for some mixture $F = \alpha F_1 + (1 - \alpha) F_2$ with $\alpha \in [0, 1]$,

$$T\left(\alpha F_1 + (1 - \alpha) F_2\right) = \alpha\, T(F_1) + (1 - \alpha)\, T(F_2).$$

## V-statistics

V-statistics are plug-in estimators of expectation functionals such that for $\theta = T(F) = \mathbb{E}_F\left[\phi(X_1, \ldots, X_r)\right]$,

$$V_n = T(\hat{F}_n) = \mathbb{E}_{\hat{F}_n}\left[\phi(X_1, \ldots, X_r)\right].$$

As noted previously, the empirical cumulative distribution function, defined as

$$\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{I}(X_i \leq x),$$

is a valid distribution function which assigns mass $\frac{1}{n}$ to each observed $X_i$. That is, for a random variable $X^* \sim \hat{F}_n$,

$$P\left(X^* = X_i\right) = \frac{1}{n}, \quad i = 1, \ldots, n.$$

Now, consider $r$ such random variables $X_1^*, \ldots, X_r^* \sim \hat{F}_n$. Then, there are a total of $n^r$ equally-likely realizations of $(X_1^*, \ldots, X_r^*)$. For example, all could equal $X_1$, all could equal $X_n$, or anything in between.

The $X_i^*$ are independent and thus the plug-in estimator of an expectation functional is equal to

$$V_n = \mathbb{E}_{\hat{F}_n}\left[\phi(X_1^*, \ldots, X_r^*)\right].$$

Since $\hat{F}_n$ is discrete and the support of each $X_i^*$ is the random sample $\{X_1, \ldots, X_n\}$, the plug-in estimator can be explicitly expressed as,

$$V_n = \frac{1}{n^r} \sum_{i_1=1}^{n} \cdots \sum_{i_r=1}^{n} \phi\left(X_{i_1}, \ldots, X_{i_r}\right),$$

so that $V_n$ is the sample average of $\phi$ evaluated at all $n^r$ possible realizations of $(X_1^*, \ldots, X_r^*)$, or equivalently, the sample average of $\phi$ evaluated at all permutations of size $r$ from $\{X_1, \ldots, X_n\}$.
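A direct (if computationally naive) implementation of this definition simply averages the kernel over all tuples drawn with replacement from the sample. The sketch below is my own illustration; the function name `v_statistic` is an assumption, and the kernel is passed in as a Python callable:

```python
import itertools

def v_statistic(phi, r, sample):
    """Plug-in (V-) statistic: average the degree-r kernel phi over all
    n^r tuples drawn with replacement from the observed sample."""
    n = len(sample)
    return sum(phi(*xs) for xs in itertools.product(sample, repeat=r)) / n**r

# Variance kernel of degree 2: phi(x1, x2) = (x1 - x2)^2 / 2
phi = lambda x1, x2: (x1 - x2)**2 / 2
print(v_statistic(phi, 2, [1.0, 2.0, 4.0]))  # 14/9, the divide-by-n sample variance
```

For this kernel the V-statistic coincides with the biased, divide-by-$n$ sample variance, which foreshadows the bias discussion below.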

## Bias of V-statistics

When $r = 1$, the V-statistic takes the form of a traditional sample mean,

$$V_n = \frac{1}{n} \sum_{i=1}^{n} \phi(X_i),$$

which is unbiased, and its asymptotic distribution is provided by the central limit theorem,

$$\sqrt{n}\left(V_n - \theta\right) \xrightarrow{d} N\left(0, \operatorname{Var}\left[\phi(X)\right]\right).$$

Now, consider the form of $V_n$ when $r = 2$,

$$V_n = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \phi\left(X_i, X_j\right).$$

Notice that the sum contains terms for which $i = j$. We can expand the sum to make these terms explicit,

$$V_n = \frac{1}{n^2} \left[\sum_{i=1}^{n} \phi\left(X_i, X_i\right) + \mathop{\sum\sum}_{i \neq j} \phi\left(X_i, X_j\right)\right].$$

There are $n$ terms for which $i = j$ and $n(n-1)$ terms for which $i \neq j$. Taking the expectation of $V_n$ with respect to $F$ yields,

$$\mathbb{E}_F\left[V_n\right] = \frac{1}{n^2} \left[n\, \mathbb{E}_F\left[\phi(X_1, X_1)\right] + n(n-1)\, \mathbb{E}_F\left[\phi(X_1, X_2)\right]\right].$$

Since $X_i$ and $X_j$ are independent when $i \neq j$, $\mathbb{E}_F\left[\phi(X_1, X_2)\right] = \theta$ by definition such that,

$$\mathbb{E}_F\left[V_n\right] = \frac{1}{n}\, \mathbb{E}_F\left[\phi(X_1, X_1)\right] + \frac{n-1}{n}\, \theta.$$

Clearly $\mathbb{E}_F[V_n] \neq \theta$ and thus $V_n$ is a biased estimator of $\theta$. $V_n$ is generally biased for $r > 1$ as subscript duplication (e.g. $i = j$) results in dependence within its summands. Note however, the bias of $V_n$ approaches 0 as $n \to \infty$.

As an example, consider the plug-in estimator for the variance using the symmetric kernel we defined above,

$$V_n = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{\left(X_i - X_j\right)^2}{2}.$$

Expanding the square and splitting the sum into $i \neq j$ and $i = j$,

$$V_n = \frac{1}{2n^2} \left[\mathop{\sum\sum}_{i \neq j} \left(X_i^2 - 2 X_i X_j + X_j^2\right) + \sum_{i=1}^{n} \left(X_i - X_i\right)^2\right].$$

The second sum is clearly zero and we are left with,

$$V_n = \frac{1}{2n^2} \mathop{\sum\sum}_{i \neq j} \left(X_i^2 - 2 X_i X_j + X_j^2\right).$$

Now, taking the expectation of $V_n$ with respect to $F$,

$$\mathbb{E}_F\left[V_n\right] = \frac{1}{2n^2} \mathop{\sum\sum}_{i \neq j} \left(\mathbb{E}_F\left[X_i^2\right] - 2\, \mathbb{E}_F\left[X_i\right] \mathbb{E}_F\left[X_j\right] + \mathbb{E}_F\left[X_j^2\right]\right).$$

Since the $X_i$ are identically distributed, let's drop the subscripts and aggregate terms, recalling that there are $n(n-1)$ terms for which $i \neq j$,

$$\mathbb{E}_F\left[V_n\right] = \frac{n(n-1)}{2n^2} \left(2\, \mathbb{E}_F\left[X^2\right] - 2\mu^2\right) = \frac{n-1}{n} \left(\mathbb{E}_F\left[X^2\right] - \mu^2\right).$$

Recalling that $\sigma^2 = \mathbb{E}_F\left[X^2\right] - \mu^2$, simplifying leaves us with

$$\mathbb{E}_F\left[V_n\right] = \frac{n-1}{n}\, \sigma^2 = \sigma^2 - \frac{\sigma^2}{n},$$

which takes the general form of $\mathbb{E}_F[V_n]$ we derived previously, here with $\mathbb{E}_F\left[\phi(X_1, X_1)\right] = 0$.
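This bias is easy to observe by simulation. The following sketch (my own illustration, not from the post) draws repeated standard-normal samples of size $n = 5$, so that $\sigma^2 = 1$, and checks that the V-statistic's average sits near $(n-1)/n = 0.8$ rather than 1:

```python
import random

random.seed(1)

def v_stat_variance(xs):
    """V-statistic for the variance kernel phi(x1, x2) = (x1 - x2)^2 / 2."""
    n = len(xs)
    return sum((a - b)**2 / 2 for a in xs for b in xs) / n**2

n, reps = 5, 20000
draws = [v_stat_variance([random.gauss(0.0, 1.0) for _ in range(n)])
         for _ in range(reps)]

# Average over many replications: close to (n - 1)/n * sigma^2 = 0.8, not 1
print(sum(draws) / reps)
```

The Monte Carlo average approximates $\mathbb{E}_F[V_n]$, and for small $n$ the downward bias is clearly visible.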

## U-statistics

Since $V_n$ is biased as a result of subscript duplication, a possible solution would be to restrict the sums to distinct combinations of the subscripts $\{i_1, \ldots, i_r\}$. We can further require that $i_1 < i_2 < \cdots < i_r$ as a result of the kernel $\phi$'s symmetry. Thus, there are $\binom{n}{r}$ such subscript combinations to consider. Let $\mathcal{C}$ represent the set of all $\binom{n}{r}$ subscript combinations. Then, the resulting estimator of $\theta$ is the U-statistic

$$U_n = \binom{n}{r}^{-1} \sum_{c \,\in\, \mathcal{C}} \phi\left(X_{i_1}, \ldots, X_{i_r}\right),$$

or equivalently,

$$U_n = \binom{n}{r}^{-1} \mathop{\sum \cdots \sum}_{i_1 < \cdots < i_r} \phi\left(X_{i_1}, \ldots, X_{i_r}\right).$$

Now that all subscripts within the summands are distinct, we have

$$\mathbb{E}_F\left[U_n\right] = \binom{n}{r}^{-1} \mathop{\sum \cdots \sum}_{i_1 < \cdots < i_r} \mathbb{E}_F\left[\phi\left(X_{i_1}, \ldots, X_{i_r}\right)\right] = \binom{n}{r}^{-1} \binom{n}{r}\, \theta = \theta,$$

so that $U_n$ is unbiased for $\theta$, hence the name!
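In code, the only change from the plug-in estimator is summing over combinations of distinct observations instead of all tuples with replacement. A minimal sketch (the function name `u_statistic` is my own) using Python's `itertools.combinations`:

```python
import itertools
import math

def u_statistic(phi, r, sample):
    """U-statistic: average the symmetric degree-r kernel phi over all
    C(n, r) subsets of distinct observations."""
    n = len(sample)
    return sum(phi(*xs) for xs in itertools.combinations(sample, r)) / math.comb(n, r)

# Variance kernel of degree 2: phi(x1, x2) = (x1 - x2)^2 / 2
phi = lambda x1, x2: (x1 - x2)**2 / 2
print(u_statistic(phi, 2, [1.0, 2.0, 4.0]))  # 7/3, versus 14/9 for the V-statistic
```

Because the kernel is symmetric, visiting each unordered pair once is enough; there is no need to enumerate both $(i, j)$ and $(j, i)$.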

Returning to the variance example, the corresponding U-statistic is

$$U_n = \binom{n}{2}^{-1} \mathop{\sum\sum}_{i < j} \frac{\left(X_i - X_j\right)^2}{2} = \frac{1}{n(n-1)} \mathop{\sum\sum}_{i \neq j} \frac{\left(X_i - X_j\right)^2}{2}.$$

Taking the expectation of $U_n$ with respect to $F$ yields,

$$\mathbb{E}_F\left[U_n\right] = \frac{1}{2n(n-1)} \mathop{\sum\sum}_{i \neq j} \left(\mathbb{E}_F\left[X_i^2\right] - 2\, \mathbb{E}_F\left[X_i\right] \mathbb{E}_F\left[X_j\right] + \mathbb{E}_F\left[X_j^2\right]\right).$$

Since the $X_i$ are identically distributed, dropping the subscripts and aggregating the $n(n-1)$ terms gives,

$$\mathbb{E}_F\left[U_n\right] = \frac{n(n-1)}{n(n-1)} \left(\mathbb{E}_F\left[X^2\right] - \mu^2\right) = \sigma^2,$$

so that the U-statistic is unbiased for the population variance as desired.
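As a sanity check, for this kernel the U-statistic reproduces the familiar unbiased sample variance with its $1/(n-1)$ divisor. A short sketch (helper names are my own):

```python
import itertools

def u_stat_variance(xs):
    """U-statistic with the variance kernel phi(x1, x2) = (x1 - x2)^2 / 2."""
    n = len(xs)
    total = sum((a - b)**2 / 2 for a, b in itertools.combinations(xs, 2))
    return total / (n * (n - 1) / 2)

def sample_variance(xs):
    """Textbook unbiased sample variance with the 1/(n - 1) divisor."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar)**2 for x in xs) / (n - 1)

xs = [3.0, 1.0, 4.0, 1.0, 5.0]
print(u_stat_variance(xs), sample_variance(xs))  # agree up to float rounding
```

The two estimators are algebraically identical, which is a nice way to see why the usual $n - 1$ correction produces an unbiased estimate.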

It can be shown that U-statistics are asymptotically normal. However, the central limit theorem cannot be used directly to prove this result: each $X_i$ appears in more than one summand of $U_n$, making $U_n$ a sum of dependent terms. As a result, a clever technique known as the H-projection, named after Wassily Hoeffding, is required.