After my last grand slam title, U-, V-, and Dupree statistics, I was really feeling the pressure to keep my title game strong. Thank you to my wonderful friend Steve Lee for suggesting this beautiful title.
Overview
A statistical functional $T$ is any real-valued function of a distribution function $F$ such that $\theta = T(F)$. Statistical functionals represent characteristics of the distribution, such as the mean, variance, and quantiles.
Often, $F$ is unknown but is assumed to belong to a broad class of distribution functions $\mathcal{F}$ subject only to mild restrictions, such as continuity or the existence of specific moments.
A random sample $X_1, \dots, X_n \overset{iid}{\sim} F$ can be used to construct the empirical cumulative distribution function (ECDF)
$$F_n(x) = \frac{1}{n}\sum_{i=1}^n I(X_i \le x),$$
which assigns mass $1/n$ to each $X_i$. $F_n$ is a valid, discrete CDF which can be substituted for $F$ to obtain $\hat{\theta} = T(F_n)$. These estimators are referred to as plug-in estimators for obvious reasons.
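To make the plug-in idea concrete, here is a minimal pure-Python sketch (the helper names `ecdf`, `plugin_mean`, `plugin_variance`, and `plugin_quantile` are hypothetical, chosen for illustration). Each functional is evaluated at $F_n$ instead of $F$: the mean becomes the sample average, the variance divides by $n$ rather than $n - 1$, and the $p$-th quantile $\inf\{x : F(x) \ge p\}$ becomes an order statistic.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return F_n, the empirical CDF of `sample`, as a function of x."""
    xs = sorted(sample)
    n = len(xs)
    # F_n(x) = (1/n) * #{i : X_i <= x}; bisect_right counts elements <= x
    return lambda x: bisect_right(xs, x) / n

def plugin_mean(sample):
    # T(F) = E_F[X]; under F_n each X_i carries mass 1/n
    return sum(sample) / len(sample)

def plugin_variance(sample):
    # T(F) = E_F[(X - E_F X)^2]; plugging in F_n divides by n, not n - 1
    m = plugin_mean(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)

def plugin_quantile(sample, p):
    # T(F) = inf{x : F(x) >= p} for 0 < p <= 1; with F_n the infimum
    # is attained at one of the observed data points
    xs = sorted(sample)
    F_n = ecdf(xs)
    for x in xs:
        if F_n(x) >= p:
            return x
    return xs[-1]

sample = [3, 1, 4, 1, 5]
print(ecdf(sample)(3))               # 0.6
print(plugin_variance(sample))       # about 2.56
print(plugin_quantile(sample, 0.5))  # 3
```

Nothing here estimates $F$ parametrically; every estimate is just $T$ applied to the discrete distribution that puts mass $1/n$ on each observation.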
For more details on statistical functionals and plug-in estimators, you can check out my blog post Plug-in estimators of statistical functionals!
Many statistical functionals take the form of an expectation of a real-valued function $\phi$ with respect to $F$ such that, for $a \geq 1$,
$$T(F) = E_F\left[\phi(X_1, \dots, X_a)\right].$$
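When $\phi$ is symmetric in its arguments, e.g. $\phi(x_1, x_2) = \phi(x_2, x_1)$ for $a = 2$, it is referred to as a symmetric kernel of degree $a$. If $\phi$ is not symmetric, a symmetric equivalent $\phi^*$ can always be found,
$$\phi^*(x_1, \dots, x_a) = \frac{1}{a!}\sum_{\pi \in \Pi} \phi(x_{\pi(1)}, \dots, x_{\pi(a)}),$$
where $\Pi$ represents the set of all permutations of the indices $1, \dots, a$. Because the $X_i$ are i.i.d., $E_F\left[\phi^*(X_1, \dots, X_a)\right] = E_F\left[\phi(X_1, \dots, X_a)\right]$, so $\phi^*$ defines the same functional.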
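The symmetrization formula can be sketched directly with `itertools.permutations`. The function name `symmetrize` is hypothetical, and the example kernel $\phi(x_1, x_2) = x_1^2 - x_1 x_2$ is an illustrative choice: it is not symmetric, yet $E_F[\phi(X_1, X_2)] = E X^2 - (E X)^2 = \mathrm{Var}(X)$, and its symmetric version works out to $(x_1 - x_2)^2 / 2$.

```python
from itertools import permutations
from math import factorial

def symmetrize(phi, a):
    """Return phi*, the average of phi over all a! orderings of its arguments."""
    def phi_star(*xs):
        if len(xs) != a:
            raise ValueError(f"expected {a} arguments, got {len(xs)}")
        # sum phi(x_{pi(1)}, ..., x_{pi(a)}) over every permutation pi of 0..a-1
        total = sum(phi(*(xs[i] for i in perm)) for perm in permutations(range(a)))
        return total / factorial(a)
    return phi_star

# Illustrative non-symmetric kernel of degree 2 whose expectation is Var(X)
def phi(x1, x2):
    return x1 ** 2 - x1 * x2

phi_star = symmetrize(phi, 2)
print(phi(3, 1), phi(1, 3))  # 6 -2  (phi is not symmetric)
print(phi_star(3, 1))        # 2.0, which equals (3 - 1)**2 / 2
```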
A statistical functional $T(F)$ belongs to a special family of expectation functionals when:
1. $T(F) = E_F\left[\phi(X_1, \dots, X_a)\right]$, and
2. $\phi$ is a symmetric kernel of degree $a$.
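Plug-in estimators of expectation functionals are referred to as V-statistics and can be expressed explicitly as
$$V_n = T(F_n) = \frac{1}{n^a}\sum_{i_1=1}^n \cdots \sum_{i_a=1}^n \phi(X_{i_1}, \dots, X_{i_a}),$$
so that $V_n$ is the average of $\phi$ evaluated at all $n^a$ ordered selections of size $a$, with replacement, from $X_1, \dots, X_n$. Since the same $X_i$ can appear more than once within a summand, $V_n$ is generally biased.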
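A brute-force sketch of $V_n$, assuming the symmetric variance kernel $\phi(x_1, x_2) = (x_1 - x_2)^2 / 2$ as the running example (the names `v_statistic` and `var_kernel` are hypothetical). It loops over all $n^a$ index tuples with `itertools.product`, so it is only practical for small $n$; the point is to show exactly which terms the definition averages over.

```python
from itertools import product

def v_statistic(sample, phi, a):
    """V_n = n^{-a} * sum of phi over all n^a ordered index tuples (repeats allowed)."""
    n = len(sample)
    total = sum(phi(*(sample[i] for i in idx)) for idx in product(range(n), repeat=a))
    return total / n ** a

def var_kernel(x1, x2):
    # symmetric kernel of degree 2 with E_F[var_kernel(X1, X2)] = Var(X)
    return (x1 - x2) ** 2 / 2

print(v_statistic([1, 2, 3, 4], var_kernel, 2))  # 1.25
```

For this kernel the repeated-index terms $\phi(X_i, X_i) = 0$ still count toward the $n^2$ denominator, which is why $V_n$ equals the divide-by-$n$ variance, $\frac{n-1}{n}$ times the unbiased sample variance ($\tfrac{3}{4} \cdot \tfrac{5}{3} = 1.25$ here).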
By restricting the summands to distinct indices only, an unbiased estimator known as a U-statistic arises. In fact, when the family of distributions is large enough, it can be shown that a U-statistic can always be constructed for expectation functionals.
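Since $\phi$ is symmetric, we can require that $1 \le i_1 < i_2 < \cdots < i_a \le n$, resulting in $\binom{n}{a}$ combinations of the subscripts $1, \dots, n$. The U-statistic is then the average of $\phi$ evaluated at all $\binom{n}{a}$ distinct combinations of $X_1, \dots, X_n$,
$$U_n = \binom{n}{a}^{-1} \sum_{1 \le i_1 < \cdots < i_a \le n} \phi(X_{i_1}, \dots, X_{i_a}).$$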
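The same brute-force style gives a sketch of $U_n$: `itertools.combinations` enumerates exactly the index sets $i_1 < \cdots < i_a$, and `math.comb` supplies $\binom{n}{a}$. The names `u_statistic` and `var_kernel` are hypothetical, and the variance kernel is again an illustrative choice. With it, $U_n$ reproduces the unbiased (divide-by-$n-1$) sample variance.

```python
from itertools import combinations
from math import comb

def u_statistic(sample, phi, a):
    """U_n = C(n, a)^{-1} * sum of phi over all index sets i_1 < ... < i_a."""
    n = len(sample)
    if n < a:
        raise ValueError("need at least a observations")
    # combinations() picks elements at distinct positions, in increasing index order
    total = sum(phi(*c) for c in combinations(sample, a))
    return total / comb(n, a)

def var_kernel(x1, x2):
    # symmetric kernel of degree 2 with E_F[var_kernel(X1, X2)] = Var(X)
    return (x1 - x2) ** 2 / 2

print(u_statistic([1, 2, 3, 4], var_kernel, 2))  # 1.666..., i.e. 5/3
```

Compared with the V-statistic, the only change is dropping tuples with repeated indices and the redundant orderings that symmetry makes unnecessary.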
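While the indices $i_1, \dots, i_a$ within each summand are now distinct, each $X_i$ still appears in $\binom{n-1}{a-1}$ of the summands, so $U_n$ is an average of terms that are, in general, correlated. As a result, the classical central limit theorem for i.i.d. summands cannot be applied directly to determine the limiting distribution of $U_n$.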
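The overlap between summands can be checked by counting. This small sketch (the helper `appearances` is hypothetical) tallies, for each index, how many of the $\binom{n}{a}$ index sets contain it, confirming the $\binom{n-1}{a-1}$ figure: any two summands that share an index share an observation, which is the source of the dependence.

```python
from itertools import combinations
from math import comb

def appearances(n, a):
    """Count how many of the C(n, a) index subsets contain each index 0..n-1."""
    counts = [0] * n
    for subset in combinations(range(n), a):
        for i in subset:
            counts[i] += 1
    return counts

n, a = 6, 3
print(appearances(n, a))    # [10, 10, 10, 10, 10, 10]
print(comb(n - 1, a - 1))   # 10
```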
For more details on expectation functionals and their estimators, you can check out my blog post U-, V-, and Dupree statistics!
This blog post, Getting to know U: the asymptotic distribution of a single U-statistic, provides a walk-through derivation of the limiting, or asymptotic, distribution of $U_n$.