After my last grand slam title, "U-, V-, and Dupree statistics", I was really feeling the pressure to keep my title game strong. Thank you to my wonderful friend Steve Lee for suggesting this beautiful title.
A statistical functional is any real-valued function of a distribution function $F$ such that
$$\theta = T(F),$$
which represents a characteristic of the distribution $F$. Examples include the mean, variance, and quantiles.
Often, $F$ is unknown but is assumed to belong to a broad class of distribution functions $\mathcal{F}$ subject only to mild restrictions such as continuity or the existence of specific moments.
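A random sample $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} F$ can be used to construct the empirical cumulative distribution function (ECDF) $\hat{F}_n$,
$$\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} I(X_i \le x),$$
which assigns mass $\frac{1}{n}$ to each $X_i$.
$\hat{F}_n$ is a valid, discrete CDF which can be substituted for $F$ to obtain $\hat{\theta} = T(\hat{F}_n)$. These estimators are referred to as plug-in estimators for obvious reasons.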
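As a rough illustration of the plug-in idea, here is a minimal Python sketch using only the standard library. The function names (`ecdf`, `plug_in_mean`, `plug_in_variance`) and the toy sample are my own choices for illustration and are not part of any package.

```python
def ecdf(sample):
    """Return the ECDF of `sample` as a function x -> proportion of points <= x."""
    n = len(sample)

    def F_hat(x):
        # Each observation carries mass 1/n
        return sum(1 for xi in sample if xi <= x) / n

    return F_hat


def plug_in_mean(sample):
    # T(F) = E_F[X], evaluated under the ECDF
    return sum(sample) / len(sample)


def plug_in_variance(sample):
    # T(F) = E_F[(X - E_F[X])^2], evaluated under the ECDF,
    # so the sum of squares is divided by n rather than n - 1
    m = plug_in_mean(sample)
    return sum((xi - m) ** 2 for xi in sample) / len(sample)


sample = [2.0, 4.0, 4.0, 7.0]  # hypothetical toy data
F_hat = ecdf(sample)

print(F_hat(4.0))                # 0.75
print(plug_in_mean(sample))      # 4.25
print(plug_in_variance(sample))  # 3.1875
```

Note that the plug-in variance divides by $n$, not $n - 1$, because every quantity is computed as if the ECDF were the true distribution.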
For more details on statistical functionals and plug-in estimators, you can check out my blog post Plug-in estimators of statistical functionals!
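Many statistical functionals take the form of an expectation of a real-valued function $\phi$ with respect to $F$ such that for $a \le n$,
$$\theta = T(F) = E_F\, \phi(X_1, \dots, X_a).$$
When $\phi(x_1, \dots, x_a)$ is symmetric in its arguments, e.g. $\phi(x_1, x_2) = \phi(x_2, x_1)$, it is referred to as a symmetric kernel of degree $a$. If $\phi$ is not symmetric, a symmetric equivalent $\phi^*$ can always be found,
$$\phi^*(x_1, \dots, x_a) = \frac{1}{a!} \sum_{\pi \in \Pi} \phi(x_{\pi(1)}, \dots, x_{\pi(a)}),$$
where $\Pi$ represents the set of all permutations of the indices $1, \dots, a$.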
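To make the symmetrization step concrete, here is a small sketch that averages a kernel over all orderings of its arguments. The helper `symmetrize` and the asymmetric kernel $x_1^2 - x_1 x_2$ (whose expectation is the variance when $X_1$ and $X_2$ are independent draws from the same distribution) are assumptions chosen purely for illustration.

```python
from itertools import permutations
from math import factorial


def symmetrize(phi, a):
    """Return a symmetric version of the degree-`a` kernel `phi`."""

    def phi_star(*xs):
        if len(xs) != a:
            raise ValueError(f"expected {a} arguments, got {len(xs)}")
        # Average phi over all a! orderings of its arguments
        return sum(phi(*p) for p in permutations(xs)) / factorial(a)

    return phi_star


def phi(x1, x2):
    # Asymmetric kernel: E[X1^2 - X1*X2] = E[X^2] - (E[X])^2 = Var(X)
    return x1 ** 2 - x1 * x2


phi_star = symmetrize(phi, 2)

print(phi(1.0, 3.0), phi(3.0, 1.0))            # -2.0 6.0
print(phi_star(1.0, 3.0), phi_star(3.0, 1.0))  # 2.0 2.0
```

Here the symmetrized kernel works out to $(x_1 - x_2)^2 / 2$, a familiar symmetric kernel for the variance.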
A statistical functional $\theta = T(F)$ belongs to a special family of expectation functionals when:
- $T(F) = E_F\, \phi(X_1, \dots, X_a)$, and
- $\phi$ is a symmetric kernel of degree $a$.
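Plug-in estimators of expectation functionals are referred to as V-statistics and can be expressed explicitly as,
$$V_n = \frac{1}{n^a} \sum_{i_1=1}^{n} \cdots \sum_{i_a=1}^{n} \phi(X_{i_1}, \dots, X_{i_a}),$$
so that $V_n$ is the average of $\phi$ evaluated at all $n^a$ ordered selections of size $a$, drawn with replacement, from $X_1, \dots, X_n$. Since the same $X_i$ can appear more than once within a summand, $V_n$ is generally biased.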
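A brute-force sketch of a V-statistic may help here. The function `v_statistic`, the variance kernel $(x_1 - x_2)^2 / 2$, and the toy sample are illustrative assumptions. The nested sums are enumerated with `itertools.product`, which is only practical for small $n$ and $a$.

```python
from itertools import product


def v_statistic(phi, a, sample):
    """Average phi over all n^a ordered selections with replacement."""
    n = len(sample)
    total = sum(phi(*xs) for xs in product(sample, repeat=a))
    return total / n ** a


def var_kernel(x1, x2):
    # Symmetric kernel of degree 2 with E[phi(X1, X2)] = Var(X)
    return (x1 - x2) ** 2 / 2


sample = [2.0, 4.0, 4.0, 7.0]  # hypothetical toy data
v_n = v_statistic(var_kernel, 2, sample)

print(v_n)  # 3.1875
```

For this kernel, the V-statistic coincides with the variance estimator that divides by $n$, whose expectation is $\frac{n-1}{n}\sigma^2$ rather than $\sigma^2$. The bias comes from the $n$ summands with $i_1 = i_2$, each of which contributes zero.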
By restricting the summands to distinct indices only, an unbiased estimator known as a U-statistic arises. In fact, when the family of distributions is large enough, it can be shown that a U-statistic can always be constructed for expectation functionals.
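Since $\phi$ is symmetric, we can require that $1 \le i_1 < \dots < i_a \le n$, resulting in $\binom{n}{a}$ combinations of the subscripts $i_1, \dots, i_a$. The U-statistic is then the average of $\phi$ evaluated at all $\binom{n}{a}$ distinct combinations of size $a$ from $X_1, \dots, X_n$,
$$U_n = \frac{1}{\binom{n}{a}} \sum_{1 \le i_1 < \dots < i_a \le n} \phi(X_{i_1}, \dots, X_{i_a}).$$
While the indices $i_1, \dots, i_a$ are now distinct within each summand, each $X_i$ still appears in multiple summands, so that $U_n$ is a sum of correlated terms. As a result, the classical central limit theorem for sums of independent terms cannot be applied directly to determine the limiting distribution of $U_n$.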
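The same brute-force approach gives a sketch of a U-statistic. As before, `u_statistic`, the variance kernel $(x_1 - x_2)^2 / 2$, and the toy sample are assumptions for illustration only. The sketch also counts how many summands involve the first observation, to show why the terms are not independent.

```python
from itertools import combinations
from math import comb
from statistics import variance


def u_statistic(phi, a, sample):
    """Average a symmetric kernel phi over all C(n, a) distinct index subsets."""
    n = len(sample)
    # combinations() selects by position, so tied values are still distinct draws
    total = sum(phi(*xs) for xs in combinations(sample, a))
    return total / comb(n, a)


def var_kernel(x1, x2):
    # Symmetric kernel of degree 2 with E[phi(X1, X2)] = Var(X)
    return (x1 - x2) ** 2 / 2


sample = [2.0, 4.0, 4.0, 7.0]  # hypothetical toy data
n, a = len(sample), 2
u_n = u_statistic(var_kernel, a, sample)

print(u_n)               # 4.25
print(variance(sample))  # 4.25, the usual divide-by-(n - 1) sample variance

# Number of summands that include the first observation: C(n - 1, a - 1)
shared = sum(1 for idx in combinations(range(n), a) if 0 in idx)
print(shared, comb(n - 1, a - 1))  # 3 3
```

With this kernel the U-statistic reproduces the unbiased sample variance. The final count shows that each observation enters $\binom{n-1}{a-1}$ of the $\binom{n}{a}$ summands, which is the source of the correlation between terms.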
For more details on expectation functionals and their estimators, you can check out my blog post U-, V-, and Dupree statistics!
This blog post provides a walk-through derivation of the limiting, or asymptotic, distribution of a single U-statistic $U_n$.