Thank you very much to the lovely Feben Alemu for pointing me in the direction of https://pungenerator.org/ as a means of ensuring we never have to go without a brilliant title! With great power comes great responsibility.
Review
A statistical functional is any real-valued function of a distribution function $F$, $\theta = T(F)$. When $F$ is unknown, nonparametric estimation only requires that $F$ belong to a broad class of distribution functions $\mathcal{F}$, typically subject only to mild restrictions such as continuity or existence of specific moments.
For a single independent and identically distributed random sample of size $n$, $X_1, \ldots, X_n \sim F$, a statistical functional $\theta = T(F)$ is said to belong to the family of expectation functionals if:
- $T(F)$ takes the form of an expectation of a function $\phi$ with respect to $F$, $T(F) = E_F\,\phi(X_1, \ldots, X_a)$,
- $\phi(X_1, \ldots, X_a)$ is a symmetric kernel of degree $a \leq n$.
A kernel is symmetric if its arguments can be permuted without changing its value. For example, if the degree $a = 2$, $\phi$ is symmetric if $\phi(x_1, x_2) = \phi(x_2, x_1)$.
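To make the symmetry condition concrete, here is a minimal sketch in Python. The kernel names and the `is_symmetric` helper are my own illustrations, not anything from a library. Both kernels have expectation equal to the variance of the underlying distribution when their two arguments are independent draws, but only the first is symmetric. Note that checking a handful of points can only give evidence of symmetry, not a proof.

```python
from itertools import permutations


def variance_kernel(x1, x2):
    """Symmetric degree-2 kernel: (x1 - x2)^2 / 2, with expectation Var(X)."""
    return (x1 - x2) ** 2 / 2


def lopsided_kernel(x1, x2):
    """Asymmetric degree-2 kernel: x1^2 - x1 * x2, also with expectation Var(X)."""
    return x1 ** 2 - x1 * x2


def is_symmetric(kernel, args, tol=1e-12):
    """Check that permuting the arguments leaves the kernel value unchanged.

    This tests a single point only, so a True result is evidence rather
    than proof of symmetry.
    """
    base = kernel(*args)
    return all(abs(kernel(*p) - base) <= tol for p in permutations(args))


print(is_symmetric(variance_kernel, (1.0, 3.0)))  # symmetric at this point
print(is_symmetric(lopsided_kernel, (1.0, 3.0)))  # not symmetric at this point
```

At the point (1, 3) the lopsided kernel evaluates to -2 in one order and 6 in the other, so the check fails, while the variance kernel gives 2 either way.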
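If $\theta = T(F)$ is an expectation functional and the class of distribution functions $\mathcal{F}$ is broad enough, an unbiased estimator of $\theta$ can always be constructed. This estimator is known as a U-statistic and takes the form,
$$U_n = \frac{1}{\binom{n}{a}} \sum_{1 \leq i_1 < \cdots < i_a \leq n} \phi(X_{i_1}, \ldots, X_{i_a})$$
such that $U_n$ is the average of $\phi$ evaluated at all $\binom{n}{a}$ distinct combinations of size $a$ from $X_1, \ldots, X_n$.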
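The definition translates almost word for word into code. The sketch below is a brute-force illustration, assuming a small sample and a kernel passed as an ordinary Python function; `u_statistic` and `variance_kernel` are hypothetical names of my own. It averages the kernel over every distinct combination of the sample, which is fine for small samples but scales poorly as the sample size grows.

```python
from itertools import combinations
from math import comb
from statistics import variance


def variance_kernel(x1, x2):
    """Symmetric degree-2 kernel whose expectation is Var(X)."""
    return (x1 - x2) ** 2 / 2


def u_statistic(sample, kernel, degree):
    """Average the kernel over all distinct size-`degree` combinations."""
    total = sum(kernel(*subset) for subset in combinations(sample, degree))
    return total / comb(len(sample), degree)


sample = [1.0, 2.0, 3.0, 4.0, 5.0]
u_n = u_statistic(sample, variance_kernel, degree=2)

# With this kernel the U-statistic coincides with the usual unbiased
# sample variance (the one that divides by n - 1).
print(u_n, variance(sample))
```

For this sample the ten pairwise kernel values sum to 25, so the U-statistic is 25 / 10 = 2.5, matching the unbiased sample variance.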
For more detail on expectation functionals and their estimators, check out my blog post U-, V-, and Dupree statistics.
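Since each $X_i$ appears in more than one summand of $U_n$, the central limit theorem cannot be applied directly to derive the limiting distribution of $U_n$, as it is a sum of dependent terms. However, clever conditioning arguments can be used to show that $U_n$ is in fact asymptotically normal with mean
$$E\,U_n = \theta$$
and variance
$$\text{Var}\,U_n \approx \frac{a^2}{n}\sigma_1^2$$
where
$$\sigma_1^2 = \text{Var}\left[E\left(\phi(X_1, \ldots, X_a) \mid X_1\right)\right].$$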
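A quick simulation can make the variance approximation feel less abstract. The sketch below rests on assumptions of my own choosing: standard normal data and the degree-2 variance kernel $(x_1 - x_2)^2 / 2$. In that setting, conditioning the kernel on its first argument $x$ gives $(x^2 + 1)/2$, whose variance is $\text{Var}(X^2)/4 = 1/2$, so the approximate variance is $2^2 \cdot (1/2) / n = 2/n$. The function names, the seed, and the sample sizes are illustrative only.

```python
import random
from itertools import combinations
from math import comb
from statistics import variance


def variance_kernel(x1, x2):
    """Symmetric degree-2 kernel whose expectation is Var(X)."""
    return (x1 - x2) ** 2 / 2


def u_statistic(sample, kernel, degree):
    """Average the kernel over all distinct size-`degree` combinations."""
    total = sum(kernel(*subset) for subset in combinations(sample, degree))
    return total / comb(len(sample), degree)


def simulate_u_variance(n, reps, seed=0):
    """Monte Carlo estimate of Var(U_n) for standard normal samples."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        draws.append(u_statistic(sample, variance_kernel, 2))
    return variance(draws)


n = 30
empirical = simulate_u_variance(n, reps=2000)
# Assumed values for this example: degree a = 2 and sigma_1^2 = 1/2.
approximation = 2 ** 2 * 0.5 / n
print(f"simulated Var(U_n): {empirical:.4f}, a^2 sigma_1^2 / n: {approximation:.4f}")
```

The two numbers should be close but not identical: the simulation carries Monte Carlo noise, and the formula drops lower-order terms that still matter a little at a sample size of 30.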
The sketch of the proof is as follows:
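- Express the variance of $U_n$ in terms of the covariance of its summands,
$$\text{Var}\,U_n = \frac{1}{\binom{n}{a}^2} \sum_{1 \leq i_1 < \cdots < i_a \leq n} \; \sum_{1 \leq j_1 < \cdots < j_a \leq n} \text{Cov}\left[\phi(X_{i_1}, \ldots, X_{i_a}),\, \phi(X_{j_1}, \ldots, X_{j_a})\right].$$
- Recognize that if two terms share $c$ common elements such that,
$$\text{Cov}\left[\phi(X_1, \ldots, X_c, X_{c+1}, \ldots, X_a),\, \phi(X_1, \ldots, X_c, X'_{c+1}, \ldots, X'_a)\right],$$
conditioning on their $c$ shared elements will make the two terms independent.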
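- For $0 \leq c \leq a$, define
$$\phi_c(X_1, \ldots, X_c) = E\left[\phi(X_1, \ldots, X_a) \mid X_1, \ldots, X_c\right]$$
such that
$$E\,\phi_c(X_1, \ldots, X_c) = \theta$$
and
$$\text{Var}\,\phi_c(X_1, \ldots, X_c) = \sigma_c^2.$$
Note that when $c = 0$, $\phi_0 = \theta$ and $\sigma_0^2 = 0$, and when $c = a$, $\phi_a = \phi(X_1, \ldots, X_a)$ and $\sigma_a^2 = \text{Var}\,\phi(X_1, \ldots, X_a)$.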
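- Use the law of iterated expectation to demonstrate that
$$\sigma_c^2 = \text{Cov}\left[\phi(X_1, \ldots, X_c, X_{c+1}, \ldots, X_a),\, \phi(X_1, \ldots, X_c, X'_{c+1}, \ldots, X'_a)\right]$$
and re-express $\text{Var}\,U_n$ as the sum of the $\sigma_c^2$,
$$\text{Var}\,U_n = \frac{1}{\binom{n}{a}} \sum_{c=1}^{a} \binom{a}{c} \binom{n-a}{a-c} \sigma_c^2.$$
Recognizing that the first variance term dominates for large $n$, approximate $\text{Var}\,U_n$ as
$$\text{Var}\,U_n \approx \frac{a^2}{n}\sigma_1^2.$$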
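- Identify a surrogate $U_n^*$ that has the same mean and asymptotic variance as $U_n - \theta$ but is the sum of independent terms,
$$U_n^* = \sum_{i=1}^{n} E\left[U_n - \theta \mid X_i\right]$$
so that the central limit theorem may be used to show
$$\sqrt{n}\,U_n^* \xrightarrow{d} N\left(0, a^2\sigma_1^2\right).$$
- Demonstrate that $U_n - \theta$ and $U_n^*$ converge in probability,
$$\sqrt{n}\left[(U_n - \theta) - U_n^*\right] \xrightarrow{P} 0,$$
and thus have the same limiting distribution so that
$$\sqrt{n}\,(U_n - \theta) \xrightarrow{d} N\left(0, a^2\sigma_1^2\right).$$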
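The claim that the first variance term dominates for large samples is easy to check numerically. The sketch below evaluates the standard finite-sample variance formula for a U-statistic of degree $a$, a weighted sum of the conditional variances $\sigma_c^2$ with weights $\binom{a}{c}\binom{n-a}{a-c}/\binom{n}{a}$, and compares it against the leading term $a^2\sigma_1^2/n$. The function names are my own, and the inputs are an assumed worked example: the variance kernel $(x_1 - x_2)^2/2$ with standard normal data, for which $\sigma_1^2 = 1/2$ and $\sigma_2^2 = 2$.

```python
from math import comb


def exact_variance(n, sigmas):
    """Finite-sample Var(U_n) given sigmas = [sigma_1^2, ..., sigma_a^2]."""
    a = len(sigmas)
    weighted = sum(
        comb(a, c) * comb(n - a, a - c) * sigmas[c - 1]
        for c in range(1, a + 1)
    )
    return weighted / comb(n, a)


def leading_term(n, sigmas):
    """Large-sample approximation a^2 * sigma_1^2 / n."""
    a = len(sigmas)
    return a ** 2 * sigmas[0] / n


# Assumed example: variance kernel with N(0, 1) data.
sigmas = [0.5, 2.0]
for n in (10, 100, 1000):
    exact = exact_variance(n, sigmas)
    approx = leading_term(n, sigmas)
    print(n, exact, approx, exact / approx)
```

In this example the exact variance simplifies to $2/(n-1)$, the familiar variance of the sample variance under normality, so the ratio of exact to approximate variance is $n/(n-1)$ and tends to 1 as the sample grows.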
For a walkthrough derivation of the limiting distribution of $U_n$ for a single sample, check out my blog post Getting to know U: the asymptotic distribution of a single U-statistic.
This blog post aims to provide an overview of the extension of kernels, expectation functionals, and the definition and distribution of U-statistics to multiple independent samples, with particular focus on the common two-sample scenario.