empirical CDF Archives • Statisticelle

To start, I apologize for this blog post's title, but I couldn't resist referencing the Owen Wilson classic You, Me, and Dupree – wow! The other gold-plated candidate was U-statistics and You. Please, please, hold your applause.

My previous blog post defined a statistical functional as any real-valued function of an unknown CDF, $T(F)$, and explained how plug-in estimators can be constructed by substituting the empirical cumulative distribution function (ECDF) $\hat{F}_{n}$ for the unknown CDF $F$. It derived plug-in estimators of the mean and variance and used them to show that plug-in estimators can be biased.

$\hat{\mu} = \mathbb{E}_{\hat{F}_n}[X] = \sum_{i=1}^{n} X_i P(X = X_i) = \frac{1}{n} \sum_{i=1}^{n} X_i = \bar{X}_{n}$

$\hat{\sigma}^{2} = \mathbb{E}_{\hat{F}_{n}}[(X- \mathbb{E}_{\hat{F}_n}[X])^2] = \mathbb{E}_{\hat{F}_n}[(X - \bar{X}_{n})^2] = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}_{n})^2.$
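As a small numerical check of the two plug-in formulas above, the sketch below evaluates them on an arbitrary toy sample. The helper names `plugin_mean` and `plugin_variance` are illustrative rather than taken from any library, and exact `Fraction` arithmetic is an assumption made only so the results print without rounding. Because the plug-in variance divides by $n$, its expectation is $\frac{n-1}{n}\sigma^2$; rescaling by $n/(n-1)$ recovers the usual unbiased sample variance.

```python
from fractions import Fraction

def plugin_mean(xs):
    # E_{F_n}[X]: under the ECDF each observation carries mass 1/n
    n = len(xs)
    return sum(Fraction(x) for x in xs) / n

def plugin_variance(xs):
    # E_{F_n}[(X - Xbar_n)^2]: note the divisor is n, not n - 1
    n = len(xs)
    xbar = plugin_mean(xs)
    return sum((Fraction(x) - xbar) ** 2 for x in xs) / n

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # toy data, chosen for round numbers
n = len(sample)
mu_hat = plugin_mean(sample)
sigma2_hat = plugin_variance(sample)
unbiased = sigma2_hat * n / (n - 1)  # bias-corrected sample variance

print(mu_hat, sigma2_hat, unbiased)  # 5 4 32/7
```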

Statistical functionals that meet the following two criteria represent a special family of functionals known as expectation functionals:

1) $T(F)$ is the expectation of a function $g$ of $a$ independent random variables $X_1, \ldots, X_a \sim F$; and

$T(F) = \mathbb{E}_{F}[g(X_1, \ldots, X_a)]$

2) the function $g(\cdot)$ is a symmetric kernel, meaning that its value is unchanged by any permutation of its arguments.
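To make the two criteria concrete, here is a minimal sketch assuming a discrete $F$ with finitely many support points. It uses the kernel $g(x_1, x_2) = (x_1 - x_2)^2/2$, which is symmetric and whose expectation under $F$ equals the variance of $F$, so the variance is an expectation functional. The helper names `variance_kernel` and `expectation_functional` are hypothetical. For $F$ uniform on $\{1, 2, 3\}$ the result should be the variance $2/3$.

```python
from fractions import Fraction
from itertools import product

def variance_kernel(x1, x2):
    # Symmetric kernel with two arguments: swapping x1 and x2 gives the same value
    return Fraction((x1 - x2) ** 2, 2)

def expectation_functional(kernel, support, probs, degree=2):
    # T(F) = E_F[g(X_1, ..., X_degree)] for i.i.d. draws from a discrete F,
    # computed exactly by enumerating every tuple of support points
    total = Fraction(0)
    for idx in product(range(len(support)), repeat=degree):
        weight = Fraction(1)
        for i in idx:
            weight *= probs[i]  # independence: joint mass is the product
        total += weight * kernel(*(support[i] for i in idx))
    return total

support = [1, 2, 3]
probs = [Fraction(1, 3)] * 3  # F uniform on {1, 2, 3}

print(expectation_functional(variance_kernel, support, probs))  # 2/3
```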

Expectation functionals encompass many common parameters, such as the mean and the variance, and are well-behaved. Their plug-in estimators, named V-statistics after von Mises, are easy to obtain but may be biased. It is, however, always possible to construct an unbiased estimator of an expectation functional, regardless of the underlying distribution function $F$, provided the sample size $n$ is at least the number of arguments of the kernel $g$. These estimators are named U-statistics, with the “U” standing for unbiased.
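The difference between the two estimators can be sketched in a few lines, again assuming the variance kernel $g(x_1, x_2) = (x_1 - x_2)^2/2$. A V-statistic averages the kernel over all $n^2$ ordered pairs drawn from the sample, repeats included, which is exactly $T(\hat{F}_n)$; a U-statistic averages it only over the $\binom{n}{2}$ pairs of distinct observations. The function names are hypothetical, and averaging over unordered pairs relies on the kernel being symmetric. On this toy sample the V-statistic reproduces the plug-in variance (divisor $n$) and the U-statistic reproduces the unbiased sample variance (divisor $n - 1$).

```python
from fractions import Fraction
from itertools import combinations, product

def variance_kernel(x1, x2):
    # Symmetric kernel whose expectation is the variance of F
    return Fraction((x1 - x2) ** 2, 2)

def v_statistic(kernel, xs, degree=2):
    # Plug-in estimator: average over all n**degree tuples, repeats allowed
    vals = [kernel(*t) for t in product(xs, repeat=degree)]
    return sum(vals, Fraction(0)) / len(vals)

def u_statistic(kernel, xs, degree=2):
    # Unbiased estimator: average over all subsets of `degree` distinct
    # observations (by position, so tied values still count as distinct draws)
    vals = [kernel(*t) for t in combinations(xs, degree)]
    return sum(vals, Fraction(0)) / len(vals)

sample = [2, 4, 4, 4, 5, 5, 7, 9]

print(v_statistic(variance_kernel, sample))  # 4
print(u_statistic(variance_kernel, sample))  # 32/7
```

The U-statistic needs at least `degree` observations; with fewer, `combinations` yields no tuples and the final division raises `ZeroDivisionError`.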

This blog post provides 1) the definitions of symmetric kernels and expectation functionals; 2) an overview of plug-in estimators of expectation functionals, or V-statistics; and 3) an overview of unbiased estimators of expectation functionals, or U-statistics.

Continue reading U-, V-, and Dupree statistics

Consider a sequence of $n$ independent and identically distributed random variables $X_1, X_2, \ldots, X_n \sim F$. The distribution function $F$ is unknown but belongs to a known set of distribution functions $\mathcal{F}$. In parametric estimation, $\mathcal{F}$ may represent a family of distributions specified by a vector of parameters, such as $(\mu, \sigma)$ in the case of the location-scale family. In nonparametric estimation, $\mathcal{F}$ is much broader and is subject only to mild restrictions, such as the existence of moments or continuity. For example, we may define $\mathcal{F}$ as the family of distributions for which the mean exists, or as all distributions defined on the real line $\mathbb{R}$.

As mentioned in my previous blog post comparing nonparametric and parametric estimation, a statistical functional is any real-valued function of the cumulative distribution function $F$, denoted $\theta = T(F)$. Statistical functionals can be thought of as characteristics of $F$. Examples include the moments

$T(F) = \mathbb{E}_{F}[X^{k}]$

and the quantiles

$T(F) = F^{-1}(p) = \inf\{x : F(x) \ge p\}, \quad 0 < p < 1.$
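Because a statistical functional takes a whole distribution as its input, it can be written as an ordinary function whose argument is $F$ itself. The sketch below assumes $F$ is discrete and stored as a `{value: probability}` dictionary, a hypothetical convention; `moment` and `quantile` are illustrative names. The quantile uses the generalized inverse $\inf\{x : F(x) \ge p\}$, so it is well defined even when $F$ has jumps. For a fair die it should give $\mathbb{E}_F[X] = 7/2$, $\mathbb{E}_F[X^2] = 91/6$, and median $3$.

```python
from fractions import Fraction

def moment(dist, k):
    # T(F) = E_F[X^k] for a discrete F given as {value: probability}
    return sum(p * x ** k for x, p in dist.items())

def quantile(dist, p):
    # T(F) = F^{-1}(p) = inf{x : F(x) >= p}: walk the support in increasing
    # order and return the first point where the CDF reaches p
    cumulative = Fraction(0)
    for x in sorted(dist):
        cumulative += dist[x]
        if cumulative >= p:
            return x
    raise ValueError("p exceeds the total probability mass")

die = {face: Fraction(1, 6) for face in range(1, 7)}  # fair six-sided die

print(moment(die, 1), moment(die, 2), quantile(die, Fraction(1, 2)))  # 7/2 91/6 3
```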

An infinite population may be considered as completely determined by its distribution function, and any numerical characteristic of an infinite population with distribution function $F$ that is used in statistics is a [statistical] functional of $F$ .

Wassily Hoeffding. “A Class of Statistics with Asymptotically Normal Distribution.” Ann. Math. Statist. 19 (3): 293–325, September 1948.

This blog post aims to provide insight into estimators of statistical functionals based on a sample of $n$ independent and identically distributed random variables, known as plug-in estimators or empirical functionals.
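A plug-in estimator evaluates the same functional at $\hat{F}_n$ instead of $F$. The sketch below assumes a discrete distribution is stored as a `{value: probability}` dictionary (a hypothetical convention, as are all helper names). The ECDF is built by placing mass $1/n$ on each observation, with tied values pooling their mass, and the moment and quantile functions are then applied to it unchanged. The toy sample is arbitrary.

```python
from collections import Counter
from fractions import Fraction

def ecdf_distribution(sample):
    # The ECDF puts mass 1/n on each observation; tied values pool their mass
    n = len(sample)
    return {x: Fraction(c, n) for x, c in Counter(sample).items()}

def ecdf(sample, x):
    # F_n(x) = (1/n) * #{i : X_i <= x}
    return Fraction(sum(xi <= x for xi in sample), len(sample))

def moment(dist, k):
    # E[X^k] under a discrete distribution {value: probability}
    return sum(p * x ** k for x, p in dist.items())

def quantile(dist, p):
    # Generalized inverse: smallest support point whose CDF is >= p
    cumulative = Fraction(0)
    for x in sorted(dist):
        cumulative += dist[x]
        if cumulative >= p:
            return x
    raise ValueError("p exceeds the total probability mass")

sample = [1, 2, 2, 3, 7]
F_n = ecdf_distribution(sample)

print(ecdf(sample, 2))                 # 3/5
print(moment(F_n, 1), moment(F_n, 2))  # 3 67/5
print(quantile(F_n, Fraction(1, 2)))   # 2
```

Nothing about `moment` or `quantile` changes between $F$ and $\hat{F}_n$; only the input distribution does, which is the essence of the plug-in principle.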

Continue reading Plug-in estimators of statistical functionals