U-, V-, and Dupree statistics

To start, I apologize for this blog’s title but I couldn’t resist referencing to the Owen Wilson classic You, Me, and Dupree ā€“ wow! The other gold-plated candidate was U-statistics and You. Please, please, hold your applause.


My previous blog post defined statistical functionals as any real-valued function of an unknown CDF, T(F), and explained how plug-in estimators could be constructed by substituting the empirical cumulative distribution function (ECDF) \hat{F}_{n} for the unknown CDF F. Plug-in estimators of the mean and variance were provided and used to demonstrate plug-in estimators’ potential to be biased.

    \[ \hat{\mu} = \mathbb{E}_{\hat{F}_n}[X] = \sum_{i=1}^{n} X_i P(X = X_i) = \frac{1}{n} \sum_{i=1}^{n} X_i = \bar{X}_{n} \]

    \[ \hat{\sigma}^{2} = \mathbb{E}_{\hat{F}_{n}}[(X- \mathbb{E}_{\hat{F}_n}[X])^2] = \mathbb{E}_{\hat{F}_n}[(X - \bar{X}_{n})^2] = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}_{n})^2. \]

Statistical functionals that meet the following two criteria represent a special family of functionals known as expectation functionals:

1) T(F) is the expectation of a function g with respect to the distribution function F; and

    \[ T(F) = \mathbb{E}_{F} ~g(X)\]

2) the function g(\cdot) takes the form of a symmetric kernel.

Expectation functionals encompass many common parameters and are well-behaved. Plug-in estimators of expectation functionals, named V-statistics after von Mises, can be obtained but may be biased. It is, however, always possible to construct an unbiased estimator of expectation functionals regardless of the underlying distribution function F. These estimators are named U-statistics, with the ā€œUā€ standing for unbiased.

This blog post provides 1) the definitions of symmetric kernels and expectation functionals; 2) an overview of plug-in estimators of expectation functionals or V-statistics; 3) an overview of unbiased estimators for expectation functionals or U-statistics.

Continue reading U-, V-, and Dupree statistics

Parametric vs. Nonparametric Approach to Estimations

Parametric statistics assume that the unknown CDF F belongs to a family of CDFs characterized by a parameter (vector) \theta. As the form of F is assumed, the target of estimation is its parameters \theta. Thus, all uncertainty about F is comprised of uncertainty about its parameters. Parameters are estimated by \hat{\theta}, and estimates are be substituted into the assumed distribution to conduct inference for the quantities of interest. If the assumed distribution F is incorrect, inference may also be inaccurate, or trends in the data may be missed.

To demonstrate the parametric approach, consider n = 100 independent and identically distributed random variables X_1, …, X_n generated from an exponential distribution with rate \lambda = 2. Investigators wish to estimate the 75^{th} percentile and erroneously assume that their data is normally distributed. Thus, F is assumed to be the Normal CDF but \mu and \sigma^2 are unknown. The parameters \mu and \sigma are estimated in their typical way by \bar{x} and \sigma^2, respectively. Since the normal distribution belongs to the location-scale family, an estimate of the p^{th} percentile is provided by,

    \[x_p = \bar{x} + s\Phi^{-1}(p)\]

where \Phi^{-1} is the standard normal quantile function, also known as the probit.

set.seed(12345)
library(tidyverse, quietly = T)
# Generate data from Exp(2)
x <- rexp(n = 100, rate = 2)

# True value of 75th percentile with rate = 2
true <- qexp(p = 0.75, rate = 2) 
true
## [1] 0.6931472
# Estimate mu and sigma
xbar <- mean(x)
s    <- sd(x)

# Estimate 75th percentile assuming mu = xbar and sigma = s
param_est <- xbar + s * qnorm(p = 0.75)
param_est
## [1] 0.8792925

The true value of the 75^{th} percentile of \text{Exp}(2) is 0.69 while the parametric estimate is 0.88.

Nonparametric statistics make fewer distributions about the unknown distribution F, requiring only mild assumptions such as continuity or the existence of specific moments. Instead of estimating parameters of F, F itself is the target of estimation. F is commonly estimated by the empirical cumulative distribution function (ECDF) \hat{F},

    \[\hat{F}(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{I}(X_i \leq x).\]

Any statistic that can be expressed as a function of the CDF, known as a statistical functional and denoted \theta = T(F), can be estimated by substituting \hat{F} for F. That is, plug-in estimators can be obtained as \hat{\theta} = T(\hat{F}).

Continue reading Parametric vs. Nonparametric Approach to Estimations

Kernel Density Estimation

Motivation

It is important to have an understanding of some of the more traditional approaches to function estimation and classification before delving into the trendier topics of neural networks and decision trees. Many of these methods build on an understanding of each other and thus to truly be a MACHINE LEARNING MASTER, we’ve got to pay our dues. We will therefore start with the slightly less sexy topic of kernel density estimation.

Let X be a random variable with a continuous distribution function (CDF) F(x) = Pr(X \leq x) and probability density function (PDF)

    \[f(x) = \frac{d}{dx} F(x)\]

Our goal is to estimate f(x) from a random sample \lbrace X_1, …, X_n \rbrace. Estimation of f(x) has a number of applications including construction of the popular Naive Bayes classifier,

    \[ \hat{Pr}(C = c | X = x_0) = \frac{\hat{\pi}_c \hat{f}_{c}(x_0)}{\sum_{k=1}^{C} \hat{\pi}_{k} \hat{f}_{k}(x_0)} \]

Continue reading Kernel Density Estimation