Simplifying U-statistic variance estimation with Sen’s structural components

Sen (1960) proved that U-statistics could be decomposed into identically distributed and asymptotically uncorrelated “structural components.”

The mean of these structural components equals the U-statistic, and their sample variance can be used to estimate the variance of the U-statistic, bypassing the often challenging derivation of conditional variance components.

One-sample U-statistics

Review of known properties

Consider an independent and identically distributed sample of m random variables: X_{1}, …, X_{m} \stackrel{i.i.d.}{\sim} F. Then, a one-sample U-statistic with a symmetric kernel \phi taking a arguments has the form

    \[ U = \frac{1}{{m \choose a}} \mathop{\sum … \sum} \limits_{1 \leq i_1 < ... < i_{a} \leq m} \phi(X_{i_1}, ..., X_{i_a}) \]

where the sum extends over all {m \choose a} distinct combinations of a of the m random variables. U-statistic theory then tells us that

    \[ \text{Var}_{F}~U = \frac{1}{{m \choose a}} \sum_{c=1}^{a} {a \choose c} {m-a \choose a-c} \sigma^{2}_c \]

where

    \[ \sigma^{2}_{c} = \text{Var}_F \left[ \mathbb{E}_{F} \left( \phi(X_1, ..., X_a) | X_1, ..., X_c \right) \right]. \]

For large samples, the variance of U can be approximated by

    \[ \text{Var}~U \approx \frac{a^2}{m} \sigma_1^2. \]

For further information on one-sample U-statistics, including derivation of variance estimates, see: Getting to know U: the asymptotic distribution of a single U-statistic.

Commonly, a=1 or a=2. When a=1, the form of the U-statistic reduces to

    \[ U = \frac{1}{m} \sum_{i=1}^{m} \phi(X_i) \]

so that U takes the form of the mean of the kernel across the m observations and,

    \[ \text{Var} ~U = \frac{\sigma_1^{2}}{m} \]

resembles the variance of a sample mean.
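To make the a=1 case concrete, here is a minimal R sketch (my own, not from the references above) using the kernel \phi(x) = x, for which U is simply the sample mean and the variance formula reduces to the familiar var(X)/m:

```r
# a = 1 sketch: with kernel phi(x) = x, the U-statistic is the
# sample mean and sigma_1^2 is estimated by the sample variance
set.seed(1)
m <- 1000
X <- rnorm(m)

phi <- function(x) x       # one-argument kernel (a = 1)
U <- mean(phi(X))          # the U-statistic: mean of kernel values

# Var U = sigma_1^2 / m, estimated here by var(phi(X)) / m
VarU_hat <- var(phi(X)) / m
```

Here VarU_hat is exactly the usual squared standard error of the sample mean.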

When a=2,

    \[ U = \frac{1}{{m \choose 2}} \mathop{\sum \sum} \limits_{1 \leq i_1 < i_{2} \leq m} \phi(X_{i_1}, X_{i_2}) \]

or equivalently,

    \[ U = \frac{1}{m(m-1)} \mathop{\sum \sum} \limits_{i_1 \neq i_{2}} \phi(X_{i_1}, X_{i_2}) \]

and,

    \[ \text{Var} ~U = \frac{2}{m(m-1)} \left[ 2(m-2)\sigma_1^2 + \sigma_2^2 \right] \approx \frac{4}{m} \sigma_1^2. \]

For examples of how to derive \sigma_c^2 for common one-sample U-statistics, see: One, Two, U: Examples of common one- and two-sample U-statistics.

Structural components

In 1960, Sen demonstrated that a one-sample U-statistic based on a sample of size m could be decomposed into m structural components, denoted V_i for i = 1, …, m. Sen also proved that the V_i are identically distributed and asymptotically uncorrelated.

When a=1, the i^{th} structural component is simply

    \[ V_i = \phi(x_i) \]

or the kernel evaluated at the i^{th} observation X_i.

When a=2,

    \[ V_i = \frac{1}{m-1} \sum_{\substack{i_1 = 1 \\ i_1 \neq i}}^{m} \phi(x_i, X_{i_1}) \]

or the mean of all (m-1) kernel terms involving X_i.

The mean of the m structural components is equivalent to U,

    \[ U = \frac{1}{m} \sum_{i=1}^{m} V_i = \overline{V}_m \]

Sen also proved that the first conditional variance component \sigma_1^2 can be consistently estimated as the sample variance of the components,

    \[ s_{m}^2 = \frac{1}{m-1} \sum_{i=1}^{m} (V_i - U)^2 \]

such that

    \[ \widehat{\text{Var}}~U = \frac{a^2}{m} s_m^{2}. \]

That is, \widehat{\text{Var}}~U = s_m^{2}/m when a=1 and \widehat{\text{Var}}~U = 4 s_m^{2}/m when a=2.
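As a quick sketch (my own helper, not from Sen's paper; the name ustat_var is illustrative), the a=2 recipe above can be wrapped into a small R function for an arbitrary two-argument symmetric kernel:

```r
# Minimal sketch: U-statistic and structural-component variance
# estimate for a one-sample kernel with a = 2 arguments.
# The helper name ustat_var is illustrative, not a standard API.
ustat_var <- function(X, kernel) {
  m <- length(X)
  # V_i = mean of the (m - 1) kernel terms involving X_i
  V <- sapply(seq_len(m), function(i) mean(kernel(X[i], X[-i])))
  U <- mean(V)                         # U-statistic = mean of components
  list(U = U, VarU = 4 * var(V) / m)   # a = 2, so Var U is approx. 4 s_m^2 / m
}

# Example with the sample-variance kernel phi(x1, x2) = (x1 - x2)^2 / 2
res <- ustat_var(rnorm(200), function(x1, x2) (x1 - x2)^2 / 2)
```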

Example: Variance

As an example, consider the one-sample U-statistic for the sample variance. I previously used U-statistic theory to derive the limiting distribution of this U-statistic in the blog post One, Two, U: Examples of common one- and two-sample U-statistics.

In summary, the U-statistic for the unbiased sample variance s^2 takes the form,

    \[ U = \frac{1}{{m \choose 2}} \mathop{\sum \sum} \limits_{1 \leq i < j \leq m} \frac{(X_{i} - X_{j})^2}{2} = \frac{1}{m(m-1)} \mathop{\sum \sum} \limits_{i \neq j} \frac{(X_{i} - X_{j})^2}{2}\]

where \phi(x_1, x_2) = (x_1 - x_2)^2/2 is an unbiased, symmetric kernel for \sigma^2.
Using U-statistic theory, the asymptotic distribution of this estimator U is given by

    \[ \sqrt{m} \left( U - \sigma^2 \right) \rightarrow N \left(0, \mu_4 - \sigma^4 \frac{(m-3)}{(m-1)} \right) \]

where \mu_4 is the fourth central moment. Note that this variance considers both variance components \sigma^2_1 and \sigma^2_2.

However, recall that with large m, the variance of U can be approximated using only the first conditional variance component \sigma^2_1 such that,

    \[ \text{Var}_{F} ~U \approx \frac{a^2}{m} \sigma^{2}_1 \]

where a represents the number of arguments within the kernel \phi, here a=2 and

    \[\sigma^2_1 = \frac{\mu_4 - \sigma^4}{4}\]

so that

    \[ \text{Var}_{F} ~U \approx \frac{\mu_4 - \sigma^4}{m}. \]

Following Equation (2.1) of Sen (1960), the i^{th} structural component of U is defined in this scenario as,

    \[ V_{i} = \frac{1}{m-1} \sum_{\substack{i_1 = 1 \\ i_1 \neq i}}^{m} \frac{(x_i - X_{i_1})^2}{2} \]

or the average of the (m-1) kernel terms involving X_i (as X_i is not compared with itself within U).
Then,

    \[ U = \frac{1}{m} \sum_{i=1}^{m} V_i = \frac{1}{m(m-1)} \sum_{i \neq j} \frac{(X_{i} - X_{j})^2}{2} \]

as expected!

Since U can be expressed as a mean of the structural components V_i which are asymptotically uncorrelated, it follows that the variance of U can also be estimated by considering

    \[ s_m^2 = \frac{1}{m-1} \sum_{i=1}^{m} \left(V_i - U \right)^2 \]

such that

    \[ \widehat{\text{Var}}~U \approx \frac{4}{m} s_{m}^{2}. \]

Now, let’s simulate a large dataset to explore!

set.seed(12345)

# Simulate m = 1000 Xi from N(mu = 0, sd = 10)
m <- 1000
mu <- 0
sigma <- 10
X <- rnorm(n = m, mean=mu, sd=sigma)

# Parameter values
# True value of fourth central moment of N(mu, sigma) = 3*sigma^4
mu4 <- 3*sigma^4
sigma4 <- sigma^4

# Variance approximation using sigma_1^2
true_VarU <- (mu4 - sigma4)/m

true_VarU
## [1] 20
# Estimates
U <- var(X)

Xbar <- mean(X)
Xbar4 <- 1/(m-1) * sum((X-Xbar)^4)

# Approximated variance using estimates
est_VarU <- (Xbar4 - U^2)/m

est_VarU
## [1] 19.62728

The true (approximate) variance of U, \text{Var}~U \approx 4 \sigma_1^2/m is 20, and the corresponding sample estimate is 19.63.

Next, let's construct the m structural components V_i ~(i = 1, …, m) of U and check that their mean is equal to U.

kernel <- function(x1, x2){
  (x1 - x2)^2/2
}

# Structural components: V_i = mean of kernel terms involving X_i
V <- numeric(m)
for (i in 1:m){
  V[i] <- 1/(m-1) * sum(kernel(X[i], X[-i]))
}

U == mean(V)
## [1] TRUE

Awesome, the mean of the structural components \bar{V}= 99.75 is exactly equal to the value of our U-statistic U= 99.75 as expected.

Next, let’s compare the sample variance of the structural components to the true and estimated variance of U constructed with only the first conditional variance component…

sm2 <- var(V)
est_VarV <- sm2*4/m

true_VarU; est_VarU; est_VarV
## [1] 20
## [1] 19.62728
## [1] 19.67656

Huzzah, we see that all three quantities are very similar indeed!

Two-sample U-statistics

Review of known properties

Sen (1960) also demonstrated that this result can be extended to multiple independent i.i.d. samples. We focus on 2 independent samples: X_{1}, …, X_{m} \stackrel{i.i.d.}{\sim} F and Y_{1}, …, Y_{n} \stackrel{i.i.d.}{\sim} G in which case the two-sample U-statistic takes the general form,

    \[ U = \frac{1}{{m \choose a}{n \choose b}} \mathop{\sum \sum} \limits_{\substack{1 \leq i_1 < ... < i_{a} \leq m \\ 1 \leq j_1 < ... < j_b \leq n}} \phi(X_{i_1}, ..., X_{i_a}; Y_{j_1}, ..., Y_{j_b}) \]

where a and b represent the number of arguments within the kernel for each sample, respectively. The variance of the two-sample U-statistic for large samples is provided by

    \[ \text{Var}~U = \frac{a^2}{m} \sigma_{10}^2 + \frac{b^2}{n} \sigma_{01}^2. \]

For further information on two-sample U-statistics, including a sketch derivation of variance estimates, see: Much Two U About Nothing: Extension of U-statistics to multiple independent samples.

Focusing on the common scenario in which a=b=1, the two-sample U-statistic reduces to

    \[ U = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \phi(X_i; Y_j) \]

and

    \[ \text{Var}~U = \frac{\sigma_{10}^2}{m} + \frac{\sigma_{01}^{2}}{n}. \]

For examples of how to derive \sigma_{01}^2 and \sigma_{10}^2 for common two-sample U-statistics, see: One, Two, U: Examples of common one- and two-sample U-statistics.

Structural components

With two samples, structural components and their variance are constructed for each sample. Within the first sample,

    \[ V_i = \frac{1}{n} \sum_{j=1}^{n} \phi(x_i; Y_j) \]

and within the second sample,

    \[ V_j = \frac{1}{m} \sum_{i=1}^{m} \phi(X_i; y_j). \]

The mean of the structural components within either sample \overline{V}_{X} or \overline{V}_{Y} is equal to U and

    \[ \widehat{\text{Var}}~U \approx \frac{s_m^{2}}{m} + \frac{s_n^{2}}{n} \]

where s_m^{2} and s_n^{2} are the estimated variances of the structural components within the first and second sample, respectively.
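The two-sample recipe with a=b=1 can likewise be sketched as a small R helper (the name ustat2_var is my own, purely illustrative):

```r
# Minimal sketch: two-sample U-statistic and structural-component
# variance estimate for a = b = 1. ustat2_var is an illustrative name.
ustat2_var <- function(X, Y, kernel) {
  m <- length(X); n <- length(Y)
  k  <- outer(X, Y, kernel)  # m x n matrix of phi(X_i; Y_j)
  Vi <- rowMeans(k)          # components within the first sample
  Vj <- colMeans(k)          # components within the second sample
  U  <- mean(k)              # the two-sample U-statistic
  list(U = U, VarU = var(Vi) / m + var(Vj) / n)
}

# Example with the Mann-Whitney kernel
res <- ustat2_var(rnorm(100), rnorm(120), function(x, y) as.numeric(x < y))
```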

Example: Mann-Whitney U-statistic when F=G

Consider the form of the Mann-Whitney U-statistic for two independent random samples of a continuous outcome so that P(X=Y) = 0,

    \[ U = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \mathbb{I}(X_i < Y_j). \]

Thus, the form of the structural components within the first sample is

    \[ V_i = \frac{1}{n} \sum_{j=1}^{n} \mathbb{I}(x_i < Y_j) \]

and within the second sample,

    \[ V_j = \frac{1}{m} \sum_{i=1}^{m} \mathbb{I}(X_i < y_j). \]

The Mann-Whitney U-statistic is commonly used to conduct testing of the null hypothesis that the distribution of the outcome within each group is the same, or H_0: F=G. In a previous blog post, it was demonstrated that under this null hypothesis,
    \[ \sigma_{10}^{2} = \sigma_{01}^{2} = \frac{1}{12} \]
and

    \[ \text{Var}~U = \frac{N}{12mn} \]

where N = m + n. (This is the large-sample approximation based on \sigma_{10}^2 and \sigma_{01}^2 alone; the exact null variance is (N+1)/(12mn), which is the quantity computed in the simulation below.)

Let’s conduct another small simulation to see how the true variance compares to the estimated variance using the structural components!

m <- 500
n <- 1000
mu <- 0
sigma <- 10

# Simulate X and Y so that H0: F = G is true
X <- rnorm(m, mean = mu, sd = sigma)
Y <- rnorm(n, mean = mu, sd = sigma)

kernel <- function(xi, yj){
  ifelse(xi < yj, 1, 0)
}

# Compute kernel value for each pair of Xi and Yj
k <- outer(X, Y, kernel)

VarU <- (m+n+1)/(12*m*n)
VarU
## [1] 0.0002501667
U <- sum(k) / length(k)
Vi <- 1/n * rowSums(k)
Vj <- 1/m * colSums(k)

mean(Vi) ; mean(Vj)
## [1] 0.501896
## [1] 0.501896
sm2 <- var(Vi)
sn2 <- var(Vj)

VarV <- sm2/m + sn2/n

VarU; VarV
## [1] 0.0002501667
## [1] 0.0002545398

Again, we see the estimated variance using the structural components is very close to the true variance!

Concluding thoughts

The structural components introduced by Sen (1960) significantly reduce the complexity of variance estimation for U-statistics. The components, however, are only asymptotically uncorrelated, and performance may suffer in small samples (not investigated here). Nonparametric estimation in general struggles with small samples, as the empirical cumulative distribution function, the foundation of nonparametric estimation, may be badly biased (e.g., due to an "unlucky" or unrepresentative sample). Thus, these approaches are most suitable for reasonably sized samples.

Published by

Emma Smith

Emma Smith is a young statistician who's on a mission to convince the masses statistics is as awesome as she *knows* it is! When she's not working on expanding her knowledge of machine learning and mathematical statistics, she's busy petting cats and unsuccessfully convincing her boyfriend to let her adopt them, hiking, concocting indie and folk rock playlists, and kicking butt in roller derby.
