Sen (1960) proved that U-statistics could be decomposed into identically distributed and *asymptotically uncorrelated* “structural components.”

The mean of these structural components is equivalent to the U-statistic, and the variance of the structural components can be used to estimate the variance of the U-statistic, bypassing the often challenging derivation of conditional variance components.

## One-sample U-statistics

### Review of known properties

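Consider an independent and identically distributed sample of $m$ random variables: $X_1, \dots, X_m \sim F$. Then, a one-sample U-statistic with a symmetric kernel $\phi$ consisting of $a$ arguments takes the form

$$U = \binom{m}{a}^{-1} \sum_{\beta} \phi(X_{\beta_1}, \dots, X_{\beta_a}),$$

where the sum extends over all $\binom{m}{a}$ distinct combinations $\beta = \{\beta_1, \dots, \beta_a\}$ of $a$ of the $m$ random variables. U-statistic theory then tells us that

$$\sqrt{m}\,(U - \theta) \xrightarrow{d} N\left(0, a^2 \sigma_1^2\right),$$

where $\theta = E[\phi(X_1, \dots, X_a)]$ and

$$\sigma_1^2 = \mathrm{Var}\big[E[\phi(X_1, \dots, X_a) \mid X_1]\big]$$

is the first conditional variance component.

For large samples, the variance of $U$ can be approximated by

$$\mathrm{Var}(U) \approx \frac{a^2 \sigma_1^2}{m}.$$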

**For further information on one-sample U-statistics, including derivation of variance estimates, see: Getting to know U: the asymptotic distribution of a single U-statistic.**

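Commonly, $a = 1$ or $a = 2$. When $a = 1$, the form of the U-statistic reduces to

$$U = \frac{1}{m} \sum_{i=1}^{m} \phi(X_i),$$

so that $U$ takes the form of the mean of the kernel across the $m$ observations and

$$\mathrm{Var}(U) = \frac{\sigma_1^2}{m}$$

resembles the variance of a sample mean.

When $a = 2$,

$$U = \binom{m}{2}^{-1} \sum_{i < j} \phi(X_i, X_j),$$

or equivalently,

$$U = \frac{1}{m(m-1)} \sum_{i \neq j} \phi(X_i, X_j),$$

and

$$\mathrm{Var}(U) \approx \frac{4 \sigma_1^2}{m}.$$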

**For examples of how to derive $\sigma_1^2$ for common one-sample U-statistics, see: One, Two, U: Examples of common one- and two-sample U-statistics.**
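As a quick numerical check of the two equivalent forms for a kernel with two arguments, the sketch below evaluates both on a small simulated sample. The kernel `phi` (the variance kernel used later in this post), the sample size, and the seed are illustrative choices of mine rather than part of the theory above.

```r
set.seed(1)
x <- rnorm(8)
m <- length(x)

# Illustrative symmetric kernel with two arguments (assumption: variance kernel)
phi <- function(x1, x2) (x1 - x2)^2 / 2

# Form 1: average over all choose(m, 2) unordered pairs i < j
pairs <- combn(m, 2)                       # 2 x choose(m, 2) matrix of index pairs
U_comb <- mean(phi(x[pairs[1, ]], x[pairs[2, ]]))

# Form 2: average over all m * (m - 1) ordered pairs i != j
K <- outer(x, x, phi)                      # K[i, j] = phi(x[i], x[j])
diag(K) <- NA                              # drop the i == j terms
U_ord <- mean(K, na.rm = TRUE)

all.equal(U_comb, U_ord)                   # the two forms should agree
all.equal(U_comb, var(x))                  # and, for this kernel, match var()
```

Because the kernel is symmetric, every unordered pair appears exactly twice among the ordered pairs, which is why the two averages coincide.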

### Structural components

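In 1960, Sen demonstrated that a one-sample U-statistic based on a sample of size $m$ could be decomposed into $m$ *structural components*, denoted $V_i$ for $i = 1, \dots, m$. Sen also proved that the $V_i$ are identically distributed and *asymptotically uncorrelated*.

When $a = 1$, the structural component is simply

$$V_i = \phi(X_i),$$

or the kernel evaluated at the observation $X_i$.

When $a = 2$,

$$V_i = \frac{1}{m-1} \sum_{j \neq i} \phi(X_i, X_j),$$

or the mean of all kernel terms involving $X_i$.

The mean of the structural components is equivalent to $U$,

$$U = \frac{1}{m} \sum_{i=1}^{m} V_i.$$

Sen also proved that the first conditional variance component $\sigma_1^2$ can be consistently estimated as the sample variance of the $m$ components,

$$s_m^2 = \frac{1}{m-1} \sum_{i=1}^{m} (V_i - U)^2,$$

such that

$$\widehat{\mathrm{Var}}(U) = \frac{a^2 s_m^2}{m}.$$

That is, $\widehat{\mathrm{Var}}(U) = s_m^2 / m$ when $a = 1$ and $\widehat{\mathrm{Var}}(U) = 4 s_m^2 / m$ when $a = 2$.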
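For kernels with two arguments, this construction can be sketched as a pair of small helpers. The function names `structural_components()` and `sen_variance()` are hypothetical (they come from neither Sen's paper nor any package), and the kernel used to illustrate them, $|x_1 - x_2|$ (Gini's mean difference), is my own choice of example.

```r
# Hypothetical helper: structural components for a symmetric two-argument kernel.
# `kernel` must be vectorised over both arguments.
structural_components <- function(x, kernel) {
  m <- length(x)
  K <- outer(x, x, kernel)   # K[i, j] = kernel(x[i], x[j])
  diag(K) <- 0               # exclude the i == j terms from each row sum
  rowSums(K) / (m - 1)       # V[i] = mean of the kernel terms involving x[i]
}

# Hypothetical helper: U-statistic and its estimated variance, 4 * s_m^2 / m
sen_variance <- function(x, kernel) {
  V <- structural_components(x, kernel)
  list(U = mean(V), var_U = 4 * var(V) / length(x))
}

set.seed(2024)
x <- rexp(200)
gmd <- function(x1, x2) abs(x1 - x2)   # kernel for Gini's mean difference
res <- sen_variance(x, gmd)
res$U              # U-statistic
sqrt(res$var_U)    # estimated standard error of U
```

Building the full kernel matrix with `outer()` costs memory proportional to the square of the sample size, so this sketch is intended for moderate samples only.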

### Example: Variance

As an example, consider the one-sample U-statistic for the sample variance. I derived the limiting distribution of this U-statistic using U-statistic theory in a previous blog post, One, Two, U: Examples of common one- and two-sample U-statistics.

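In summary, the U-statistic for the unbiased sample variance takes the form,

$$s^2 = \binom{m}{2}^{-1} \sum_{i < j} \frac{(X_i - X_j)^2}{2},$$

where $\phi(X_i, X_j) = (X_i - X_j)^2 / 2$ is an unbiased, symmetric kernel for $\sigma^2$.

Using U-statistic theory, the asymptotic distribution of this estimator is given by

$$s^2 \overset{\cdot}{\sim} N\left(\sigma^2, \; \frac{\mu_4}{m} - \frac{\sigma^4 (m-3)}{m(m-1)}\right),$$

where $\mu_4 = E[(X - \mu)^4]$ is the fourth central moment and $\mu = E[X]$. Note that this variance considers both variance components $\sigma_1^2$ and $\sigma_2^2$.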

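However, recall that with large $m$, the variance of $s^2$ can be approximated using only the first conditional variance component such that,

$$\mathrm{Var}(s^2) \approx \frac{a^2 \sigma_1^2}{m},$$

where $a$ represents the number of arguments within the kernel $\phi$, here $a = 2$ and

$$\sigma_1^2 = \frac{\mu_4 - \sigma^4}{4},$$

so that

$$\mathrm{Var}(s^2) \approx \frac{\mu_4 - \sigma^4}{m}.$$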

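Following Equation (2.1) of Sen (1960), the structural component of $s^2$ is defined in this scenario as,

$$V_i = \frac{1}{m-1} \sum_{j \neq i} \frac{(X_i - X_j)^2}{2},$$

or the average of the kernel terms involving $X_i$ (as $X_i$ is not compared with itself within $s^2$).

Then,

$$\frac{1}{m} \sum_{i=1}^{m} V_i = \frac{1}{m(m-1)} \sum_{i \neq j} \frac{(X_i - X_j)^2}{2} = s^2,$$

as expected!

Since $s^2$ can be expressed as a mean of the structural components $V_i$, which are asymptotically uncorrelated, it follows that the variance of $s^2$ can also be estimated by considering

$$s_m^2 = \frac{1}{m-1} \sum_{i=1}^{m} (V_i - s^2)^2,$$

such that

$$\widehat{\mathrm{Var}}(s^2) = \frac{4 s_m^2}{m}.$$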

Now, let’s simulate a large dataset to explore!

```
set.seed(12345)
# Simulate m = 1000 observations X_i from N(mean = 0, sd = 10)
m <- 1000
mu <- 0
sigma <- 10
X <- rnorm(n = m, mean=mu, sd=sigma)
# Parameter values
# True fourth central moment of a normal distribution with sd sigma: 3*sigma^4
mu4 <- 3*sigma^4
sigma4 <- sigma^4
# Variance approximation using sigma_1^2
true_VarU <- (mu4 - sigma4)/m
true_VarU
```

## [1] 20

```
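# Estimates
U <- var(X)   # U-statistic: unbiased sample variance
Xbar <- mean(X)
# Estimate of the fourth central moment mu4 (stored as Xbar4)
Xbar4 <- 1/(m-1) * sum((X-Xbar)^4)
# Approximated variance using estimates
est_VarU <- (Xbar4 - U^2)/m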
est_VarU
```

## [1] 19.62728

The true (approximate) variance of $s^2$ is 20, and the corresponding sample estimate is 19.63.

Next, let’s construct the structural components of $s^2$ and check that their mean is equal to $s^2$.

```
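# Symmetric kernel for the variance
kernel <- function(x1, x2){
  (x1 - x2)^2/2
}
# V[i]: mean of the kernel over all pairs involving X[i]
V <- numeric(m)
for (i in 1:m){
  V[i] <- 1/(m-1) * sum(kernel(X[i], X[-i]))
}
# Compare with a tolerance rather than exact floating-point equality
isTRUE(all.equal(U, mean(V)))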
```

## [1] TRUE

Awesome, the mean of the structural components (99.75) matches the value of our U-statistic (99.75), as expected.

Next, let’s compare the variance estimate based on the sample variance of the structural components to the true and estimated variance of $s^2$ constructed with only the first conditional variance component…

```
sm2 <- var(V)
est_VarV <- sm2*4/m
true_VarU; est_VarU; est_VarV
```

## [1] 20

## [1] 19.62728

## [1] 19.67656

Huzzah, we see that all three quantities are very similar indeed!
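One way to probe this beyond a single dataset is a small Monte Carlo check, sketched below under my own assumptions (standard normal data, samples of size 200, 500 replicates; the function name `one_replicate()` is hypothetical). It compares the empirical variance of $s^2$ across replicates with the average of the structural-component estimates. It resets the random seed, so it is best run in a fresh R session rather than between the chunks of this post.

```r
# Hypothetical helper: simulate one sample and return s^2 together with
# the structural-component estimate of its variance
one_replicate <- function(m_rep) {
  x <- rnorm(m_rep)
  K <- outer(x, x, function(x1, x2) (x1 - x2)^2 / 2)  # kernel matrix; diagonal is 0
  V <- rowSums(K) / (m_rep - 1)                        # structural components
  c(U = mean(V), var_hat = 4 * var(V) / m_rep)
}

set.seed(2024)
sims <- replicate(500, one_replicate(200))   # 2 x 500 matrix with named rows

emp_var  <- var(sims["U", ])         # Monte Carlo variance of s^2
mean_est <- mean(sims["var_hat", ])  # average structural-component estimate
c(empirical = emp_var, sen = mean_est, ratio = mean_est / emp_var)
```

If the large-sample approximation is reasonable at this sample size, the ratio should be close to 1. Rerunning with a smaller `m_rep` gives a rough sense of how quickly the approximation degrades.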

## Two-sample U-statistics

### Review of known properties

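Sen (1960) also demonstrated that this result can be extended to multiple independent i.i.d. samples. We focus on two independent samples, $X_1, \dots, X_m \sim F$ and $Y_1, \dots, Y_n \sim G$, in which case the two-sample U-statistic takes the general form,

$$U = \binom{m}{a}^{-1} \binom{n}{b}^{-1} \sum_{\alpha} \sum_{\beta} \phi(X_{\alpha_1}, \dots, X_{\alpha_a}; Y_{\beta_1}, \dots, Y_{\beta_b}),$$

where $a$ and $b$ represent the number of arguments within the kernel $\phi$ for each sample, respectively. The variance of the two-sample U-statistic for large samples is provided by

$$\mathrm{Var}(U) \approx \frac{a^2 \sigma_{10}^2}{m} + \frac{b^2 \sigma_{01}^2}{n},$$

where $\sigma_{10}^2$ and $\sigma_{01}^2$ denote the conditional variance components obtained by conditioning on a single observation from the first and second sample, respectively.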

**For further information on two-sample U-statistics, including a sketch derivation of variance estimates, see: Much Two U About Nothing: Extension of U-statistics to multiple independent samples.**

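Focusing on the common scenario in which $a = b = 1$, the two-sample U-statistic reduces to

$$U = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \phi(X_i; Y_j)$$

and

$$\mathrm{Var}(U) \approx \frac{\sigma_{10}^2}{m} + \frac{\sigma_{01}^2}{n}.$$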

**For examples of how to derive $\sigma_{10}^2$ and $\sigma_{01}^2$ for common two-sample U-statistics, see: One, Two, U: Examples of common one- and two-sample U-statistics.**

### Structural components

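With two samples, structural components and their variance are constructed for each sample. Within the first sample,

$$V_{10}(X_i) = \frac{1}{n} \sum_{j=1}^{n} \phi(X_i; Y_j), \quad i = 1, \dots, m,$$

and within the second sample,

$$V_{01}(Y_j) = \frac{1}{m} \sum_{i=1}^{m} \phi(X_i; Y_j), \quad j = 1, \dots, n.$$

The mean of the structural components within either sample, $\bar{V}_{10}$ or $\bar{V}_{01}$, is equal to $U$ and

$$\widehat{\mathrm{Var}}(U) = \frac{s_m^2}{m} + \frac{s_n^2}{n},$$

where $s_m^2$ and $s_n^2$ are the estimated variances of the structural components within the first and second sample, respectively.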
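To see why both sets of structural components average to $U$ when each sample contributes one argument to the kernel, write $V_{10}(X_i)$ for the component of $X_i$ (the kernel averaged over the second sample) and $V_{01}(Y_j)$ for the component of $Y_j$ (the kernel averaged over the first sample); this notation is assumed here for illustration. Swapping the order of summation gives

$$
\begin{aligned}
\frac{1}{m} \sum_{i=1}^{m} V_{10}(X_i)
  &= \frac{1}{m} \sum_{i=1}^{m} \frac{1}{n} \sum_{j=1}^{n} \phi(X_i; Y_j)
   = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \phi(X_i; Y_j) = U, \\
\frac{1}{n} \sum_{j=1}^{n} V_{01}(Y_j)
  &= \frac{1}{n} \sum_{j=1}^{n} \frac{1}{m} \sum_{i=1}^{m} \phi(X_i; Y_j)
   = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \phi(X_i; Y_j) = U.
\end{aligned}
$$

In matrix terms, the components are the row means and column means of the $m \times n$ matrix of kernel values, and both sets of means average to the grand mean of that matrix.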

### Example: Mann-Whitney U-statistic when F=G

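Consider the form of the Mann-Whitney U-statistic for two independent random samples of a continuous outcome so that $P(X = Y) = 0$,

$$U = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} I(X_i < Y_j).$$

Thus, the form of the structural components within the first sample is

$$V_{10}(X_i) = \frac{1}{n} \sum_{j=1}^{n} I(X_i < Y_j),$$

and within the second sample,

$$V_{01}(Y_j) = \frac{1}{m} \sum_{i=1}^{m} I(X_i < Y_j).$$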

The Mann-Whitney U-statistic is commonly used to test the null hypothesis that the distribution of the outcome within each group is the same, or $H_0: F = G$. In a previous blog post, it was demonstrated that under this null hypothesis,

$$E[U] = \theta = \frac{1}{2}$$

and

$$\mathrm{Var}(U) = \frac{m + n + 1}{12mn},$$

where $\theta = P(X < Y)$.

Let’s conduct another small simulation to see how the true variance compares to the estimated variance using the structural components!

```
m <- 500
n <- 1000
mu <- 0
sigma <- 10
# Simulate X and Y so that H0: F = G is true
X <- rnorm(m, mean = mu, sd = sigma)
Y <- rnorm(n, mean = mu, sd = sigma)
# Mann-Whitney kernel: 1 if xi < yj, 0 otherwise
kernel <- function(xi, yj){
  ifelse(xi < yj, 1, 0)
}
# Compute kernel value for each pair of Xi and Yj (m x n matrix)
k <- outer(X, Y, kernel)
# True variance of U under H0
VarU <- (m+n+1)/(12*m*n)
VarU
```

## [1] 0.0002501667

```
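# Two-sample U-statistic: mean of the kernel over all m*n pairs
U <- sum(k) / length(k)
# Structural components: row means (first sample), column means (second sample)
Vi <- 1/n * rowSums(k)
Vj <- 1/m * colSums(k)
# Both sets of components average to U
mean(Vi) ; mean(Vj)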
```

## [1] 0.501896

## [1] 0.501896

```
sm2 <- var(Vi)
sn2 <- var(Vj)
VarV <- sm2/m + sn2/n
VarU; VarV
```

## [1] 0.0002501667

## [1] 0.0002545398

Again, we see the estimated variance using the structural components is very close to the true variance!
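As a cross-check against base R, the sketch below relates this form of the U-statistic to the statistic reported by `wilcox.test()`, then shows one possible use of the structural-component variance: a Wald-type z statistic for $H_0: \theta = 1/2$. The data, seed, and variable names are my own illustrative choices, and the z statistic is an illustration rather than something taken from Sen (1960).

```r
set.seed(99)
x <- rnorm(30)
y <- rnorm(40)
m <- length(x)
n <- length(y)

# Kernel matrix: k[i, j] = 1 if x[i] < y[j], else 0
k <- outer(x, y, "<") * 1

U  <- mean(k)       # two-sample U-statistic
Vx <- rowMeans(k)   # structural components, first sample
Vy <- colMeans(k)   # structural components, second sample
var_hat <- var(Vx) / m + var(Vy) / n

# With no ties, wilcox.test(y, x) reports W = number of pairs with y[j] > x[i]
W <- unname(wilcox.test(y, x)$statistic)
all.equal(U, W / (m * n))

# Wald-type z statistic using the structural-component variance (illustrative)
z <- (U - 0.5) / sqrt(var_hat)
z
```

Note the argument order in `wilcox.test(y, x)`: the reported statistic counts pairs in which the first argument exceeds the second, which matches the kernel $I(X_i < Y_j)$ only when `y` is passed first.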

## Concluding thoughts

The structural components introduced by Sen (1960) considerably simplify variance estimation for U-statistics, since no conditional variance components need to be derived. The structural components are, however, only asymptotically uncorrelated, so performance may suffer for small samples (not investigated here). Nonparametric estimation in general suffers with small samples, as the empirical cumulative distribution function, the focus of nonparametric estimation, may poorly represent the underlying distribution (e.g. due to an “unlucky” or unrepresentative sample). Thus, these approaches may be most suitable for reasonably sized samples.

Sen, P. K. (1960). On Some Convergence Properties of U-Statistics. *Calcutta Statistical Association Bulletin*, 10(1-2), 1–18. doi:10.1177/0008068319600101

Thanks very much for this post. I’ve been looking for this result for quite a while! It is not very well known, and deserves to be in every book that covers U-statistics. I’m curious how you found it?

Hi Glen,

I’m glad you found the blog post helpful. 🙂

I’m part of a research group working on estimating nonparametric treatment effects in two-arm clinical trials.

The effect we are interested in is equivalent to the area under the receiver operating characteristic (ROC) curve, for which estimation has long been of interest in radiology:

Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1), 29–36. doi:10.1148/radiology.143.1.7063747

This result was employed by

DeLong, E. R., DeLong, D. M., & Clarke-Pearson, D. L. (1988). Comparing the Areas under Two or More Correlated Receiver Operating Characteristic Curves: A Nonparametric Approach. Biometrics, 44(3), 837. doi:10.2307/2531595

to derive variance and covariance estimators for multiple AUCs, and seems to have been rediscovered several times in different ways!

Cheers,

Emma