Sen (1960) proved that U-statistics could be decomposed into identically distributed and asymptotically uncorrelated “structural components.”
The mean of these structural components is equal to the U-statistic, and their sample variance can be used to estimate the variance of the U-statistic, bypassing the often challenging derivation of conditional variance components.
One-sample U-statistics
Review of known properties
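Consider an independent and identically distributed sample of $m$ random variables: $X_1, \ldots, X_m \sim F$. Then, a one-sample U-statistic with a symmetric kernel $\phi$ consisting of $a$ arguments takes the form
$$U_m = \binom{m}{a}^{-1} \sum_{\beta} \phi(X_{\beta_1}, \ldots, X_{\beta_a}),$$
where the sum extends over all distinct combinations $\beta = \{\beta_1, \ldots, \beta_a\}$ of $a$ of the $m$ random variables. U-statistic theory then tells us that
$$\sqrt{m}\,(U_m - \theta) \xrightarrow{d} N(0, a^2 \sigma_1^2),$$
where $\theta = E[\phi(X_1, \ldots, X_a)]$ and
$$\sigma_1^2 = \mathrm{Var}\left(E[\phi(X_1, \ldots, X_a) \mid X_1]\right).$$
For large samples, the variance of $U_m$ can be approximated by
$$\mathrm{Var}(U_m) \approx \frac{a^2}{m} \sigma_1^2.$$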
For further information on one-sample U-statistics, including derivation of variance estimates, see: Getting to know U: the asymptotic distribution of a single U-statistic.
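Commonly, $a = 1$ or $a = 2$. When $a = 1$, the form of the U-statistic reduces to
$$U_m = \frac{1}{m} \sum_{i=1}^{m} \phi(X_i),$$
so that $U_m$ takes the form of the mean of the kernel across the observations and,
$$\mathrm{Var}(U_m) = \frac{\sigma_1^2}{m}$$
resembles the variance of a sample mean.
When $a = 2$,
$$U_m = \binom{m}{2}^{-1} \sum_{i < j} \phi(X_i, X_j),$$
or equivalently,
$$U_m = \frac{1}{m(m-1)} \sum_{i \neq j} \phi(X_i, X_j),$$
and,
$$\mathrm{Var}(U_m) \approx \frac{4}{m} \sigma_1^2.$$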
For examples of how to derive $\sigma_1^2$ for common one-sample U-statistics, see: One, Two, U: Examples of common one- and two-sample U-statistics.
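When the kernel has two arguments, the U-statistic can be written either as an average over unordered pairs or as an average over ordered pairs of distinct observations. The following is a minimal sketch in R that checks the two forms numerically. The helper names `u_stat_pairs` and `u_stat_ordered` and the toy data are assumptions made for illustration; the kernel is the variance kernel used later in this post.

```r
# Kernel with two arguments (the variance kernel used later in this post)
kernel_var <- function(x1, x2) (x1 - x2)^2 / 2

# Form 1: average the kernel over all choose(m, 2) unordered pairs i < j
u_stat_pairs <- function(x, kernel) {
  pairs <- combn(length(x), 2)   # 2 x choose(m, 2) matrix of index pairs
  mean(kernel(x[pairs[1, ]], x[pairs[2, ]]))
}

# Form 2: average the kernel over all m * (m - 1) ordered pairs i != j
u_stat_ordered <- function(x, kernel) {
  m <- length(x)
  k <- outer(x, x, kernel)       # m x m matrix of kernel values
  diag(k) <- 0                   # drop the i == j terms
  sum(k) / (m * (m - 1))
}

x_toy <- c(2, 4, 4, 5, 7, 9)     # hypothetical toy data
u_stat_pairs(x_toy, kernel_var)
u_stat_ordered(x_toy, kernel_var)
var(x_toy)                       # all three agree up to floating-point error
```

Both helpers assume a vectorised, symmetric kernel; the ordered-pair form is convenient because it maps directly onto a matrix of kernel values.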
Structural components
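In 1960, Sen demonstrated that a one-sample U-statistic based on a sample of size $m$ could be decomposed into $m$ structural components, denoted $V_i$ for $i = 1, \ldots, m$. Sen also proved that the $V_i$ are identically distributed and asymptotically uncorrelated.
When $a = 1$, the structural component is simply
$$V_i = \phi(X_i),$$
or the kernel evaluated at the observation $X_i$.
When $a = 2$,
$$V_i = \frac{1}{m-1} \sum_{j \neq i} \phi(X_i, X_j),$$
or the mean of all kernel terms involving $X_i$.
The mean of the structural components is equivalent to $U_m$,
$$U_m = \frac{1}{m} \sum_{i=1}^{m} V_i.$$
Sen also proved that the first conditional variance component $\sigma_1^2$ can be consistently estimated as the sample variance of the components,
$$s_m^2 = \frac{1}{m-1} \sum_{i=1}^{m} (V_i - U_m)^2,$$
such that
$$\widehat{\mathrm{Var}}(U_m) = \frac{a^2}{m} s_m^2.$$
That is, $\widehat{\mathrm{Var}}(U_m) = s_m^2 / m$ when $a = 1$ and $\widehat{\mathrm{Var}}(U_m) = 4 s_m^2 / m$ when $a = 2$.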
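The recipe for a two-argument kernel (average the kernel terms involving each observation, then take the sample variance of those averages) can be sketched generically in R. The function name `structural_components`, the simulated data, and the choice of the absolute-difference kernel (Gini's mean difference) are assumptions for illustration only and do not come from Sen (1960).

```r
# Structural components for a symmetric, vectorised two-argument kernel
structural_components <- function(x, kernel) {
  m <- length(x)
  k <- outer(x, x, kernel)   # k[i, j] = kernel(x[i], x[j])
  diag(k) <- 0               # an observation is never paired with itself
  rowSums(k) / (m - 1)       # mean of the m - 1 kernel terms involving x[i]
}

set.seed(1)                                      # arbitrary seed for this illustration
x_demo <- rexp(200)                              # any i.i.d. sample would do
kernel_gmd <- function(x1, x2) abs(x1 - x2)      # Gini's mean difference kernel

V_demo <- structural_components(x_demo, kernel_gmd)
U_demo <- mean(V_demo)                           # the U-statistic itself
var_hat <- 4 * var(V_demo) / length(x_demo)      # a^2 * s_m^2 / m with a = 2
c(U = U_demo, var_hat = var_hat)
```

Building the full matrix of kernel values costs memory proportional to the square of the sample size, so this sketch suits moderate samples; a loop over observations trades speed for memory.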
Example: Variance
As an example, consider the one-sample U-statistic for sample variance. I used U-statistic theory to derive the limiting distribution of this U-statistic in a previous blog post, One, Two, U: Examples of common one- and two-sample U-statistics.
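In summary, the U-statistic for the unbiased sample variance takes the form,
$$U_m = \binom{m}{2}^{-1} \sum_{i < j} \frac{(X_i - X_j)^2}{2},$$
where $\phi(X_i, X_j) = (X_i - X_j)^2 / 2$ is an unbiased, symmetric kernel for $\sigma^2$.
Using U-statistic theory, the asymptotic distribution of this estimator is given by
$$U_m \overset{\cdot}{\sim} N\left(\sigma^2, \; \frac{\mu_4}{m} - \frac{(m-3)\,\sigma^4}{m(m-1)}\right),$$
where $\mu_4 = E[(X - \mu)^4]$ is the fourth central moment. Note that this variance considers both variance components $\sigma_1^2$ and $\sigma_2^2$.
However, recall that with large $m$, the variance of $U_m$ can be approximated using only the first conditional variance component such that,
$$\mathrm{Var}(U_m) \approx \frac{a^2}{m} \sigma_1^2,$$
where $a$ represents the number of arguments within the kernel $\phi$, here $a = 2$ and
$$\sigma_1^2 = \frac{\mu_4 - \sigma^4}{4},$$
so that
$$\mathrm{Var}(U_m) \approx \frac{\mu_4 - \sigma^4}{m}.$$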
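Following Equation (2.1) of Sen (1960), the structural component of $U_m$ is defined in this scenario as,
$$V_i = \frac{1}{m-1} \sum_{j \neq i} \frac{(X_i - X_j)^2}{2},$$
or the average of the kernel terms involving $X_i$ (as $X_i$ is not compared with itself within $U_m$).
Then,
$$\frac{1}{m} \sum_{i=1}^{m} V_i = \frac{1}{m(m-1)} \sum_{i \neq j} \frac{(X_i - X_j)^2}{2} = U_m,$$
as expected!
Since $U_m$ can be expressed as a mean of the structural components which are asymptotically uncorrelated, it follows that the variance of $U_m$ can also be estimated by considering
$$s_m^2 = \frac{1}{m-1} \sum_{i=1}^{m} (V_i - U_m)^2,$$
such that
$$\widehat{\mathrm{Var}}(U_m) = \frac{4}{m} s_m^2.$$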
Now, let’s simulate a large dataset to explore!
set.seed(12345)
# Simulate m = 1000 values of Xi from a normal distribution with mean 0 and sd 10
m <- 1000
mu <- 0
sigma <- 10
X <- rnorm(n = m, mean=mu, sd=sigma)
# Parameter values
# True value of fourth central moment of N(mu, sigma) = 3*sigma^4
mu4 <- 3*sigma^4
sigma4 <- sigma^4
# Variance approximation using sigma_1^2
true_VarU <- (mu4 - sigma4)/m
true_VarU
## [1] 20
# Estimates
U <- var(X)
Xbar <- mean(X)
Xbar4 <- 1/(m-1) * sum((X-Xbar)^4)
# Approximated variance using estimates
est_VarU <- (Xbar4 - U^2)/m
est_VarU
## [1] 19.62728
The true (approximate) variance of $U_m$ is 20, and the corresponding sample estimate is 19.63.
Next, let’s construct the structural components of $U_m$ and check that their mean is equal to $U_m$.
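kernel <- function(x1, x2){
  (x1 - x2)^2/2
}
# Structural components: mean of the kernel terms involving X[i]
V <- numeric(m)
for (i in 1:m){
  V[i] <- 1/(m-1) * sum(kernel(X[i], X[-i]))
}
# Compare with a floating-point tolerance rather than exact equality
isTRUE(all.equal(U, mean(V)))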
## [1] TRUE
Awesome, the mean of the structural components (99.75) is equal to the value of our U-statistic (99.75), as expected.
Next, let’s compare the variance estimate based on the sample variance of the structural components to the true and estimated variance of $U_m$ constructed with only the first conditional variance component…
sm2 <- var(V)
est_VarV <- sm2*4/m
true_VarU; est_VarU; est_VarV
## [1] 20
## [1] 19.62728
## [1] 19.67656
Huzzah, we see that all three quantities are very similar indeed!
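For the variance kernel $(x_1 - x_2)^2/2$ used above, the structural components also have a closed form that avoids the loop. Expanding $\sum_{j}(X_i - X_j)^2$ around the sample mean $\bar{X}$ gives $V_i = \frac{m}{2(m-1)}(X_i - \bar{X})^2 + \frac{1}{2}s^2$, where $s^2$ is the usual unbiased sample variance. This identity is not taken from Sen (1960); it follows from the algebra just described, and the sketch below checks it numerically on a small hypothetical vector (the names `x_toy`, `V_loop`, and `V_closed` are illustrative).

```r
x_toy <- c(2, 4, 4, 5, 7, 9)   # small hypothetical sample
m_toy <- length(x_toy)

# Structural components by direct averaging of the kernel terms
kernel_var <- function(x1, x2) (x1 - x2)^2 / 2
V_loop <- vapply(
  seq_len(m_toy),
  function(i) mean(kernel_var(x_toy[i], x_toy[-i])),
  numeric(1)
)

# Closed form for the variance kernel
V_closed <- m_toy / (2 * (m_toy - 1)) * (x_toy - mean(x_toy))^2 + var(x_toy) / 2

all.equal(V_loop, V_closed)            # agreement up to floating-point error
all.equal(mean(V_closed), var(x_toy))  # the components average to the sample variance
```

The closed form is specific to this kernel; for other kernels the direct averaging approach remains the general route.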
Two-sample U-statistics
Review of known properties
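Sen (1960) also demonstrated that this result can be extended to multiple independent i.i.d. samples. We focus on 2 independent samples: $X_1, \ldots, X_m \sim F$ and $Y_1, \ldots, Y_n \sim G$, in which case the two-sample U-statistic takes the general form,
$$U_{m,n} = \binom{m}{a}^{-1} \binom{n}{b}^{-1} \sum_{\alpha} \sum_{\beta} \phi(X_{\alpha_1}, \ldots, X_{\alpha_a}; Y_{\beta_1}, \ldots, Y_{\beta_b}),$$
where $a$ and $b$ represent the number of arguments within the kernel for each sample, respectively. The variance of the two-sample U-statistic for large samples is provided by
$$\mathrm{Var}(U_{m,n}) \approx \frac{a^2}{m} \sigma_{10}^2 + \frac{b^2}{n} \sigma_{01}^2,$$
where $\sigma_{10}^2$ and $\sigma_{01}^2$ are the first conditional variance components obtained by conditioning the kernel on a single $X$ and a single $Y$, respectively.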
For further information on two-sample U-statistics, including a sketch derivation of variance estimates, see: Much Two U About Nothing: Extension of U-statistics to multiple independent samples.
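Focusing on the common scenario in which $a = b = 1$, the two-sample U-statistic reduces to
$$U_{m,n} = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \phi(X_i; Y_j),$$
and
$$\mathrm{Var}(U_{m,n}) \approx \frac{\sigma_{10}^2}{m} + \frac{\sigma_{01}^2}{n}.$$
For examples of how to derive $\sigma_{10}^2$ and $\sigma_{01}^2$ for common two-sample U-statistics, see: One, Two, U: Examples of common one- and two-sample U-statistics.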
Structural components
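With two samples, structural components and their variance are constructed for each sample. Within the first sample,
$$V_{i\cdot} = \frac{1}{n} \sum_{j=1}^{n} \phi(X_i; Y_j), \quad i = 1, \ldots, m,$$
and within the second sample,
$$V_{\cdot j} = \frac{1}{m} \sum_{i=1}^{m} \phi(X_i; Y_j), \quad j = 1, \ldots, n.$$
The mean of the structural components within either sample, $\frac{1}{m} \sum_{i=1}^{m} V_{i\cdot}$ or $\frac{1}{n} \sum_{j=1}^{n} V_{\cdot j}$, is equal to $U_{m,n}$ and
$$\widehat{\mathrm{Var}}(U_{m,n}) = \frac{s_m^2}{m} + \frac{s_n^2}{n},$$
where $s_m^2$ and $s_n^2$ are the estimated variances of the structural components within the first and second sample, respectively.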
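For kernels that take one argument from each sample, the two sets of structural components are simply the row and column means of the matrix of kernel values. A minimal sketch in R, assuming a vectorised kernel; the function name `two_sample_components`, the indicator kernel `less_than`, and the toy data are hypothetical and chosen only for illustration.

```r
# Two-sample structural components for a kernel with one argument per sample
two_sample_components <- function(x, y, kernel) {
  k <- outer(x, y, kernel)   # k[i, j] = kernel(x[i], y[j])
  list(
    U  = mean(k),            # two-sample U-statistic
    Vx = rowMeans(k),        # one component per observation in the first sample
    Vy = colMeans(k)         # one component per observation in the second sample
  )
}

x_toy <- c(1.2, 3.4, 2.2, 5.1)        # hypothetical first sample (size 4)
y_toy <- c(2.0, 4.5, 0.7, 3.9, 6.3)   # hypothetical second sample (size 5)
less_than <- function(x, y) as.numeric(x < y)

sc <- two_sample_components(x_toy, y_toy, less_than)
c(sc$U, mean(sc$Vx), mean(sc$Vy))     # three identical values

# Estimated variance of the U-statistic from the two sets of components
var(sc$Vx) / length(x_toy) + var(sc$Vy) / length(y_toy)
```

With samples this small the variance estimate is only a demonstration of the arithmetic, not a reliable estimate.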
Example: Mann-Whitney U-statistic when F=G
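Consider the form of the Mann-Whitney U-statistic for two independent random samples of a continuous outcome so that $P(X_i = Y_j) = 0$,
$$U_{m,n} = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} 1(X_i < Y_j).$$
Thus, the form of the structural components within the first sample is
$$V_{i\cdot} = \frac{1}{n} \sum_{j=1}^{n} 1(X_i < Y_j),$$
and within the second sample,
$$V_{\cdot j} = \frac{1}{m} \sum_{i=1}^{m} 1(X_i < Y_j).$$
The Mann-Whitney U-statistic is commonly used to test the null hypothesis that the distribution of the outcome within each group is the same, or $H_0: F = G$. In a previous blog post, it was demonstrated that under this null hypothesis,
$$E(U_{m,n}) = \frac{1}{2}$$
and
$$\mathrm{Var}(U_{m,n}) = \frac{N + 1}{12mn},$$
where $N = m + n$.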
Let’s conduct another small simulation to see how the true variance compares to the estimated variance using the structural components!
m <- 500
n <- 1000
mu <- 0
sigma <- 10
# Simulate X and Y so that H0: F = G is true
X <- rnorm(m, mean = mu, sd = sigma)
Y <- rnorm(n, mean = mu, sd = sigma)
kernel <- function(xi, yj){
  ifelse(xi < yj, 1, 0)
}
# Compute kernel value for each pair of Xi and Yj
k <- matrix(NA, nrow=m, ncol=n)
for (i in 1:m){
  for (j in 1:n){
    k[i, j] <- kernel(X[i], Y[j])
  }
}
# True variance of the U-statistic under H0: F = G
VarU <- (m+n+1)/(12*m*n)
VarU
## [1] 0.0002501667
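# U-statistic: mean of the kernel over all m*n pairs
U <- sum(k) / length(k)
# Structural components within the first (rows) and second (columns) sample
Vi <- 1/n * rowSums(k)
Vj <- 1/m * colSums(k)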
mean(Vi) ; mean(Vj)
## [1] 0.501896
## [1] 0.501896
sm2 <- var(Vi)
sn2 <- var(Vj)
VarV <- sm2/m + sn2/n
VarU; VarV
## [1] 0.0002501667
## [1] 0.0002545398
Again, we see the estimated variance using the structural components is very close to the true variance!
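The double loop used to fill the kernel matrix can be replaced by a single call to `outer()`, and the `W` statistic reported by `wilcox.test()` offers an independent check on the value of the U-statistic. The sketch below uses its own small simulated samples and an arbitrary seed; the names ending in `_chk` are hypothetical and chosen to avoid overwriting the objects above.

```r
set.seed(2024)                 # arbitrary seed for this illustration
m_chk <- 50
n_chk <- 80
x_chk <- rnorm(m_chk)
y_chk <- rnorm(n_chk)

# Kernel matrix without explicit loops: k[i, j] = 1 if x[i] < y[j], else 0
k_chk <- outer(x_chk, y_chk, "<") * 1

U_chk <- mean(k_chk)

# wilcox.test(y, x) reports W = number of pairs with y[j] > x[i] (no ties here)
W_chk <- unname(wilcox.test(y_chk, x_chk)$statistic)
all.equal(U_chk, W_chk / (m_chk * n_chk))

# Variance estimate from the structural components (row and column means)
var(rowMeans(k_chk)) / m_chk + var(colMeans(k_chk)) / n_chk

# Variance under H0: F = G, for comparison
(m_chk + n_chk + 1) / (12 * m_chk * n_chk)
```

Note the argument order in `wilcox.test()`: passing the second sample first makes `W` count the pairs with $X_i < Y_j$, matching the kernel used in this example.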
Concluding thoughts
The structural components introduced by Sen (1960) significantly reduce the complexity of variance estimation for U-statistics. They are, however, only asymptotically uncorrelated, so performance may suffer for small samples (not investigated here). That said, nonparametric estimation suffers in general with small samples, as the empirical cumulative distribution function, the focus of nonparametric estimation, may deviate substantially from the true distribution (e.g. due to an “unlucky” or unrepresentative sample). Thus, these approaches may be most suitable for reasonably sized samples.
Sen, P. K. (1960). On Some Convergence Properties of U-Statistics. Calcutta Statistical Association Bulletin, 10(1-2), 1–18. doi:10.1177/0008068319600101
Thanks very much for this post. I’ve been looking for this result for quite a while! It is not very well known, and deserves to be in every book that covers U-statistics. I’m curious how you found it?
Hi Glen,
I’m glad you found the blog post helpful. 🙂
I’m part of a research group working on estimating nonparametric treatment effects in two-arm clinical trials.
The effect we are interested in is equivalent to the area under the receiver operating characteristic (ROC) curve, the estimation of which has long been of interest in radiology.
Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1), 29–36. doi:10.1148/radiology.143.1.7063747
This result was employed by
DeLong, E. R., DeLong, D. M., & Clarke-Pearson, D. L. (1988). Comparing the Areas under Two or More Correlated Receiver Operating Characteristic Curves: A Nonparametric Approach. Biometrics, 44(3), 837. doi:10.2307/2531595
to derive variance and covariance estimators for multiple AUCs, and seems to have been rediscovered several times in different ways!
Cheers,
Emma