Nonparametric neighbours: U-statistic structural components and jackknife pseudo-observations for the AUC

Two of my recent blog posts focused on two different, but as we will see related, methods which essentially transform observed responses into a summary of their contribution to an estimate: structural components resulting from Sen’s (1960) decomposition of U-statistics and pseudo-observations resulting from application of the leave-one-out jackknife. As I note in this comment, I think the real value of deconstructing estimators in this way results from the use of these quantities, which in special (but common) cases are asymptotically uncorrelated and identically distributed, to: (1) simplify otherwise complex variance estimates and construct interval estimates, and (2) apply regression methods to estimators without an existing regression framework.

As discussed by Miller (1974), pseudo-observations may be treated as approximately independent and identically distributed random variables when the quantity of interest is a function of the mean or variance or, more generally, any function of a U-statistic. Miller also outlines several other scenarios where these methods are applicable. Because many estimators of popular “parameters” can be expressed as U-statistics, these methods are quite broadly applicable. A review of basic U-statistic theory and some common examples, notably the difference in means and the Wilcoxon Mann-Whitney test statistic, can be found in my blog post: One, Two, U: Examples of common one- and two-sample U-statistics.

As an example of use case (1), DeLong et al. (1988) used structural components to estimate the variances and covariances of the areas under multiple, correlated receiver operating characteristic (ROC) curves, or multiple AUCs. Hanley and Hajian-Tilaki (1997) later referred to the methods of DeLong et al. (1988) as “the cleanest and most elegant approach to variances and covariances of AUCs.” As an example of use case (2), Andersen & Pohar Perme (2010) provide a thorough summary of how pseudo-observations can be used to construct regression models for important survival parameters like survival at a single time point and the restricted mean survival time.

Now, structural components are restricted to U-statistics while pseudo-observations may be used more generally, as discussed. But, if we construct pseudo-observations for U-statistics, one of several “valid” scenarios, what is the relationship between these two quantities? Hanley and Hajian-Tilaki (1997) provide a lovely discussion of the equivalence of these two methods when applied to the area under the receiver operating characteristic curve or simply the AUC. This blog post follows their discussion, providing concrete examples of computing structural components and pseudo-observations using R, and demonstrating their equivalence in this special case.
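To make the claimed equivalence concrete before the details, here is a minimal Python sketch (not the R code used in this post). It assumes the AUC is written as a two-sample U-statistic with the Mann-Whitney kernel, and that pseudo-observations are formed within each sample by deleting one observation at a time. The scores and the names `psi`, `auc`, `v10` and `v01` are hypothetical and chosen for illustration only.

```python
import numpy as np

def psi(x, y):
    """Mann-Whitney kernel: 1 if x > y, 0.5 if tied, 0 otherwise."""
    return np.where(x > y, 1.0, np.where(x == y, 0.5, 0.0))

def auc(x, y):
    """AUC as a two-sample U-statistic: mean kernel value over all (x_i, y_j) pairs."""
    return psi(x[:, None], y[None, :]).mean()

# Hypothetical toy scores: x = diseased, y = non-diseased
x = np.array([0.90, 0.65, 0.40, 0.70])
y = np.array([0.10, 0.40, 0.35, 0.80, 0.50])
m, n = len(x), len(y)

theta = auc(x, y)
kernel_matrix = psi(x[:, None], y[None, :])   # m x n matrix of kernel values

# Structural components: row means (one per x_i) and column means (one per y_j)
v10 = kernel_matrix.mean(axis=1)
v01 = kernel_matrix.mean(axis=0)

# Leave-one-out jackknife pseudo-observations, computed within each sample
pseudo_x = np.array([m * theta - (m - 1) * auc(np.delete(x, i), y) for i in range(m)])
pseudo_y = np.array([n * theta - (n - 1) * auc(x, np.delete(y, j)) for j in range(n)])

# Variance estimate built from the structural components
var_hat = v10.var(ddof=1) / m + v01.var(ddof=1) / n

print(theta, v10.mean(), v01.mean())
print(np.allclose(pseudo_x, v10), np.allclose(pseudo_y, v01))
print(var_hat)
```

Under these assumptions the pseudo-observations coincide with the structural components up to floating-point error, and both sets of structural components average to the AUC. A different jackknife convention (for example, weighting by the pooled sample size) would not give exact equality.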

Simplifying U-statistic variance estimation with Sen’s structural components

Sen (1960) proved that U-statistics could be decomposed into identically distributed and asymptotically uncorrelated “structural components.”

The mean of these structural components is equivalent to the U-statistic and the variance of the structural components can be used to estimate the variance of the U-statistic, bypassing the need for often challenging derivation of conditional variance components.

One, Two, U: Examples of common one- and two-sample U-statistics

My previous two blog posts revolved around derivation of the limiting distribution of U-statistics for one sample and multiple independent samples.

The notation within these derivations can get quite complicated and it may be a bit unclear as to how to actually derive components of the limiting distribution.

In this blog post, I provide two examples each of common one-sample U-statistics (Variance, Kendall’s Tau) and two-sample U-statistics (Difference of two means, Wilcoxon Mann-Whitney rank-sum statistic) and derive their limiting distributions using our previously developed theory.

Much Two U About Nothing: Extension of U-statistics to multiple independent samples

A statistical functional is any real-valued function of a distribution function F, \theta = T(F). When F is unknown, nonparametric estimation only requires that F belong to a broad class of distribution functions \mathcal{F}, typically subject only to mild restrictions such as continuity or existence of specific moments.

For a single independent and identically distributed random sample of size n, X_1, …, X_n \stackrel{i.i.d}{\sim} F, a statistical functional \theta = T(F) is said to belong to the family of expectation functionals if:

  1. T(F) takes the form of an expectation of a function \phi with respect to F,

        \[T(F) = \mathbb{E}_F~ \phi(X_1, …, X_a) \]

  2. \phi(X_1, …, X_a) is a symmetric kernel of degree a \leq n.

A kernel is symmetric if its arguments can be permuted without changing its value. For example, if the degree a = 2, \phi is symmetric if \phi(x_1, x_2) = \phi(x_2, x_1).

If \theta = T(F) is an expectation functional and the class of distribution functions \mathcal{F} is broad enough, an unbiased estimator of \theta = T(F) can always be constructed. This estimator is known as a U-statistic and takes the form,

    \[ U_n = \frac{1}{{n \choose a}} \mathop{\sum … \sum} \limits_{1 \leq i_1 < ... < i_a \leq n} \phi(X_{i_1}, ..., X_{i_a})\]

such that U_n is the average of \phi evaluated at all {n \choose a} distinct combinations of size a from X_1, …, X_n.
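As a small illustration of this definition, the following Python sketch averages a symmetric kernel over all subsets of size a. The helper names `u_statistic` and `var_kernel` and the data are hypothetical. The kernel \phi(x_1, x_2) = (x_1 - x_2)^2 / 2 is used because its U-statistic is the usual unbiased sample variance, which gives a built-in check.

```python
from itertools import combinations
from math import comb

import numpy as np

def u_statistic(x, kernel, a):
    """Average a symmetric kernel of degree a over all size-a subsets of x."""
    total = sum(kernel(*subset) for subset in combinations(x, a))
    return total / comb(len(x), a)

def var_kernel(x1, x2):
    """Symmetric kernel of degree 2 whose expectation is Var(X)."""
    return 0.5 * (x1 - x2) ** 2

# Hypothetical sample
x = np.array([2.0, 4.0, 4.0, 5.0, 7.0, 9.0])

u_n = u_statistic(x, var_kernel, a=2)
print(u_n, np.var(x, ddof=1))  # the two values should agree
```

Enumerating all {n \choose a} subsets is only practical for small n and a; it is shown here for clarity rather than efficiency.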

Since each X_i appears in more than one summand of U_n, the summands are dependent and the central limit theorem cannot be applied directly to U_n. However, clever conditioning arguments can be used to show that U_n is in fact asymptotically normal with mean

    \[\mathbb{E}_F~ U_n = \theta = T(F)\]

and variance

    \[\text{Var}_F~U_n = \frac{a^2}{n} \sigma_1^{2}\]

where

    \[\sigma_1^{2} = \text{Var}_F \Big[ \mathbb{E}_F [\phi(X_1, …, X_a)|X_1] \Big].\]
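
As a worked example of these formulas, consider the variance kernel \phi(x_1, x_2) = (x_1 - x_2)^2 / 2 of degree a = 2. The symbols \mu = \mathbb{E}_F X, \sigma^2 = \text{Var}_F X (no subscript) and \mu_4 = \mathbb{E}_F (X - \mu)^4 are introduced here for illustration only, and a finite fourth moment is assumed. Conditioning on X_1 = x gives

    \[\mathbb{E}_F [\phi(X_1, X_2)|X_1 = x] = \frac{(x - \mu)^2 + \sigma^2}{2}\]

so that

    \[\sigma_1^{2} = \text{Var}_F \left[ \frac{(X_1 - \mu)^2 + \sigma^2}{2} \right] = \frac{\mu_4 - \sigma^4}{4}\]

and the asymptotic variance of the sample variance is

    \[\frac{a^2}{n} \sigma_1^{2} = \frac{4}{n} \cdot \frac{\mu_4 - \sigma^4}{4} = \frac{\mu_4 - \sigma^4}{n}.\]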

The sketch of the proof is as follows:

  1. Express the variance of U_n in terms of the covariance of its summands,

    \[\text{Var}_{F}~ U_n = \frac{1}{{n \choose a}^2} \mathop{\sum \sum} \limits_{\substack{1 \leq i_1 < ... < i_{a} \leq n \\ 1 \leq j_1 < ... < j_{a} \leq n}} \text{Cov}\left[\phi(X_{i_1}, ..., X_{i_a}),~ \phi(X_{j_1}, ..., X_{j_a})\right].\]

  2. Recognize that if two terms share c common elements, so that their covariance takes the form

        \[ \text{Cov} [\phi(X_1, …, X_c, X_{c+1}, …, X_a), \phi(X_1, …, X_c, X'_{c+1}, …, X'_a)] \]

    conditioning on their c shared elements will make the two terms independent.

  3. For 0 \leq c \leq a, define

        \[\phi_c(X_1, …, X_c) = \mathbb{E}_F \Big[\phi(X_1, …, X_a) | X_1, …, X_c \Big] \]

    such that

        \[\mathbb{E}_F~ \phi_c(X_1, …, X_c) = \theta = T(F)\]

    and

        \[\sigma_{c}^2 = \text{Var}_{F}~ \phi_c(X_1, …, X_c).\]

    Note that when c = 0, \phi_0 = \theta and \sigma_0^2 = 0, and when c=a, \phi_a = \phi(X_1, …, X_a) and \sigma_a^2 = \text{Var}_F~\phi(X_1, …, X_a).

  4. Use the law of iterated expectation to demonstrate that

        \[ \sigma^{2}_c = \text{Cov} [\phi(X_1, …, X_c, X_{c+1}, …, X_a), \phi(X_1, …, X_c, X'_{c+1}, …, X'_a)] \]

    and re-express \text{Var}_{F}~U_n as the sum of the \sigma_{c}^2,

        \[ \text{Var}_F~U_n = \frac{1}{{n \choose a}} \sum_{c=1}^{a} {a \choose c}{n-a \choose a-c} \sigma^{2}_c.\]

    Recognizing that the first (c = 1) variance term dominates for large n, approximate \text{Var}_F~ U_n as

        \[\text{Var}_F~U_n \sim \frac{a^2}{n} \sigma^{2}_1.\]

  5. Identify a surrogate U^{*}_n that has the same mean and asymptotically the same variance as U_n-\theta but is the sum of independent terms,

        \[ U_n^{*} = \sum_{i=1}^{n} \mathbb{E}_F [U_n - \theta|X_i] = \frac{a}{n} \sum_{i=1}^{n} \Big(\phi_1(X_i) - \theta\Big) \]

    so that the central limit theorem may be used to show

        \[ \sqrt{n} U_n^{*} \rightarrow N(0, a^2 \sigma_1^2).\]

  6. Demonstrate that the difference between U_n - \theta and U_n^{*} converges in probability to zero,

        \[ \sqrt{n} \Big((U_n - \theta) - U_n^{*}\Big) \stackrel{P}{\rightarrow} 0 \]

    and thus have the same limiting distribution so that

        \[\sqrt{n} (U_n - \theta) \rightarrow N(0, a^2 \sigma_1^2).\]
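
The finite-sample variance expression and its large-n approximation from the sketch above can be compared numerically. The following Python sketch does this for the variance kernel \phi(x_1, x_2) = (x_1 - x_2)^2 / 2, assuming normally distributed data so that \sigma_1^2 = \sigma^4 / 2 and \sigma_2^2 = 2 \sigma^4, where \sigma^2 denotes the population variance. The function names and the choice \sigma^2 = 1 are hypothetical.

```python
from math import comb

def exact_var(n, a, sigma2_c):
    """Finite-sample Var(U_n) from the sum over c = 1..a; sigma2_c[c - 1] holds sigma_c^2."""
    total = sum(
        comb(a, c) * comb(n - a, a - c) * sigma2_c[c - 1]
        for c in range(1, a + 1)
    )
    return total / comb(n, a)

def approx_var(n, a, sigma2_1):
    """Leading-term approximation a^2 * sigma_1^2 / n."""
    return a ** 2 * sigma2_1 / n

# Assumption: normal data with population variance 1 and the variance kernel (a = 2)
pop_var = 1.0
sigma2_c = [pop_var ** 2 / 2, 2 * pop_var ** 2]

for n in (5, 50, 500):
    exact = exact_var(n, 2, sigma2_c)
    approx = approx_var(n, 2, sigma2_c[0])
    # The ratio exact / approx shrinks towards 1 as n grows
    print(n, exact, approx, exact / approx)
```

In this special case the exact expression reduces to 2 \sigma^4 / (n - 1), the familiar variance of the sample variance under normality, while the approximation gives 2 \sigma^4 / n.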

This blog post aims to provide an overview of the extension of kernels, expectation functionals, and the definition and distribution of U-statistics to multiple independent samples, with particular focus on the common two-sample scenario.

