Resampling, the jackknife, and pseudo-observations

Resampling methods approximate the sampling distribution of a statistic or estimator. In essence, a sample taken from the population is treated as a population itself. A large number of new samples, or resamples, are drawn from this “new population”, commonly with replacement, and the estimate of interest is recomputed within each resample. These replicate estimates can then be used to construct the empirical sampling distribution, from which confidence intervals, bias, and variance may be estimated. These methods are particularly advantageous for statistics or estimators for which standard methods do not apply or are difficult to derive.

The jackknife is a popular resampling method, first introduced by Quenouille in 1949 as a method of bias estimation. In 1958, jackknifing was both named by Tukey and extended to variance estimation; the name alludes to a jackknife being a multipurpose tool, similar to a Swiss Army knife, that can get its user out of tricky situations. Efron later developed the bootstrap, arguably the most popular resampling method, in 1979 after being inspired by the jackknife.

In Efron’s (1982) book The jackknife, the bootstrap, and other resampling plans, he states,

Good simple ideas, of which the jackknife is a prime example, are our most precious intellectual commodity, so there is no need to apologize for the easy mathematical level.

Although resampling methods have existed since the 1940s, they were long computationally infeasible because of the power required to perform resampling and recalculate estimates many times. With today’s computing power, the uncomplicated yet powerful jackknife, and resampling methods more generally, should be a tool in every analyst’s toolbox.

Leave-one-out jackknife algorithm

The use of different resampling schemes results in different resampling methods. For example, bootstrap methods commonly sample n observations with replacement from the original sample of size n, repeating this B times, with B large, to obtain B bootstrap replicates.

The leave-one-out (LOO) jackknife creates n samples of size (n-1) from the original sample of size n by deleting each observation in turn. It is similar in spirit to the leave-one-out cross-validation encountered in prediction modelling and machine learning. Though less common, more than one observation may be deleted at a time, in which case the method may be referred to as the leave-K-out jackknife.

The leave-one-out jackknife algorithm proceeds generally as follows:

  1. Estimate the parameter or estimand of interest \theta using the original sample of size n, referring to the resulting estimate as the full-sample estimate and denoting it by \hat{\theta}.
  2. Construct n jackknife samples of size (n-1) by deleting each observation in turn. That is,

    (1)   \begin{equation*} \mathbf{x}_{(i)} = \lbrace x_1, x_2, ..., x_{i-1}, x_{i+1}, ..., x_n \rbrace \>\> (i=1,...,n). \end{equation*}

  3. Estimate the parameter or estimand of interest within each jackknife sample, yielding n jackknife replicates,

    (2)   \begin{equation*} \mathbf{\hat{\theta}} = \lbrace \hat{\theta}_{(1)}, \hat{\theta}_{(2)}, ..., \hat{\theta}_{(n)} \rbrace \end{equation*}

    where subscript (i) is used to indicate exclusion of the i^{th} sample observation.

  4. Obtain the jackknife estimate \hat{\theta}_{(\cdot)} as the mean of the jackknife replicates,

    (3)   \begin{equation*} \hat{\theta}_{(\cdot)} = \frac{1}{n} \sum_{i=1}^{n} \hat{\theta}_{(i)}. \end{equation*}
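As a compact illustration, the four steps above can be sketched in a few lines of base R. This is a minimal sketch using the sample mean as the estimator of interest and a made-up toy vector; the worked example later in this post builds the same steps out as separate functions.

x_toy      <- c(2.1, 3.4, 5.6, 4.2, 3.9)            # toy data
theta_hat  <- mean(x_toy)                            # 1. full-sample estimate
replicates <- sapply(seq_along(x_toy),               # 2. and 3. jackknife samples and replicates
                     function(i) mean(x_toy[-i]))
theta_dot  <- mean(replicates)                       # 4. jackknife estimate (mean of the replicates)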

Bias estimation and adjustment

As noted, the jackknife was introduced by Quenouille in 1949 as a method of estimating bias.

Suppose \hat{\theta} is the full-sample estimate based on the entire sample of size n and \hat{\theta}_{(\cdot)} is our jackknife estimate obtained according to the algorithm in the previous section. The jackknife estimate of the bias, also known as Quenouille’s estimate, is then obtained as

(4)   \begin{equation*} \widehat{\text{bias}}(\hat{\theta}) = (n-1) \left[ \hat{\theta}_{(\cdot)} - \hat{\theta} \right]. \end{equation*}

A bias-corrected jackknife estimate is then obtained as,

(5)   \begin{equation*} \hat{\theta}_{jack} = \hat{\theta} - \widehat{\text{bias}}(\hat{\theta}) = n \hat{\theta} - (n-1) \hat{\theta}_{(\cdot)}. \end{equation*}

If the full-sample estimate is unbiased such that \mathbb{E}[\hat{\theta}] = \theta, the jackknife bias estimate has expectation zero and the bias-corrected jackknife estimate \hat{\theta}_{jack} has the same expectation as the full-sample estimate \hat{\theta}, so jackknifing provides no advantage with respect to bias.

If the full-sample estimate is biased such that \mathbb{E}[\hat{\theta}] \neq \theta, the bias-corrected jackknife estimate reduces the order of the bias relative to the full-sample estimate. However, depending on the structure of the bias, it will not always be reduced to zero.

Suppose we assume that for a sample of size n, X_{1}, ..., X_{n} \stackrel{iid}{\sim} F, the full-sample estimate \hat{\theta} is biased such that its expectation takes the form,

(6)   \begin{equation*} \mathbb{E}_{F}\left[ \hat{\theta} \right] = \theta + \frac{a_1(F)}{n} + \frac{a_2(F)}{n^2} + ...= E_{n} \end{equation*}

where the a_1, a_2,... do not depend on the sample size n. Then, since each jackknife replicate is constructed using a jackknife sample of size n-1, it follows that the expectation of the jackknife estimate is given by,

(7)   \begin{align*} \mathbb{E}_{F}\left[ \hat{\theta}_{(\cdot)} \right] &= \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}_{F} \left[ \hat{\theta}_{(i)} \right] \\ &= \frac{1}{n} \sum_{i=1}^{n} \left[ \theta + \frac{a_1(F)}{n-1} + \frac{a_2(F)}{(n-1)^2} + ... \right] \\ &= E_{n-1}. \end{align*}

That is, the expectation of the jackknife estimate is equivalent to the expectation of the estimator of interest based on a sample of size n-1. The expected difference between the jackknife estimate and the full-sample estimate is then,

(8)   \begin{align*} \mathbb{E}_{F}\left[ \hat{\theta}_{(\cdot)} - \hat{\theta} \right] = E_{n-1} - E_{n} \approx \frac{a_1(F)}{n(n-1)}. \end{align*}

If the assumed bias structure is correct, \widehat{\text{bias}}(\hat{\theta}) approximates the first bias term of \hat{\theta}. However, if the bias structure features higher order bias terms, extensions of the jackknife, such as Quenouille’s second-order jackknife, may be required.
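To see why, substitute the expansions E_{n} and E_{n-1} into the bias-corrected estimate \hat{\theta}_{jack} = n \hat{\theta} - (n-1) \hat{\theta}_{(\cdot)}: the a_1(F)/n term cancels exactly, leaving a bias of order 1/n^2,

\begin{align*} \mathbb{E}_{F}\left[ \hat{\theta}_{jack} \right] &= n E_{n} - (n-1) E_{n-1} \\ &= \theta + a_2(F) \left( \frac{1}{n} - \frac{1}{n-1} \right) + ... \\ &= \theta - \frac{a_2(F)}{n(n-1)} + ... \end{align*}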

Pseudo-observations and variance estimation

Tukey (1958) noticed that the bias-corrected jackknife estimate could be expressed as the mean of n pseudo-observations, defined for the i^{th} sample observation as,

(9)   \begin{equation*} \hat{\theta}_{i} = n \hat{\theta} - (n-1) \hat{\theta}_{(i)} \end{equation*}

such that,

(10)   \begin{equation*} \hat{\theta}_{jack} = \frac{1}{n} \sum_{i=1}^{n} \hat{\theta}_{i} \end{equation*}

The i^{th} pseudo-observation can be viewed or interpreted as the i^{th} individual’s contribution to the estimate.

Tukey further conjectured that pseudo-observations could be treated as independent and identically distributed random variables. When this holds, estimating the variance of the bias-corrected jackknife estimate \hat{\theta}_{jack} is equivalent to estimating the variance of the mean of the pseudo-observations. That is,

(11)   \begin{equation*} \widehat{\text{Var}}(\hat{\theta}_{jack}) = \frac{s^{2}_{jack}}{n} = \frac{1}{n(n-1)} \sum_{i=1}^{n} \left(\hat{\theta}_{i} - \hat{\theta}_{jack}\right)^2. \end{equation*}

If \hat{\theta}_{jack} is assumed to be asymptotically Normally distributed, confidence intervals for \theta can be constructed in the usual manner as,

(12)   \begin{equation*} \hat{\theta}_{jack} \pm t_{\alpha/2, n-1} \sqrt{\widehat{\text{Var}}(\hat{\theta}_{jack})}. \end{equation*}
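As a sketch of how formulas (9) through (12) translate into code, the small helper below (jackknife_ci is simply an illustrative name) accepts any estimator that is a function of a numeric vector and returns the bias-corrected estimate, its standard error based on Tukey's variance estimate, and a t-based confidence interval; the 95% default level is chosen purely for illustration.

jackknife_ci <- function(x, estimator, conf = 0.95){
  n          <- length(x)
  full       <- estimator(x)                                     # full-sample estimate
  replicates <- sapply(seq_along(x), function(i) estimator(x[-i]))
  pseudo     <- n * full - (n - 1) * replicates                  # pseudo-observations, (9)
  est        <- mean(pseudo)                                     # bias-corrected estimate, (10)
  se         <- sqrt(var(pseudo) / n)                            # Tukey variance estimate, (11)
  half       <- qt(1 - (1 - conf)/2, df = n - 1) * se            # t-based half-width, (12)
  c(estimate = est, se = se, lower = est - half, upper = est + half)
}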

Tukey’s conjectures were published in a short abstract with no supporting proofs. Miller (1968, 1974) later demonstrated that the properties conjectured only held for particular classes of statistics. However, this class of statistics does appear to include expectation functionals or linear functionals (among others)!

For more detail on expectation functionals and their estimators, check out my blog post U-, V-, and Dupree statistics.

  • Tukey, J. W. (1958). Bias and confidence in not-quite large samples (Abstract). The Annals of Mathematical Statistics, 29(2), 614. doi:10.1214/aoms/1177706647
  • Miller, R. G. (1968). Jackknifing Variances. The Annals of Mathematical Statistics, 39(2), 567–582. doi:10.1214/aoms/1177698418
  • Miller, R. G. (1974). The Jackknife–A Review. Biometrika, 61(1), 1–15. doi:10.2307/2334280

Example: Biased variance estimator

Data generation and estimator properties

In this section, we demonstrate properties of the discussed jackknife estimators and pseudo-observations for the biased estimator of the variance with denominator n,

(13)   \begin{equation*} s^{2}_{n} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2. \end{equation*}

The expectation of s^{2}_{n} takes the form of E_{n} described within the Bias section,

(14)   \begin{equation*} \mathbb{E}[s^{2}_{n}] = \left( \frac{n-1}{n} \right) \sigma^2 = \sigma^2 - \frac{\sigma^2}{n} \end{equation*}

with a_1(F) = - \sigma^2, and it follows that,

(15)   \begin{equation*} \text{bias}(s^{2}_{n}) = \mathbb{E}[s^{2}_{n}] - \sigma^2 = -\frac{\sigma^2}{n}. \end{equation*}

When the x_i are independent and identically distributed according to \mathcal{N}(\mu, \sigma^2),

(16)   \begin{equation*} \text{Var}[s_n^2] = \frac{2 (n-1)}{n^2} \sigma^4 \end{equation*}
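This expression follows because, for normally distributed data, n s^{2}_{n} / \sigma^{2} = \sum_{i=1}^{n} (x_i - \bar{x})^2 / \sigma^{2} follows a chi-squared distribution with n-1 degrees of freedom, which has variance 2(n-1):

\begin{equation*} \text{Var}[s^{2}_{n}] = \frac{\sigma^4}{n^2} \text{Var}\left[ \frac{n s^{2}_{n}}{\sigma^2} \right] = \frac{\sigma^4}{n^2} \cdot 2(n-1) = \frac{2(n-1)}{n^2} \sigma^4. \end{equation*}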

Here, we generate a sample of n=100 independent and identically distributed observations from \mathcal{N}(\mu=100, \sigma^2=49).

set.seed(12345)

n <- 100
mu <- 100
sigma <- 7   # rnorm() expects the standard deviation, so sigma = 7 gives variance 49

x <- rnorm(n, mu, sigma)

For reference, the first five generated observations are: 104.1, 105, 99.2, 96.8, 104.2.

A histogram of all n=100 generated observations is provided below.

library(ggplot2)
qplot(x, geom='histogram', bins=5) + theme_bw()

[Figure: histogram of the n=100 generated observations]

It follows that the true properties of s^{2}_n in this scenario are as follows.

trueBias <- -sigma^2/n
trueBias
## [1] -0.49
trueVar  <- 2 * (n-1) * sigma^4 / n^2
trueVar
## [1] 47.5398

That is, \text{bias}(s^{2}_n) = -0.49 and \text{Var}[s^{2}_n] = 47.54.

Leave-one-out jackknife algorithm

First, we obtain the full-sample estimate by applying a given Estimator to the sample x.

getFullEst <- function(x, Estimator){
  Estimator(x)
}

In this example, we are using the biased variance estimator so that Estimator = bVar.

bVar <- function(x){
  n <- length(x)
  bVar <- 1/n * sum((x - mean(x))^2)  # biased variance estimator with denominator n
  return(bVar)
}
FullEst <- getFullEst(x, bVar)

Thus, the full-sample estimate is \hat{\theta} = s^{2}_{n} = 60.28.
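As a quick sanity check, bVar should agree with R’s built-in var(), which uses denominator (n-1), rescaled by (n-1)/n:

all.equal(bVar(x), (length(x) - 1) / length(x) * var(x))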

Second, we use a loop to construct a list of the n jackknife Samples of size (n-1) by deleting each observation in turn.

getSamples <- function(x){
  n <- length(x)
  Samples <- list()
  for(i in 1:n){
    Samples[[i]] <- x[-i]
  }
  return(Samples)
}
Samples <- getSamples(x)
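Equivalently, the list of jackknife samples can be constructed without an explicit loop:

Samples <- lapply(seq_along(x), function(i) x[-i])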

Inspecting the first six observations of the first jackknife sample, we can see that x_1 = 104.1 has been deleted: 105, 99.2, 96.8, 104.2, 87.3, 104.4.

Third, we use sapply to apply Estimator to each of the n Samples and store the n resulting \hat{\theta}_{(i)} in a vector Replicates.

getReplicates <- function(Samples, Estimator){
  Replicates <- sapply(Samples, FUN=Estimator, simplify=T)
  return(Replicates)
}
Replicates <- getReplicates(Samples, bVar)

The first ten replicates, \hat{\theta}_{(1)}, ..., \hat{\theta}_{(10)}, are: 60.83, 60.78, 60.83, 60.64, 60.82, 58.76, 60.81, 60.75, 60.75, 60.21.

Fourth, we take the average of all jackknife Replicates to obtain the jackknife estimate JackEst.

getJackEst <- function(Replicates){
  n <- length(Replicates)
  JackEst <- mean(Replicates)
  return(list(JackEst=JackEst, n=n))
}
JackEst <- getJackEst(Replicates)

The resulting jackknife estimate is \hat{\theta}_{(\cdot)} = 60.27.

Bias estimation and adjustment

To obtain a bias-corrected jackknife estimate, \hat{\theta}_{jack}, we first need to obtain an estimate of the bias.

getBias <- function(FullEst, JackEst){
  n <- JackEst$n
  JackEst <- JackEst$JackEst
  Bias <- (n-1) * (JackEst - FullEst)
  return(Bias)
}
Bias <- getBias(FullEst, JackEst)

The estimated bias of the full-sample estimate \widehat{\text{bias}}(\hat{\theta}) = \widehat{\text{bias}}(s^{2}_{n}) is -0.609. Recall that the true bias \text{bias}(\hat{\theta}) is -0.49.

getAdjEst <- function(FullEst, Bias){
  AdjEst <- FullEst - Bias
  return(AdjEst)
}
AdjEst <- getAdjEst(FullEst, Bias)

Subtracting the estimated bias from the full-sample estimate yields a bias-adjusted jackknife estimate \hat{\theta}_{jack} = 60.89. Recall that the true value of \sigma^2 = 49 and the full-sample estimate \hat{\theta} = 60.28.
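As an aside, for this particular estimator the correction is exact: a classical algebraic identity shows that the bias-corrected jackknife estimate of s^{2}_{n} is precisely the usual unbiased sample variance with denominator (n-1). This can be verified numerically against R’s built-in var(); the call below should return TRUE up to floating-point error.

all.equal(AdjEst, var(x))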

Pseudo-observations and variance estimation

To estimate the variance of our bias-adjusted jackknife estimate \hat{\theta}_{jack}, we first need to transform each observation x_i into a pseudo-observation \hat{\theta}_{i}.

getPseudoObs <- function(FullEst, Replicates){
  n <- length(Replicates)
  PseudoObs <- n*FullEst - (n-1)*Replicates
  return(PseudoObs)
}
PseudoObs <- getPseudoObs(FullEst, Replicates)

The first five pseudo-observations \hat{\theta}_1, ..., \hat{\theta}_5 are: 5.73, 10.67, 6.22, 24.16, 6.44.

Finally, we obtain an estimate of the variance of \hat{\theta}_{jack} as the estimated variance of the mean of the pseudo-observations, i.e., the sample variance of the pseudo-observations divided by n.

getVarAdj <- function(PseudoObs){
  n <- length(PseudoObs)
  Var <- (1/n) * var(PseudoObs)
  return(Var)
}
VarAdj <- getVarAdj(PseudoObs) 

The estimated variance of the bias-adjusted jackknife estimate \widehat{\text{Var}}(\hat{\theta}_{jack}) is 53.89. Recall that the true value \text{Var}[s^{2}_{n}] = 47.54.
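With these pieces in hand, a confidence interval for \sigma^2 can also be constructed as in formula (12); the 95% level below is chosen purely for illustration.

alpha <- 0.05
AdjEst + c(-1, 1) * qt(1 - alpha/2, df = n - 1) * sqrt(VarAdj)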

Summary

Based on our single sample of n=100 i.i.d. observations from \mathcal{N}(\mu=100, \sigma^2=49), the jackknife appears to have provided reasonable estimates of the parameter of interest \sigma^2 = 49 (est. 60.89), the bias of the estimator, -0.49 (est. -0.61), and the variance of the estimator, 47.54 (est. 53.89).

Discrepancies between true and estimated values are likely due to sampling variation (and the use of a single sample to evaluate the methods). If we repeat this process for 1,000 samples of size n=100 from \mathcal{N}(\mu = 100, \sigma^{2}=49) and average the results across the samples, we would expect close approximation of the true values.

results <- data.frame(Bias=NA, AdjEst=NA, VarAdj=NA)

n_sim <- 1000
for(s in 1:n_sim){
  x <- rnorm(n, mu, sigma)
  
  FullEst <- getFullEst(x, bVar)
  Samples <- getSamples(x)
  Replicates <- getReplicates(Samples, bVar)
  JackEst <- getJackEst(Replicates)
  Bias <- getBias(FullEst, JackEst)
  AdjEst <- getAdjEst(FullEst, Bias)
  PseudoObs <- getPseudoObs(FullEst, Replicates)
  VarAdj <- getVarAdj(PseudoObs) 
  
  results[s,] <- c(Bias, AdjEst, VarAdj)
}
round(colMeans(results), 2)
##   Bias AdjEst VarAdj 
##  -0.49  49.01  48.32

Indeed, we recover the true value of the estimator, its bias, and variance almost exactly!

Published by

Emma Davies Smith

Emma Davies Smith is currently a Postdoctoral Research Fellow in Biostatistics at the Harvard T. H. Chan School of Public Health. Her current research interests include randomized controlled trials, correlated data, clinical epidemiology, and causal inference. When she's not working on expanding her knowledge of statistics, she's busy petting cats and unsuccessfully convincing her husband to let her adopt them, hiking, and concocting indie and folk rock playlists.

6 thoughts on “Resampling, the jackknife, and pseudo-observations”

  1. Hello! Very helpful post. Quick question to make sure I understand the utility of this approach in applied work where the true parameters are unknown. Is the utility of the jackknife that I can apply this approach to any standard model (e.g., regression) in a single sample and report the average of the jackknifed results, and that this mean should be a better estimate of the true, unbiased effects?

    1. Hi Daniel,

      Thanks for the great question. 😊
      I think that the bias reduction property of jackknife estimators is interesting. However, I actually think the real value of the jackknife in modern applications is the construction of pseudo-observations and their use in variance estimation.

      In the case of regression, I’m not so sure the jackknife with respect to bias would provide many advantages since most coefficient estimators are unbiased.
      However, for estimands without regression frameworks, pseudo-observations can be treated as the response in a GLM to adjust for covariates and estimate treatment effects!
      This paper provides some cool examples of the application of pseudo-observations in survival analysis:
      Andersen, P. K., & Pohar Perme, M. (2009). Pseudo-observations in survival analysis. Statistical Methods in Medical Research, 19(1), 71–99. doi:10.1177/0962280209105020

      I’m hoping to follow up this blog post with another discussing some applications of pseudo-observations… eventually…

      Cheers,
      Emma
