A journal article recently published in Clinical Trials caught my eye, How is missing data handled in cluster randomized trials? A review of trials published in the NIHR Journals Library 1997-2024. I immediately knew this was going to be a poor report card. Indeed, the abstract summarizes:

Among the 110 identified cluster randomized controlled trials, 45% (50/110) did not report or take any action on missing data in either primary analysis or sensitivity analysis. In total, 75% (82/110) of the identified cluster randomized controlled trials did not impute missing values in their primary analysis. Advanced methods like multiple imputation were applied in only 15% (16/110) of primary analyses and 28% (31/110) of sensitivity analyses. On the contrary, the review highlighted that missing data handling methods have evolved over time, with an increasing adoption of multiple imputation since 2017.

It is promising to see that adoption of multiple imputation is increasing, but almost half of all studied cluster trials did not address missing data at all! BIG YIKES?!? Honestly, I have to say I am not at all surprised. Missing data is hard. Cluster randomized trials are hard. Together…

Do you ever feel like you just can’t do anything right?! Well, *do I have* more problems for you…

Working on cluster randomized trials introduces a slew of additional considerations, even in the presence of “simpler” parallel two-arm designs with completely observed data. Do I have enough clusters in each arm to reliability estimate between-cluster variances? What about number of individuals per cluster to estimate within-cluster variances? What if I have repeated measures for each subject, so I have additional correlation to worry about? WHAT ABOUT MY TREATMENT EFFECT?!!! Should I be estimating cluster-level effects or individual-level effects? How do I incorporate cluster-level and individual-level covariates into my models? WHAT DO MY TREATMENT EFFECTS EVEN REPRESENT?!!!

If you are going to address missing data in your cluster trial, you need to make further considerations and assumptions. We can’t even evaluate most of these assumptions. Are my data MCAR, MAR, or MNAR? If I think my data is missing not at random, what assumptions do I feel I can reasonably make about the missing mechanism? What if I’m wrong? What are reasonable sensitivity analyses to perform? What auxiliary variables do I need to include in my imputation models? Should they be at the cluster-level or the individual-level? Perhaps most importantly, how do I properly capture clustering using multiple imputation?!!! Maybe I could borrow observed responses from similar participants in the same cluster… But what if my clusters are small? What if my clusters are large? Does my strategy need to change? (SPOILER ALERT: yes)

No wonder multiple imputation was only applied in 15% of studied primary analyses. Not to mention that two-level predictive mean matching, which I personally think is a very elegant and pragmatic approach to imputing ordinal data, was not even implemented in the add-on popular missing data R package “mice” until 2016. If you wanted to use existing tools before that, many would be faced with suboptimal imputation using a continuous model that doesn’t capture the bounded, discrete nuances of your data. Real talk: It’s also possible, particularly in the presence of relatively few missing observations, that the statisticians on some of these cluster trials “not addressing missing data” weighed the risk of bias due to the missing data against their confidence in required assumptions or inputs, or required effort to implement, and decided it was not worthwhile. Practical considerations are important too.

Anyhow, while I do think there is still a long way to go to make missing data methods for cluster trials accessible, the good news is that several, different two-level (level 1: individual; level 2: cluster) multiple imputation methods are now available in the popular and easy-to-use mice and add-on miceadds packages in R. This includes predictive mean matching! Yay! In this blog post, I will demonstrate how to impute missing ordinal data in a cluster randomized trial using two-level predictive mean matching via the mice and miceadds packages so you can avoid contributing to bad report cards! More yay – we love avoiding citation for bad practices!

The gist

What I am not going to do in this blog post is provide a deep dive into missing data foundations like mechanisms etc., or provide an in-depth walk through of how predictive mean matching works. You can find great resources on these topics elsewhere, like here or here. But for context, here is a very quick run-down of key concepts:

Assume Missing at Random (MAR): MAR implies that our missing data depends on other measured variables, like sex or age. We need to include these other variables as covariates in our multiple imputation models.
Multiple Imputation by Chained Equations (MICE): When multiple variables have missing data, we may impute variables one-at-a-time in sequential order from least to most missing data. Variables with missing data are commonly imputed by specifying a regression model with the incomplete variable as the outcome and relevant variables as covariates. Predictions from this model are used to fill in the missing values. For this reason, MICE is also referred to as Fully Conditional Specification (FCS). Use of regression permits imputation of different outcome types, including binary (logistic regression) and count (Poisson regression). This process cycles through all incomplete variables, iterating until imputed variables yield stable summary statistics.
Predictive Mean Matching (PMM): When outcome distributions are complex, e.g., skewed or bounded, predictive mean matching can be used instead of regression within MICE. For each variable with missing data, observed responses are first regressed on covariates. The responses of all participants, with or without missing data, are predicted using this working linear model. For each participant with missing data, K “donors” or participants with complete data are then identified based on similarity of predicted responses. The missing response is then imputed by randomly selecting one of the K donor responses. By imputing missing responses with observed responses, PMM preserves properties of the observed data.

There are many nuances to these concepts, so if you do plan to use this analytic approach, make sure to do your homework. If you are interested in learning more about the nuances of designing and analyzing cluster randomized trials, check out: for a higher level overview, Comprehending complex designs: Cluster randomized trials, or for the nitty gritty mechanics Simulating data from two-arm cluster randomized trials using base R.

The data

We will use data from a real cluster randomized trial for demonstration, downloadable in different formats from the Harvard Dataverse (click “Access File” to download). This dataset corresponds to a two-arm cluster randomized trial that investigated whether an experimental sex education programme, Sexual Health and Relationships: Safe, Happy and Responsible (SHARE), reduced unsafe sex among students in Scotland versus existing sex education (control).

# Load SHARE data, downloaded as .Rdata format from Harvard Dataverse
SHARE <- get(load('share.Rdata'))

Schools were randomized as clusters (school) to deliver the SHARE curriculum (arm==1) or not (arm==0). At baseline, the sex and social class (scpar) of each student was captured. At follow-up, student sexual health knowledge was assessed by a knowledge score (kscore), ranging from -8 (least knowledgeable) to +8 (most knowledgeable). Students were also asked whether they became sexually active during follow-up (debut) and whether they regretted prior sexual activity (regret).

# View variable labels
sapply(SHARE, attr, "comment")

##                                                                           school 
##                                                            "school number: 1-25" 
##                                                                              sex 
##                                                          "sex: 1=male, 2=female" 
##                                                                              arm 
##                                       "treatment arm: 0=control, 1=intervention" 
##                                                                            scpar 
## "social class: 10=I,20=II,31=IIInon-manual,32=IIImanual,40=IV,50=V,99=not coded" 
##                                                                            debut 
##                         "onset of sexual activity during follow-up: 0=no, 1=yes" 
##                                                                           regret 
##  "regret of first sex with last partner (in those with >1 partner): 0=no, 1=yes" 
##                                                                           kscore 
##                   "knowledge of sexual health at follow-up: score from -8 to +8" 
##                                                                             idno 
##                                                              "individual number"

# Preview SHARE data
head(SHARE)

##   school sex arm scpar debut regret kscore idno
## 1      1   2   1    31     1     NA      3    1
## 2      1   2   1    31     1      0      5    2
## 3      1   2   1    99     0     NA      4    3
## 4      1   2   1    20     0     NA      4    4
## 5      1   2   1    20     1     NA      6    5
## 6      1   1   1    20     0     NA      8    6

Data cleaning

We will consider arm as our exposure, kscore (knowledge score) our outcome, and school our cluster. We will also consider sex and scpar (social class) as covariates.

# For dplyr data manipulation and ggplot2 graphics
library(tidyverse)

# Subset to relevant columns
df <- SHARE %>% select(school, arm, sex, scpar, kscore)

One strategy in missing data is to treat these categorical variables as continuous to simplify imputation. However, if their distribution is anything but normal, imputations can be wacky. Sometimes use of transformation (e.g., log) to approximate normality prior to imputation can improve quality, but in my experience, not always. Capturing the unique skew of ordinal variables using parametric models can also prove challenging. Predictive mean matching (PMM) can capture these oddities.

Note: To use school as a cluster identifier in mice, it must be a numeric variable. It is also best to leave ordinal variables as numeric in R.

The skim function from library(skimr) provides a nice overview of all observations in the SHARE dataset. Output confirms that all variables appear as expected. Note that there are 2 arms and 25 schools. It also tells us that out of 5,854 total students (2,867 in the Intervention arm, 2,987 in the Control Arm), 353 students (6%) are missing the outcome kscore, and 134 (2%) students are missing the covariate scpar.

Observed distributions

Examining knowledge score distributions by covariate will provide insights into associations with observed and missing values that will inform our imputation. The following code was repeated for arm and each covariate, to produce the panel below.

# Tabulate proportion of students with each knowledge score by arm
d1_arm <- df %>%
  mutate(arm = factor(arm, levels=0:1, labels=c('Control', 'Intervention'))) %>%
  group_by(arm, kscore) %>%
  summarize(n=n()) %>%
  group_by(arm) %>% 
  mutate(p=n/sum(n)*100)

## Repeat this chunk for sex and scpar

# Plot proportion of students with each knowledge score by arm
p1_arm <- d1_arm %>%
  ggplot(aes(x=kscore)) +
  geom_bar(aes(y=p, col=arm, fill=arm), stat='identity', 
           alpha = 0.75, position=position_dodge(width=0.5)) +
  lims(x=c(-10, 10), y=c(0, 30)) +
  labs(y='Students, %', x='', fill='Arm', col='Arm') +
  scale_x_continuous(labels=NULL) +
  theme_bw() +
  theme(legend.position="inside", legend.position.inside=c(0.1,0.55))

## Repeat this chunk for sex and scpar

gridExtra::grid.arrange(p1_arm, p1_sex, p1_scpar, ncol=1)

The arm plot (top) suggests that knowledge scores tend to be greater in the Intervention arm compared to the Control arm. The proportion of missing scores appears similar between arms.

# Knowledge scores within Intervention arm
summary(df$kscore[df$arm==1])

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   -6.00    4.00    5.00    4.77    6.00    8.00     178

# Knowledge scores within Control arm
summary(df$kscore[df$arm==0])

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  -5.000   3.000   4.000   4.155   6.000   8.000     175

The sex plot (middle) suggests that knowledge scores tend to be greater, and slightly less likely to be missing, in Females compared to Males.

# Knowledge scores within Females
summary(df$kscore[df$sex==2])

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  -5.000   4.000   5.000   4.861   6.000   8.000     159

# Knowledge scores within Males
summary(df$kscore[df$sex==1])

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  -6.000   2.000   4.000   3.983   6.000   8.000     194

The scpar plot (bottom, right) suggests that knowledge scores tend to be slightly greater, and slightly less likely to be missing, among the upper social classes (i.e., Class I or II).

# Knowledge scores within Social class I or II
summary(df$kscore[df$scpar==10 | df$scpar==20])

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  -6.000   3.000   5.000   4.678   6.000   8.000     218

# # Knowledge scores within all other Social classes
summary(df$kscore[!(df$scpar==10 | df$scpar==20)])

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  -4.000   3.000   4.000   4.268   6.000   8.000     371

Missing pattern

To investigate how missing values co-occur, we should look at the missing pattern. This can be done using the md.pattern function in R library(mice). The mice library is used by the excellent textbook, Flexible Imputation of Missing Data by van Buren, and contains both accessible theory and practical tips.

library(mice)

Each row reflects a distinct pattern of observed (blue) and missing (red) values:

Numbers on the left represent the number of students displaying each pattern.
Numbers on the right represent the number of distinct missing variables in each pattern.

# Investigate properties of the missing data
md.pattern(df, rotate.names=F)

##      school arm sex scpar kscore    
## 5399      1   1   1     1      1   0
## 321       1   1   1     1      0   1
## 102       1   1   1     0      1   1
## 32        1   1   1     0      0   2
##           0   0   0   134    353 487

school, arm, and sex are observed for all students (N=5,854).
kscore (n=353), and scpar (n=134) are missing for some students.
The most common pattern (n=321) features missing knowledge score (kscore) only.
n=102 students are missing social class (scpar) only.
n=32 students are missing both kscore and scpar.

In total, 487 (8%) students are missing any variable of interest and 353 (6%) students are missing outcomes.

The imputation

The role of variables in imputation

So, first up. Are we imputing values at the participant-level or summaries at the cluster-level?

We are interested in student-level outcomes like knowledge scores, rather than school-level outcomes like fidelity to the curriculum. In this case, I would impute participant-level values and then aggregate them into cluster-level summaries.

Although we are imputing individual-level values, cluster-level summaries are still useful for informing cluster differences in imputation models! More on this shortly.

Use the make.predictorMatrix function to initialize the form of your imputation models.

Rows represent dependent variables, or the variable to be imputed.
Columns represent independent variables, or predictors of the imputed variable.
By default, each variable predicts every other variable (off-diagonal=1).
A variable cannot predict itself (diagonal=0).

pred <- make.predictorMatrix(df)
pred

##        school arm sex scpar kscore
## school      0   1   1     1      1
## arm         1   0   1     1      1
## sex         1   1   0     1      1
## scpar       1   1   1     0      1
## kscore      1   1   1     1      0

Set the predictor value of school to -2 to inform imputation models that school represents our cluster.

A predictor matrix cell equal to -2 indicates that the column variable (i.e., school) represents the cluster for the row variable (i.e., kscore).

pred['kscore','school'] <- -2

pred

##        school arm sex scpar kscore
## school      0   1   1     1      1
## arm         1   0   1     1      1
## sex         1   1   0     1      1
## scpar       1   1   1     0      1
## kscore     -2   1   1     1      0

For good measure, we will set school as the cluster for all variables.

pred[,'school'] <- -2

pred

##        school arm sex scpar kscore
## school     -2   1   1     1      1
## arm        -2   0   1     1      1
## sex        -2   1   0     1      1
## scpar      -2   1   1     0      1
## kscore     -2   1   1     1      0

Specifying a predictor value of 3 includes the variable and its cluster mean as predictors in the imputation model.

Cluster-level summaries can still be useful as covariates in imputation models to better inform differences between clusters.
When predictors are categorical, the cluster-level mean may not be meaningful. But, this doesn’t matter for prediction! The cluster-level mean acts as a cluster signature of sorts.

pred[,'arm']    <- 3
pred[,'sex']    <- 3
pred[,'scpar']  <- 3
pred[,'kscore'] <- 3

Specifying a predictor value of 0 excludes the variable from the imputation model.

school, arm, and sex are completely observed, so it does not matter what their predictors are – they won’t be imputed. But, for good measure, we will specify no predictors for these variables.

pred[c('school', 'arm', 'sex'),] <- 0

Even though our data set is relatively small, imputation can still take a while. Simplifying your models as much as possible can speed things up. Of course, we know arm is not predictive of baseline covariate social class scpar due to randomization. We can also update our matrix to reflect that:

pred['scpar', 'arm'] <- 0

Review your final predictor matrix, which defines your imputation models.

Within each cluster school, our final predictor matrix outlines the following two conditional relationships…
- kscore ~ arm + sex + scpar
- scpar ~ sex + kscore

pred

##        school arm sex scpar kscore
## school      0   0   0     0      0
## arm         0   0   0     0      0
## sex         0   0   0     0      0
## scpar      -2   0   3     3      3
## kscore     -2   3   3     3      3

PMM: Our method of choice

The mice library is flexible regarding methods used to impute each variable, here scpar and kscore. This is great for a mix of variable types. As discussed above, you can tailor your imputation regression models to your outcome type when using MICE.

The make.method function attempts to predict which method is best for each variable.

There is no need to specify a method for school, arm, or sex because they are fully observed.
Predictive mean matching (pmm) is suggested for scpar and kscore – excellent!

method <- make.method(df)
method

## school    arm    sex  scpar kscore 
##     ""     ""     ""  "pmm"  "pmm"

But wait – we can’t use just regular PMM! To incorporate clustering, we need to use two-level PMM! That doesn’t come with the standard plan. You’re going to need the simply everything plan. That is, the miceadds library.

# note: the miceadds library is the simply everything everything plan
library(miceadds)

By setting the method for scpar and kscore to 2l.pmm rather than pmm, mice now knows that it should pay attention to our cluster variable.

method[c('scpar', 'kscore')] <- '2l.pmm'
method

##   school      arm      sex    scpar   kscore 
##       ""       ""       "" "2l.pmm" "2l.pmm"

Are we there yet?

We’ve done all the hard work. Just kidding – sorry! You now have to decide how many imputations you want to perform, i.e., how many data sets you want to construct. This is multiple imputation after all.

There is a random element to PMM: For each person with missing data, we borrow one observation from multiple candidates. If we perform multiple imputations:

Sometimes person 2 will be chosen to inform person 1.
Sometimes person 74 will be chosen to inform perform person 1.
Sometimes person 42 will be chosen to inform person 1.
…and so on…

By imputing multiple times, we can actually measure the extra variability or uncertainty in our estimates that due to THE CHAOSSSSSS. By averaging estimates across multiple imputations we also get a better guesstimate.

Basically inference relies on the mean of multiple estimates from multiple imputations. According to ye olde central limit theorem, this suggests we want at least 30 imputations.

# Number of imputations
m <- 30

In my opinion, the more imputations the better. But imputation can take a long time to run, and how long depends on how many imputations and iterations per imputation you perform. (Psst: consider running mice overnight or using a computing cluster). So, we also need to figure out how many iterations to perform for each imputation before our computer explodes.

What exactly does “iteration” mean here? mice is a cyclical algorithm: It starts with a mediocre guess and improves it calibration at each step or iteration.

At the start of the first iteration, simple imputation (like mean or mode imputation) is used to fill in all missing values.
Completely observed predictors (e.g., sex or arm) and newly filled-in variables (e.g., kscore) are then used to predict the variable with the least missing data (e.g., scpar – which is also now filled-in per Step 1).
Completely observed variables and the newly-filled in variables (e.g., scpar) are then used to to predict the next variable (e.g., kscore).
This proceeds until you’ve predicted, or filled-in, all variables! This marks the end of the first iteration.
On the next iteration, Steps 2 through 4 are repeated starting with the filled-in data from the last iteration.

For our data, this is what 5 iterations look like.

# p.s. make sure to set your seed always
mice(data=df, m=1, maxit=5, method=method, predictorMatrix=pred, seed=1234)

## 
##  iter imp variable
##   1   1  scpar  kscore
##   2   1  scpar  kscore
##   3   1  scpar  kscore
##   4   1  scpar  kscore
##   5   1  scpar  kscore

## Class: mids
## Number of multiple imputations:  1 
## Imputation methods:
##   school      arm      sex    scpar   kscore 
##       ""       ""       "" "2l.pmm" "2l.pmm" 
## PredictorMatrix:
##        school arm sex scpar kscore
## school      0   0   0     0      0
## arm         0   0   0     0      0
## sex         0   0   0     0      0
## scpar      -2   0   3     0      3
## kscore     -2   3   3     3      0

If things go well, at some point, the imputation models and subsequently the predictions won’t change much. This is referred to as convergence.

How many iterations are needed to reach convergence? This is more of an art than a science. I’ll start with, say, 200 iterations, and check runtime and corresponding diagnostics. Here, 200 iterations took almost 16 seconds – with 30 imputations, mice should take about 8 minutes to run.

system.time({
  # NOTE: print=F prevents print of each iteration
  imp_test <- mice(data=df, m=1, maxit=200, method=method, predictorMatrix=pred, print=F, seed=1234)
})

##    user  system elapsed 
##  16.040   0.327  17.063

If your imputations have converged, you shouldn’t see any trends in the summary statistics of your imputed variables. You want to see CHAOSSSSSS.
Looks chaotic to me, and it didn’t take very long – 200 iterations it is.

plot(imp_test)

If you feel that your run time is too long and your diagnostics appear reasonable after fewer iterations, you can adjust the number of iterations downwards.

MICE with PMM: The main event

So far, we have identified and specified:

Predictors of our missing data (pred),
Multiple imputation methods (method),
Number of imputations (m), and
Number of iterations (maxit).

We can finally put together all of the pieces, and run mice for realzies!!!

imp <- mice(data=df, m=m, maxit=200, method=method, predictorMatrix=pred, print=F, seed=1234)

If we look at our convergence diagnostics, we now have a whole lotta lines! And they look chaotic – excellent!

plot(imp)

We can also visualize the distributions of our 30 imputations. The blue line represents the observed density. Because knowledge score is categorical, we see spikes at each integer value. Each red line represents one imputed density. The imputed densities feature the same trends as the observed data, but appear to suggest imputed knowledge scores tend to be lower than observed. This makes sense based on observed relationships between knowledge score and our predictors.

densityplot(imp, ~kscore, thickness=5, alpha=0.5)

densityplot(imp, ~scpar, thickness=5, alpha=0.5)

Save your imputed datasets

Now that we have confirmed that our multiple imputations look good, we should save them for later use and traceability.

First, we should save the mice object in RDS format for later processing using R.

saveRDS(imp, file="imp.rds")

Second, we should extract and save the imputed datasets. The following provides the imputed datasets in dataframe format.

imp_df <- complete(imp, "long")
head(imp_df)

##   school arm sex scpar kscore .imp .id
## 1      1   1   2    31      3    1   1
## 2      1   1   2    31      5    1   2
## 3      1   1   2    99      4    1   3
## 4      1   1   2    20      4    1   4
## 5      1   1   2    20      6    1   5
## 6      1   1   1    20      8    1   6

The .imp variable indexes the imputed dataset:

# We have 30 imputed datasets with 5854 observations (rows) each
table(imp_df$.imp)

## 
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17 
## 5854 5854 5854 5854 5854 5854 5854 5854 5854 5854 5854 5854 5854 5854 5854 5854 5854 
##   18   19   20   21   22   23   24   25   26   27   28   29   30 
## 5854 5854 5854 5854 5854 5854 5854 5854 5854 5854 5854 5854 5854

We will save this data frame in CSV (comma separated value) format.

write.csv(imp_df, "imp_df.csv")

Grand finale: Treatment effect estimation

We now need to estimate the treatment effect within each of our m=30 imputed datasets. Because our data is clustered, we will use a linear mixed model to do this.

Side note: I recognize that up until this point we have treated kscore as an ordinal outcome, so the use of a linear mixed model now feels inappropriate! I agree. You can check out my previous blogpost, Minor musings on measurement, to learn more about the properties of ordinal data, and debates on their handling in analysis.

For simplicity, and to demonstrate current limitations of clustered imputation, we will assume in what follows that knowledge score can be analyzed as a continuous variable. There are some options to pool imputed multi-level ordinal logistic models, but I won’t be covering them here – in part because I am anti-ordinal logistic models. You should check out this super cool, really great alternative method using rank-based estimation in cluster randomized trials. coughcough No bias here! coughcough

To do so, in conjunction with the miceadds library, we need the nlme library to fit our linear mixed models and the broom.mixed library to help pool our analyses. The parameters library is optional and provides a nice view of our model estimates.

library(nlme)           # necessary to fit    lmm
library(broom.mixed)    # necessary to pool   lmm
library(parameters)     # optional  to report lmm

Complete-case treatment effect estimate

For comparison, we begin by estimating the treatment effect among those with observed knowledge scores only.

obs_df  <- df[!is.na(df$kscore),]
obs_lme <- with(obs_df, lme(kscore ~ arm, random=~1|school))

Results suggest that knowledge scores are 0.54 points greater among students in the treatment arm compared to the control arm, on average.

model_parameters(obs_lme)

## # Fixed Effects
## 
## Parameter   | Coefficient |   SE |       95% CI |     t |   df |      p
## -----------------------------------------------------------------------
## (Intercept) |        4.12 | 0.12 | [3.88, 4.36] | 33.72 | 5476 | < .001
## arm         |        0.54 | 0.17 | [0.18, 0.89] |  3.15 |   23 | 0.005 
## 
## # Random Effects
## 
## Parameter              | Coefficient
## ------------------------------------
## SD (Intercept: school) |        0.39
## SD (Residual)          |        2.32

We also receive estimates of our variance components, suggesting an intracluster correlation of 0.027.

Imputed treatment effect estimates

With the miceadds library loaded, we can easily analyze each of our m=30 imputed datasets using the with statement.

# Analyze each imputed dataset
imp_lme <- with(imp, lme(kscore ~ arm, random=~1|school))

This analyzes each imputation separately and provides the corresponding results in a list.

# Analysis of first imputed dataset
imp_lme$analyses[[1]]

## Linear mixed-effects model fit by REML
##   Data: NULL 
##   Log-restricted-likelihood: -13276.68
##   Fixed: kscore ~ arm 
## (Intercept)         arm 
##   4.1236628   0.5197622 
## 
## Random effects:
##  Formula: ~1 | school
##         (Intercept) Residual
## StdDev:   0.3874751 2.327008
## 
## Number of Observations: 5854
## Number of Groups: 25

# Analysis of 30th imputed dataset
imp_lme$analyses[[30]]

## Linear mixed-effects model fit by REML
##   Data: NULL 
##   Log-restricted-likelihood: -13276.6
##   Fixed: kscore ~ arm 
## (Intercept)         arm 
##   4.1221611   0.5272727 
## 
## Random effects:
##  Formula: ~1 | school
##         (Intercept) Residual
## StdDev:   0.3846476 2.327031
## 
## Number of Observations: 5854
## Number of Groups: 25

We see very slight variation in estimates between the 1st and 30th imputed dataset.

Pooled treatment effect estimate

To obtain a single or pooled treatment effect estimate, we average the treatment effect estimates across the m=30 imputations and use Rubin’s Rules to estimate the pooled standard error.

This is accomplished through simple use of the pool function.

pool_lme <- pool(imp_lme)

The pooled treatment effect estimate appears almost identical to the complete-case estimate. This suggests imputation really didn’t do much here in terms of mitigating bias. However, our standard errors and p-values are smaller here, suggesting that imputation provided slight gains in efficiency by salvaging missing observations.

model_parameters(pool_lme)

## # Fixed Effects
## 
## Parameter   | Coefficient |   SE |       95% CI |     t |      df |      p
## --------------------------------------------------------------------------
## (Intercept) |        4.11 | 0.12 | [3.87, 4.35] | 33.13 | 5678.35 | < .001
## arm         |        0.54 | 0.17 | [0.20, 0.88] |  3.11 | 5702.46 | 0.002

As a result of pooling, notice that the degrees of freedom here are much different than the complete-case degrees of freedom. Also notice – no estimates of the variance components! The pool function does not provide these at present.

Concluding thoughts

Imputing clustered data is challenging, particularly when multiple variables with missing data differ in their type or follow non-standard distributions. Multiple imputation by chained equations (MICE) with predictive mean matching (PMM) provides an elegant solution, but was not implemented for clustered data in the popular mice (+ miceadds) libraries until 2016.

Here, we have demonstrated how to implement this multiple imputation pipeline for clustered data from start to end using R. Even for our simple example, significant effort was needed to set up MICE with PMM in a two-level setting. The value of imputation appeared limited in terms of bias, with limited difference in complete-case and pooled treatment effect estimates. While imputation did decrease the standard error slightly, the complete-case effect estimate was already statistically significant (p=0.005) – again suggesting limited value.

Although there is still much work to be done on imputation in clustered scenarios, like providing estimates of pooled variance components, current efforts should be applauded. I am looking forward to seeing how we as a statistical community can improve the usability of these sophisticated tools. I also want to emphasize that although I do not feel that imputation is always necessary, statisticians should continue to think carefully about the impacts on missing data in their project. In this case, imputation didn’t do much for us – but we are basically never privilege to the true missing mechanism.

Generally speaking, better safe than sorry.