## ML Theory: Conditional Probability 101

Venn diagrams are useful tools for visualizing different probability spaces. Consider two events $C$ and $D$ within some sample space $S$. Suppose $C$ is the event of owning a cat and $D$ is the event of owning a dog. Our Venn diagram might look something like the following for a sample of 450 pet owners.

In other words, out of a sample of 450 pet owners,

• 400 owned a cat exclusively,
• 36 owned a dog exclusively,
• 9 owned both a cat and a dog, and
• 5 owned neither a cat nor a dog.

Obviously, the probability of owning a cat,

$$P(C) = \frac{400 + 9}{450} = \frac{409}{450} \approx 0.91,$$

is much higher than the probability of owning a dog,

$$P(D) = \frac{36 + 9}{450} = \frac{45}{450} = 0.10,$$

as cats are the superior animal. We can also estimate the probability of being one of those people with neither a cat nor a dog as

$$P(\text{neither}) = \frac{5}{450} \approx 0.01.$$

Now, suppose we select a cat owner at random. What is the probability that they are also a dog owner? Well, since we know the individual is a cat owner, we only need to focus on the left hand bubble of our Venn diagram.

The number of individuals of interest to us is therefore $409$, which we will denote $n(C)$. Of those 409, only $9$ are within the intersection (denoted $C \cap D$). We will refer to these 9 as $n(C \cap D)$. The relevant probability can therefore be estimated as

$$P(D \mid C) = \frac{n(C \cap D)}{n(C)} = \frac{9}{409} \approx 0.02.$$

$P(D \mid C)$ represents the probability an individual owns a dog given they have a cat. This is what we refer to as a conditional probability. When calculating a conditional probability, we use additional information to reduce the size of our probability space. In this example, this meant that we could focus solely on the cat owner portion of the pictured Venn diagram.

In general, for two events $A$ and $B$, the law of conditional probability states that the probability of observing $B$ given event $A$ can be represented as

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}.$$

This is exactly the formula we were using for the cats and dogs example,

$$P(D \mid C) = \frac{P(C \cap D)}{P(C)} = \frac{9/450}{409/450} = \frac{9}{409} \approx 0.02.$$

We often use conditional probability in our everyday lives without thinking about it. For example, if you tested positive for an extremely rare disease, you should be thinking that the result is likely a false positive rather than that you have the disease: when a condition is rare enough, even an accurate test produces more false positives than true positives… or maybe not, because you might be freaking out right now. Your doctor, however, should be thinking it! As a result, your doctor would send you for further testing.
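The arithmetic above is easy to sanity-check in a few lines of R; this is just a sketch using the counts from the Venn diagram:

```r
# Counts from the Venn diagram (450 pet owners in total).
n_cat_only <- 400
n_dog_only <- 36
n_both     <- 9
n_neither  <- 5
n_total    <- n_cat_only + n_dog_only + n_both + n_neither  # 450

p_cat  <- (n_cat_only + n_both) / n_total  # P(C) = 409/450
p_dog  <- (n_dog_only + n_both) / n_total  # P(D) = 45/450
p_both <- n_both / n_total                 # P(C and D) = 9/450

# Conditional probability of owning a dog given owning a cat,
# via the law of conditional probability:
p_dog_given_cat <- p_both / p_cat          # 9/409, about 0.022
```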

## Motivation

For observed pairs $(x_i, Y_i)$, $i = 1, \ldots, N$, the relationship between $x$ and $Y$ can be defined generally as

$$Y_i = m(x_i) + \epsilon_i$$

where $m(x)$ is the regression function and the $\epsilon_i$ are random errors with $E[\epsilon_i] = 0$. If we are unsure about the form of $m(x)$, our objective may be to estimate $m(x)$ without making too many assumptions about its shape. In other words, we aim to “let the data speak for itself”.

Non-parametric approaches require only that $m(x)$ be smooth and continuous. These assumptions are far less restrictive than those of alternative parametric approaches, thereby increasing the number of potential fits and providing additional flexibility. This makes non-parametric models particularly appealing when prior knowledge about $m(x)$'s functional form is limited.

## Estimating the Regression Function

If multiple values of $Y$ were observed at each $x_i$, $m(x)$ could be estimated by averaging the responses at each $x_i$. However, since $x$ is often continuous, it can take on a wide range of values, making repeated observations at the same $x_i$ quite rare. Instead, a neighbourhood of $x$ is considered.

Define the neighbourhood around $x$ as $\{x_i : |x_i - x| \le h\}$ for some bandwidth $h > 0$. Then, a simple non-parametric estimate of $m(x)$ can be constructed as the average of the $Y_i$'s corresponding to the $x_i$ within this neighbourhood. That is,

$$\hat{m}(x) = \frac{\sum_{i=1}^{N} K\left(\frac{x_i - x}{h}\right) Y_i}{\sum_{i=1}^{N} K\left(\frac{x_i - x}{h}\right)} \qquad (1)$$

where

$$K(u) = \frac{1}{2}\,\mathbb{1}(|u| \le 1)$$

is the uniform kernel. This estimator, referred to as the Nadaraya-Watson estimator, can be generalized to any kernel function (see my previous blog post). It is, however, convention to use kernel functions of order $\nu = 2$ (e.g. the Gaussian and Epanechnikov kernels).
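A bare-bones version of (1) takes only a few lines of R. The helper below, nw_uniform(), is my own illustrative function (not from any package), applied to simulated data:

```r
# Nadaraya-Watson estimate at a point x0 with the uniform kernel.
nw_uniform <- function(x0, x, y, h) {
  u <- (x - x0) / h
  w <- 0.5 * (abs(u) <= 1)      # uniform kernel weights
  if (sum(w) == 0) return(NA)   # no observations in the neighbourhood
  sum(w * y) / sum(w)           # average of the y's within the window
}

set.seed(1)
x <- runif(200, 0, 2 * pi)
y <- sin(x) + rnorm(200, sd = 0.3)

# Estimate m(pi/2); the true value is sin(pi/2) = 1.
nw_uniform(pi / 2, x, y, h = 0.5)
```

With the uniform kernel the weights are constant within the window, so the estimator reduces to a plain local average, exactly as described above.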

## Kernel and Bandwidth Selection

The implementation of a kernel estimator requires two choices:

• the kernel, $K$, and
• the smoothing parameter, or bandwidth, $h$.

Kernels are often selected based on their smoothness and compactness. We prefer a compact kernel as it ensures that only data local to the point of interest is considered. The optimal choice, under some standard assumptions, is the Epanechnikov kernel. This kernel has the advantages of some smoothness, compactness, and rapid computation.

The choice of bandwidth is critical to the performance of the estimator and far more important than the choice of kernel. If the smoothing parameter is too small, the estimator will be too rough; but if it is too large, we risk smoothing out important features of the function. In other words, choosing $h$ involves a significant bias-variance trade-off.

• large $h$: smooth curve, low variance, high bias
• small $h$: rough curve, high variance, low bias

The simplest way of selecting $h$ is to plot $\hat{m}(x)$ for a range of different bandwidths and pick the one that looks best. The eye can always visualize additional smoothing, but it is not so easy to imagine what a less smooth fit might look like. For this reason, it is recommended that you choose the least smooth fit that does not show any implausible fluctuations.

## Cross-Validation Methods

Selecting the amount of smoothing using subjective methods requires time and effort. Automatic selection of $h$ can be done via cross-validation. The cross-validation criterion is

$$CV(h) = \frac{1}{N}\sum_{i=1}^{N}\left(Y_i - \hat{m}_{-i}(x_i)\right)^2$$

where $-i$ indicates that point $i$ is left out of the fit. The basic idea is to leave out observation $i$ and estimate $\hat{m}_{-i}(x_i)$ based on the other $N - 1$ observations. $h$ is chosen to minimize this criterion.

True cross-validation is computationally expensive, so an approximation known as generalized cross-validation (GCV) is often used. GCV approximates CV and involves only one non-parametric fit for each value of $h$ (compared to CV, which requires $N$ fits for each value of $h$).

In order to approximate CV, it is important to note that kernel smooths are linear. That is,

$$\hat{\mathbf{m}} = S\mathbf{Y}$$

where $S$ is an $N \times N$ smoothing matrix. $S$ is analogous to the hat matrix in parametric linear models.

It can be shown that

$$CV(h) = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{Y_i - \hat{m}(x_i)}{1 - S_{ii}}\right)^2$$

where $S_{ii}$ is the $i$th diagonal element of $S$ (hence $S_{ii}$ is analogous to $h_{ii}$, the leverage of the $i$th observation). Using the smoothing matrix,

$$GCV(h) = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{Y_i - \hat{m}(x_i)}{1 - \text{tr}(S)/N}\right)^2$$

where $\text{tr}(S)$ is the trace of $S$. In this sense, GCV substitutes the average leverage, $\text{tr}(S)/N$, for each individual leverage $S_{ii}$.

Automatic methods such as CV often work well but sometimes produce estimates that are clearly at odds with the amount of smoothing that contextual knowledge would suggest. It is therefore extremely important to exercise caution when using them and it is recommended that they be used as a starting point.
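As a concrete and entirely illustrative sketch, true leave-one-out CV for a Gaussian-kernel Nadaraya-Watson fit can be brute-forced on simulated data; cv_score() and nw() below are hypothetical helpers, not library functions:

```r
set.seed(42)
x <- runif(100, 0, 2 * pi)
y <- sin(x) + rnorm(100, sd = 0.3)

# Nadaraya-Watson estimate with a Gaussian kernel.
nw <- function(x0, x, y, h) {
  w <- dnorm((x - x0) / h)
  sum(w * y) / sum(w)
}

# Leave-one-out CV criterion: drop observation i, predict at x_i.
cv_score <- function(h, x, y) {
  mean(sapply(seq_along(x), function(i) {
    (y[i] - nw(x[i], x[-i], y[-i], h))^2
  }))
}

h_grid <- seq(0.05, 1, by = 0.05)
scores <- sapply(h_grid, cv_score, x = x, y = y)
h_grid[which.min(scores)]  # the CV-selected bandwidth
```

Note the cost: each of the 20 candidate bandwidths requires $N = 100$ separate fits, which is exactly the expense that GCV's single-fit-per-$h$ shortcut avoids.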

## Motivation

It is important to have an understanding of some of the more traditional approaches to function estimation and classification before delving into the trendier topics of neural networks and decision trees. Many of these methods build on one another, and thus to truly be a MACHINE LEARNING MASTER, we’ve got to pay our dues. We will therefore start with the slightly less sexy topic of kernel density estimation.

Let $X$ be a random variable with a continuous distribution function (CDF) $F(x)$ and probability density function (PDF)

$$f(x) = \frac{d}{dx}F(x).$$

Our goal is to estimate $f(x)$ from a random sample $\{X_1, \ldots, X_N\}$. Estimation of $f(x)$ has a number of applications, including construction of the popular Naive Bayes classifier.

## Derivation

The CDF is naturally estimated by the empirical distribution function (EDF)

$$\hat{F}(x) = \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}(X_i \le x)$$

where

$$\mathbb{1}(X_i \le x) = \begin{cases} 1 & \text{if } X_i \le x \\ 0 & \text{otherwise} \end{cases}$$

is the indicator function.

I’m not saying “naturally” to be a jerk! I know the feeling of reading proof-heavy journal articles that end sections with “extension to the d-dimensional case is trivial”; it’s not fun when it’s not trivial to you. $\hat{F}(x)$ essentially estimates $F(x) = P(X \le x)$, the probability of $X$ being less than some threshold $x$, as the proportion of observations in our sample less than or equal to $x$.

It might seem natural to estimate the density as the derivative of $\hat{F}(x)$, but $\hat{F}(x)$ is just a collection of mass points and is not continuous. Instead, let’s consider a discrete derivative. For some small $h > 0$,

$$\hat{f}(x) = \frac{\hat{F}(x+h) - \hat{F}(x-h)}{2h}.$$

This can be re-expressed as

$$\hat{f}(x) = \frac{1}{2Nh}\sum_{i=1}^{N} \mathbb{1}(x - h \le X_i \le x + h) = \frac{1}{2Nh}\sum_{i=1}^{N} \mathbb{1}\left(\left|\frac{X_i - x}{h}\right| \le 1\right)$$

since $x - h \le X_i \le x + h$ if and only if $|X_i - x| \le h$ (draw a picture if you need to convince yourself)! Simplifying further,

$$\hat{f}(x) = \frac{1}{Nh}\sum_{i=1}^{N} k\left(\frac{X_i - x}{h}\right)$$

where

$$k(u) = \frac{1}{2}\,\mathbb{1}(|u| \le 1)$$

is the uniform density function on $[-1, 1]$. Mathemagic!

From our derivation, we see that $\hat{f}(x)$ essentially counts the number of observations within a small distance, $h$, of $x$. The bandwidth $h$ dictates the size of the window over which observations are considered. That is, the bandwidth controls the degree of smoothing. The greater the number of observations within this window, the greater $\hat{f}(x)$ is.

Our derived estimate is a special case of what is referred to as a kernel estimator. The general case is

$$\hat{f}(x) = \frac{1}{Nh}\sum_{i=1}^{N} K\left(\frac{X_i - x}{h}\right) \qquad (1)$$

where $K(u)$ is a kernel function.
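Estimator (1) is short enough to hand-roll in R and compare against the built-in density() function (same Gaussian kernel and bandwidth); kde_at() below is my own sketch of a helper, not a library function:

```r
# Hand-rolled kernel density estimate at a single point x0:
# (1/Nh) * sum K((X_i - x0)/h), with a Gaussian kernel K.
kde_at <- function(x0, X, h) {
  mean(dnorm((X - x0) / h)) / h
}

set.seed(7)
X <- rnorm(500)
h <- 0.3

manual <- kde_at(0, X, h)

# density() evaluates on a grid; interpolate to x = 0 for comparison.
d <- density(X, bw = h, kernel = "gaussian")
builtin <- approx(d$x, d$y, xout = 0)$y

c(manual, builtin)  # the two estimates should agree closely
```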

## Kernel Functions

A kernel function $K(u)$ is any function which satisfies

$$\int_{-\infty}^{\infty} K(u)\,du = 1.$$

The kernel function acts as our weighting function, assigning less mass to observations farther from $x$. This helps to ensure that our fitted curve is smooth.

Non-negative kernels satisfy $K(u) \ge 0$ for all $u$ and are therefore probability density functions. Symmetric kernels satisfy $K(u) = K(-u)$ for all $u$. The Gaussian, or Normal, density is a popular symmetric, non-negative kernel.

The moments of a kernel are defined as

$$\kappa_j(K) = \int_{-\infty}^{\infty} u^j K(u)\,du.$$

The order of a kernel, $\nu$, is defined as the order of the first non-zero moment. For example, if $\kappa_1(K) = 0$ and $\kappa_2(K) > 0$, then $K$ is a second-order kernel and $\nu = 2$. Symmetric non-negative kernels are second-order, and hence second-order kernels are the most common in practice.

Other popular kernels include the Epanechnikov, uniform, bi-weight, and tri-weight kernels. The Epanechnikov kernel is considered to be the optimal kernel as it minimizes the asymptotic mean integrated squared error. Choice of the bandwidth, however, is often more influential on estimation quality than choice of kernel.

## Bandwidth Considerations

As noted above, the bandwidth $h$ determines the size of the envelope around $x$ and thus the number of observations used for estimation. In the case of a Gaussian kernel, $h$ would translate to the standard deviation. For k-nearest neighbours, $h$ would translate to the size of the neighbourhood expressed as the span ($k$ points within the $N$ observation training set).

The infamous bias-variance trade-off must be considered when selecting $h$. If we choose a small value of $h$, we consider a smaller number of observations. This results in higher variance due to the smaller sample size, but less bias, as each $X_i$ will be closer to $x$. As we increase $h$, our window size increases and we consider a larger number of observations. This reduces our variance, but our bias will now be higher, as we are using $X_i$ that are further from $x$ and thus information that might not be particularly relevant.

In other words, if $h$ is too large we will smooth out important information, but if it is too small, our estimate will be too rough and contain unnecessary noise. Choosing $h$ is no easy task, and several methods for bandwidth selection have been proposed, including cross-validation methods, rules of thumb, and visual inspection.

Personally, I prefer to use cross-validation as a starting point since I try to minimize the effect of my own biases on estimation. However, these methods aren’t perfect and if feasible, I will follow this up with visual inspection to ensure that the CV bandwidth makes sense in the context of my data and problem. I will generally select a slightly rougher fit over a smoother fit as it is easier for the human eye to imagine a smoother fit than a rougher fit!
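Base R already ships several automatic bandwidth selectors for density estimation, so comparing a rule of thumb against cross-validation-style selectors takes one line each; a quick sketch on simulated data:

```r
set.seed(123)
X <- rnorm(500)

bw.nrd0(X)  # Silverman's rule of thumb (the density() default)
bw.ucv(X)   # unbiased (leave-one-out) cross-validation
bw.SJ(X)    # Sheather-Jones plug-in selector
```

Passing any of these to density(X, bw = ...) then lets you visually compare the resulting fits, in line with the CV-then-inspect workflow described above.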

## Properties of the Kernel Density Estimator

The kernel density estimator

$$\hat{f}(x) = \frac{1}{Nh}\sum_{i=1}^{N} K\left(\frac{X_i - x}{h}\right)$$

has several convenient numerical properties:

1. If the kernel function $K(u)$ is non-negative, the density estimator is also non-negative.
2. $\hat{f}(x)$ integrates to one, making it a valid density function when $K(u)$ is non-negative.

To prove this, let

$$u = \frac{X_i - x}{h}, \qquad dx = -h\,du.$$

Then, via a change-of-variables (the sign flip reverses the limits of integration),

$$\int \hat{f}(x)\,dx = \frac{1}{Nh}\sum_{i=1}^{N}\int K\left(\frac{X_i - x}{h}\right)dx = \frac{1}{N}\sum_{i=1}^{N}\int K(u)\,du = 1.$$

3. The mean of the estimated density is $\bar{X}$, the sample mean.

Using the following transformation,

$$x = X_i - uh, \qquad dx = -h\,du,$$

and thus

$$\int x\hat{f}(x)\,dx = \frac{1}{N}\sum_{i=1}^{N}\int (X_i - uh)K(u)\,du = \frac{1}{N}\sum_{i=1}^{N}\left[X_i \int K(u)\,du - h\int u K(u)\,du\right].$$

Recall that the kernel function must integrate to one and that for second-order kernels, $\kappa_1(K) = \int u K(u)\,du = 0$.

Therefore,

$$\int x\hat{f}(x)\,dx = \frac{1}{N}\sum_{i=1}^{N} X_i = \bar{X}.$$

4. The variance of the distribution associated with the estimated density is $\hat{\sigma}^2 + h^2\kappa_2(K)$, where $\hat{\sigma}^2$ is the sample variance. That is, the sample variance is inflated by $h^2\kappa_2(K)$ when the density is estimated.

The variance of the estimated density is given by

$$\int x^2 \hat{f}(x)\,dx - \left(\int x \hat{f}(x)\,dx\right)^2.$$

The second moment of the estimated density is

$$\int x^2 \hat{f}(x)\,dx = \frac{1}{N}\sum_{i=1}^{N}\int (X_i - uh)^2 K(u)\,du = \frac{1}{N}\sum_{i=1}^{N}\left[X_i^2 - 2X_i h\,\kappa_1(K) + h^2\kappa_2(K)\right] = \frac{1}{N}\sum_{i=1}^{N}X_i^2 + h^2\kappa_2(K).$$

Thus,

$$\int x^2 \hat{f}(x)\,dx - \bar{X}^2 = \frac{1}{N}\sum_{i=1}^{N}X_i^2 - \bar{X}^2 + h^2\kappa_2(K) = \hat{\sigma}^2 + h^2\kappa_2(K).$$

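These properties can be checked numerically. The sketch below evaluates a Gaussian-kernel KDE (for which $\kappa_2(K) = 1$) on a fine grid and integrates with a simple Riemann sum; the grid, tolerances, and helper code are mine, purely for illustration:

```r
set.seed(99)
X <- rnorm(200, mean = 5, sd = 2)
h <- 0.5

# Evaluate the Gaussian-kernel KDE on a fine grid.
grid  <- seq(min(X) - 5, max(X) + 5, length.out = 10000)
f_hat <- sapply(grid, function(x0) mean(dnorm((X - x0) / h)) / h)
dx    <- grid[2] - grid[1]

total <- sum(f_hat) * dx         # property 2: should be ~ 1
mu    <- sum(grid * f_hat) * dx  # property 3: should be ~ mean(X)
m2    <- sum(grid^2 * f_hat) * dx
v     <- m2 - mu^2               # property 4: ~ sigma_hat^2 + h^2 * kappa_2

sigma2_hat <- mean((X - mean(X))^2)  # biased sample variance, as in the proof
c(total, mu - mean(X), v - (sigma2_hat + h^2))
```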
## Summary

• The empirical distribution function (EDF) assigns a mass of $1/N$ to each observation $X_i$, resulting in a discrete or “jumpy” estimate.
• Kernel density estimators (KDE) estimate $f(x)$ by constructing a neighbourhood around the point of interest $x$. Observations within this neighbourhood are then assigned a mass based on their distance from $x$ via a kernel function, resulting in a smooth estimate.
• Popular kernel choices are the Gaussian and Epanechnikov kernels. These kernels are second-order kernels, meaning they are both proper, symmetric density functions.
• The size of the neighbourhood is dictated by the bandwidth $h$. Special care must be taken when selecting $h$ in order to ensure that the bias-variance trade-off is balanced for your problem. Different methods such as CV are available to assist you with optimal bandwidth selection.
• Choice of bandwidth has a larger impact on estimation quality than your choice of kernel.

I will be extending the kernel density estimator to kernel regression in my future blog posts and conducting a case study in R that uses these methods, stay tuned!

## Advent of Code 2017 in R: Day 2

Day 2 of the Advent of Code provides us with a tab-delimited input consisting of numbers 2-4 digits long and asks us to calculate its “checksum”. The checksum is defined as the sum of the differences between each row’s largest and smallest values. Awesome! This is a problem that is well-suited for base R.

I started by reading the file in using read.delim, specifying header = F in order to ensure that numbers within the first row of the data are not treated as variable names.

When working with short problems like this where I know I won’t be rerunning my code or reloading my data often, I will use file.choose() in my read.whatever functions for speed. file.choose() opens Windows Explorer, allowing you to navigate to your file path.

```r
input <- read.delim(file.choose(), header = F)

# Check the dimensions of input to ensure the data read in correctly.
dim(input)
```

After checking the dimensions of our input, everything looks good. As suspected, this is a perfect opportunity to use some vectorization via the apply function.

```r
row_diff <- apply(input, 1, function(x) max(x) - min(x))
checksum <- sum(row_diff)
checksum
```

Et voilà, the answer is 45,972!

As was the case with Day 1, we are then prompted with a part two. In order to help out a worrisome computer, we now have to find the two evenly divisible numbers within each row, divide them, and add each row’s result.

This is a tad bit trickier but it’s clear we need to work with the modulo operator. We need to identify the two numbers a and b within each row such that a %% b == 0. If a < b, a %% b will just return a so my first thought is that we should sort the rows in ascending order.

```r
# Sort rows of matrix
input <- t(apply(input, 1, sort))
```

You can avoid transposing the matrix if you use some helpful packages but per my previous post, I'm trying to stick to base R *sobs quietly*. I used loops to solve this because we need to iterate through each row, comparing each element to every other element. I did try using vectorization here via sapply,

```r
# Compare all elements in first row of input matrix.
sapply(input[1,], function(x) x %% input[1,] == 0)
```

but this produces a 16 x 16 matrix for each row with a diagonal that needs to be ignored, on top of which we need to find the one TRUE element and map it back to the two matrix indices. I think I would pursue this method further if there was more than one pair of numbers we were searching for but since we areeeeen't....

```r
# Initialize vector to store each row value
row_val <- c(rep(NA, nrow(input)))

# For each row..
for(row in 1:nrow(input)){

  # Compare each element to its succeeding elements..
  for(col in 1:(ncol(input) - 1)){
    for(i in 1:(ncol(input) - col)){

      # If the modulo is equal to 0,
      # set the vector element equal to the division result.
      if(input[row, col + i] %% input[row, col] == 0){
        row_val[row] <- input[row, col + i] / input[row, col]
      }
    }
  }
}

sum(row_val)
```

Our sum is 326, the correct answer. I'd love to see some alternative solutions to part 2; I feel like there is definitely a lot of optimization that could occur here!

## Advent of Code 2017 in R: Day 1

My boyfriend recently introduced me to Advent of Code while I was in one of my “learn ALL of the things!” phases. Every year starting December 1st, new programming challenges are posted daily leading up to Christmas. They’re meant to be quick 5-10 minute challenges, so, wanting to test my programming skills, I figured why not try to do all of them in base R!

I went with base R because I know I can dplyr and stringr my way to victory with some of these challenges. I really want to force myself to really go back to basics and confirm that I have the knowledge to do these things on my own without Hadley Wickham‘s (very much appreciated in any other situation) assistance.

Since I’ve started, I’ve also seen a couple of other bloggers attempt to do these challenges in R so I’m really curious how my solutions will compare to theirs.

The first day of the challenge provides you with a string of numbers and asks you to sum all of the digits that match the next digit in a circular list, i.e. the digit after the last digit is the first digit.

My string was…

1 823175367468399787817925919556533257949337848326497818414334128437968278851855917882222512662542831811539663268114187195289429189836478189892929261479288488324935672874199322488916792823226132512344756982993295126829295392876675577976183799381252852748448729811773986918941559946174694499265175276815861199671546787138152767521948118521735763244574891272648766988187612919293299528277784849656125983978118871923395161918838853269851929814211285377694254521185913418523176895288846247164285158836844576148922578691977898384811383377376823696992393983875599798953764822221799638175754296484433728542865437549935999779267925688137896785237684881279576111813928815279992117687425637761595275826884413957962275496546188486264742349191891362884874875659546319158555538584933574222485547376941121237644659165484616818927895985768133672422143484694612491527119643314433548278743268384859448764847753249895257251511886447562182811827491129839674821313642635776999131466164261278684713548596988923719382271811126956174156347911683236448572471624217628864237184956966459419467476331968773572351761496257559211128617755343565195285387877543123432791959559565864153476545548956193454847429125438722975147288342341319684516275271692519986659188331363884647432116156989251857434622675136631531114577744878186222212692344931183856468588269588939753141393766667323345121696841428813598439424968488655481276119128948545794586652422841519154916855795763338699193118677384386999928446877386622197687399816881894439966146396365878482179627298715527819535557938676815671881362455926457483613441972518788151466583444135964495576865866327876536378966472173653351777429247819214393431839941818829875335181538856135952853377899629627936639438645554444692265397672511388984274918236125358243331935119386278843311385278259616114899223355814469291379171485951665342191784129574916346975147983549271339286151999379196792777311471388845898279651497771798759816548696778698999199814248863116869796381615637421622438619394156635854
3266646516247854435356941566492841213424915682394928959116411457967897614457497279472661229548612777155998358618945222326558176486944695689777438164612198225816646583996426313832539918

My first thought was that I would need to separate this string such that each character was the element of an object, either a vector or a list. I kept things simple and started by just copy-pasting the string into R. I could import it as a .txt file or otherwise but I figured that was unnecessary for such a quick problem. I stored the string as a variable named input.

```r
# Split string after each character.
input_split <- strsplit(input, "")

# As a result, input_split is a list with 1 element:
# a vector containing each character of input as an
# element. Annoying. Let's unlist() it to extract
# *just* the vector.
char_vector <- unlist(input_split)

# The problem now is that if we are going to sum
# the elements of our string, we need them to be
# numeric and not characters. Easy enough...
num_vector <- as.numeric(char_vector)

# Now let's just initialize our sum...
num_sum <- 0

# And use a loop...
for(i in 1:length(num_vector)){

  # If we have the last element of the input string,
  # set the next number equal to the first element
  # of the string, else select element i + 1.
  next_num <- ifelse(i == length(num_vector), num_vector[1], num_vector[i + 1])

  # If our current element is equal to the next element,
  # update the sum.
  if(num_vector[i] == next_num){
    num_sum <- num_sum + num_vector[i]
  }
}

num_sum
```

Our sum is 1390 which is correct, huzzah. Once you complete part 1, it surprises us with part 2 which asks us to now consider the digit “halfway around the circular list”. So, if our list is 10 digits long, we would add element 1 and element 6, element 8 and element 3, etc. We can make a really simple modification to our established loop to solve this…

```r
num_sum <- 0
skip <- length(num_vector) / 2

for(i in 1:length(num_vector)){

  # For the first half of the list we need to
  # add half the length of the vector to the
  # index to get the next number & subtract
  # for the second half.
  next_num <- ifelse(i <= skip, num_vector[i + skip], num_vector[i - skip])

  if(num_vector[i] == next_num){
    num_sum <- num_sum + num_vector[i]
  }
}

num_sum
```

Our answer is 1,232 which is correct. Woo.

I know that loops in R are often discouraged due to their speed, or lack thereof. I think I might come back and play with this one again to see how I can implement vectorization here. That being said, I also think it’s important to know which tool is right for the job. With a problem of this size, I think a loop is just fine.

## Hello world!

Hello world!

I started blogging back when I was a Master’s of Applied Statistics student making my way through some very heavy journal articles. It always took me a while to work my way through journal articles as I often found myself wanting to semi-prove all the results in a paper for myself. Having “wasted” an obscene amount of paper and time working these things out, I decided that I would attempt to translate these scribbles into complete notes… and so, Statisticelle was born!

Another big part of my learning process is applying what I’m reading or deriving to an actual data set. Having based my Master’s degree on clinical trial research and now working in industry, I don’t think that I can actually use/discuss the data relevant to my past research or career without breaking a few laws. However, I think this blog presents itself as a great opportunity to delve into the massive collections of open data now available. I’m going to make an effort to provide references to these data sets and any R code I used – even if no one uses it, I’m hoping that putting it out in the open will encourage me to be a more diligent statistician!

It’s been a while since I’ve written any blog posts but I’ve recently found myself staring at my statistics bookshelf, thinking that I really miss delving into new topics and working my way through all the messy bits. I’m not sure that anyone will really read this but on the off-chance that they do, I hope these blog posts are helpful and not a collection of misinformation! I am trying to understand these topics in a way that makes sense to me and I can only hope that I am doing so correctly. If you do find any errors, do not hesitate to let me know.

Anyways, that’s my mission statement in so many words. Part 2 – let’s see how this turns out!