Overview

In my previous blog post, I motivated the EM algorithm in the context of estimating the parameters of a two-component Gaussian mixture density. In this case, we can write the estimators of the mixing probability, means, and variance in a nice closed form, and I demonstrated how to implement the corresponding iterative estimation procedure from scratch. Results were then compared to those obtained from the very nice R package flexmix.

However, rarely do we get such nice closed-form estimators! We usually need to use numerical methods to maximize our objective function directly. In this blog post, I demonstrate how we can specify our objective function, and use the optim function in R to obtain our parameter estimates. optim has lots of options, and we will cover how to change the optimization method and how to place restrictions on our parameter spaces.
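To make the idea concrete, here is a minimal sketch of direct maximization with optim. The simulated data, starting values, and bounds are assumptions chosen for illustration, not values from the original analysis. Since optim minimizes by default, the objective is the negative log-likelihood, and the "L-BFGS-B" method is used because it accepts box constraints through `lower` and `upper`.

```r
# Minimal sketch: fit a two-component Gaussian mixture by direct optimization.
set.seed(1)

# Simulated responses (assumed values, for illustration only):
# 30% from N(0, 1) and 70% from N(4, 1)
y <- c(rnorm(300, mean = 0, sd = 1), rnorm(700, mean = 4, sd = 1))

# Negative log-likelihood of the mixture.
# theta = (pi, mu1, mu2, sigma1, sigma2)
neg_loglik <- function(theta, y) {
  dens <- theta[1] * dnorm(y, mean = theta[2], sd = theta[4]) +
    (1 - theta[1]) * dnorm(y, mean = theta[3], sd = theta[5])
  # Guard against log(0) so the optimizer always sees a finite value
  -sum(log(pmax(dens, .Machine$double.xmin)))
}

fit <- optim(
  par    = c(0.5, -1, 5, 1.5, 1.5),            # starting values (assumed)
  fn     = neg_loglik,
  y      = y,                                  # passed through to neg_loglik
  method = "L-BFGS-B",                         # supports box constraints
  lower  = c(0.01, -Inf, -Inf, 0.1, 0.1),      # pi in (0, 1), sigmas positive
  upper  = c(0.99,  Inf,  Inf, Inf, Inf)
)

fit$par          # estimates of (pi, mu1, mu2, sigma1, sigma2)
fit$convergence  # 0 indicates successful convergence
```

Swapping `method` for another optimizer such as "BFGS" or "Nelder-Mead" changes the procedure, but those methods ignore `lower` and `upper`, so the constraints would then have to be handled by reparameterizing (for example, optimizing over the logit of the mixing probability and the log of each standard deviation).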

EM for two-component Gaussian mixture

Let’s quickly recap our motivation, previously discussed in Embracing the EM algorithm: One continuous response.

We randomly sample $N$ patients from a population, and examine the empirical density of their responses. We notice two modes, and based on prior knowledge, hypothesize that the density is actually a mixture of two Gaussian densities. For example, the density centered around greater responses may correspond to “healthy” patients and the other to “ill” patients. We would like to (1) estimate the probability of belonging to each subpopulation; and (2) estimate subpopulation parameters, e.g. mean response among the healthy and the ill. But, we have a problem: we don’t know who belongs to each subpopulation. In other words, subpopulation labels are unobserved or “latent.”

Figure: Observed two-component Gaussian mixture density (purple), and distribution of responses among latent healthy (blue) and ill (red) patient subpopulations.

We can represent the density of the observed responses as a mixture of the subpopulation densities:

$f(y) = \pi~ f_1(y) + (1-\pi)~ f_2(y).$

That is, an individual belongs to the first subpopulation, whose responses are distributed according to density $f_1(y)$, with probability $\pi$, and to the second, whose responses are distributed according to $f_2(y)$, with probability $1-\pi$. We assume both densities are Gaussian with respective means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$.
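The mixture density and its latent-label structure can be sketched in a few lines of R. The function names `dmix` and `rmix` and all parameter values below are hypothetical, chosen only for illustration; the argument `p` plays the role of $\pi$ (it is renamed to avoid masking R's built-in constant `pi`).

```r
# Mixture density f(y) = p * f1(y) + (1 - p) * f2(y) with Gaussian components.
dmix <- function(y, p, mu1, mu2, sigma1, sigma2) {
  p * dnorm(y, mean = mu1, sd = sigma1) +
    (1 - p) * dnorm(y, mean = mu2, sd = sigma2)
}

# Draw from the mixture by first sampling the latent subpopulation label.
rmix <- function(n, p, mu1, mu2, sigma1, sigma2) {
  z <- rbinom(n, size = 1, prob = p)  # 1 = first subpopulation, 0 = second
  ifelse(z == 1,
         rnorm(n, mean = mu1, sd = sigma1),
         rnorm(n, mean = mu2, sd = sigma2))
}

set.seed(23)
# Assumed values: 30% of patients in the first subpopulation
y_sim <- rmix(1000, p = 0.3, mu1 = 0, mu2 = 4, sigma1 = 1, sigma2 = 1)

# Overlay the true mixture density on the empirical histogram
hist(y_sim, breaks = 40, freq = FALSE, main = "", xlab = "Response")
curve(dmix(x, p = 0.3, mu1 = 0, mu2 = 4, sigma1 = 1, sigma2 = 1),
      add = TRUE, lwd = 2)
```

In `rmix` the label `z` is generated and then discarded, which mirrors the estimation problem: only the responses are observed, never the labels.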

Continue reading EM Algorithm Essentials: Maximizing objective functions using R’s optim

Overview

I’m currently working on a project that revolves around the EM algorithm, and am finally realizing the power of this machinery. It really is like that movie with Jim Carrey where he can’t stop seeing the number 23 everywhere, except for me it’s the EM algorithm. Apparently this is called THE BAADER-MEINHOF PHENOMENON, oooh that’s fancy. You’ve probably seen the EM algorithm around too – though perhaps you didn’t know it. It’s commonly used for estimation with missing data. Modified EM algorithms power the Amelia library in R: the original Amelia used EM with importance sampling (EMis), and Amelia II combines EM with bootstrapping (EMB). The EM algorithm also underpins latent variable models, which makes sense because latent variables are really missing observations when you think about it, right?! The more I learn about statistics, the more I realize most things are really missing data problems… cough potential outcomes cough

Anyways, I was previously taught the EM algorithm using the classic multinomial example. This is a great teaching tool, but I’ve never run into a situation like this in my life (yet). But, I do run into mixture distributions a surprising amount – mostly when investigating heterogeneity within patient populations. There’s a whole textbook on this, see: Medical Applications of Finite Mixture Models. The EM algorithm makes a lot more sense to me in the context of mixture models:

We sample a group of patients and observe their response.

We notice a bimodal structure in the response distribution.

We hypothesize the observed distribution actually corresponds to two subpopulations or “classes.”

We don’t know who belongs to which subpopulation.

We estimate the probability of latent class membership using the EM algorithm.

In this blog post, I motivate the EM algorithm in the context of a two-component Gaussian mixture model. A thorough walkthrough of the underlying theory is provided. In this case, estimators take a nice closed form, but this is rarely the case for complex problems encountered in practice. R code for implementing the EM algorithm using the closed-form estimators is provided. I also demonstrate how this model can be easily fit using the flexmix library.
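As a rough preview of what the closed-form procedure looks like, here is a minimal sketch of EM for a two-component Gaussian mixture. The function name `em_two_gauss`, the simulated data, the starting values, and the choice to give each component its own variance are assumptions made for illustration, so this is not necessarily the exact implementation from the full post.

```r
set.seed(23)
# Simulated responses (assumed values): 30% from N(0, 1), 70% from N(4, 1)
y <- c(rnorm(300, mean = 0, sd = 1), rnorm(700, mean = 4, sd = 1))

em_two_gauss <- function(y, p, mu1, mu2, s1, s2,
                         max_iter = 500, tol = 1e-8) {
  loglik_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # E-step: posterior probability that each patient is in subpopulation 1
    d1 <- p * dnorm(y, mean = mu1, sd = s1)
    d2 <- (1 - p) * dnorm(y, mean = mu2, sd = s2)
    w  <- d1 / (d1 + d2)

    # M-step: closed-form weighted updates
    p   <- mean(w)
    mu1 <- sum(w * y) / sum(w)
    mu2 <- sum((1 - w) * y) / sum(1 - w)
    s1  <- sqrt(sum(w * (y - mu1)^2) / sum(w))
    s2  <- sqrt(sum((1 - w) * (y - mu2)^2) / sum(1 - w))

    # Observed-data log-likelihood at the updated parameters
    loglik <- sum(log(p * dnorm(y, mu1, s1) + (1 - p) * dnorm(y, mu2, s2)))
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }
  list(p = p, mu1 = mu1, mu2 = mu2, sigma1 = s1, sigma2 = s2,
       loglik = loglik, iterations = iter)
}

# Crude starting values taken from the data (assumed, not tuned)
fit <- em_two_gauss(y, p = 0.5, mu1 = min(y), mu2 = max(y),
                    s1 = sd(y), s2 = sd(y))
str(fit)
```

The weights `w` are the estimated probabilities of latent class membership, so they answer the "who belongs to which subpopulation" question directly, while the M-step updates are just weighted versions of the usual sample proportion, mean, and standard deviation.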

Figure: A two-component Gaussian mixture density.