Analyze and visualize DAGs in R by integrating the dagR, daggity, and ggdag libraries

Huzzah!!

This blog post makes 3/3 for the year. I made it, folks! I wasn’t sure I would. It was a jam-packed year for us: I turned 30 and finished my Ph.D. at Western (not causal – well, maybe), we sold our first house and moved from Canada to the USA, and I promptly started my postdoc at Harvard. But it was a good year for us, too. We are having fun exploring Boston, I’m enjoying my postdoc work and co-workers, and I can now force my family to address all my Christmas presents to Dr. Emma – worth it. Very much looking forward to what the New Year brings.



Here’s looking at you, New Year! (Left: Dr. Emma, Right: Just Ethan)

D’ya like DAGs?

If so, good news! This blogpost is all about DAGs. Directed acyclic graphs, that is. One of the most popular posts on my site is Using a DAG to simulate data with the dagR library. I think this is really interesting. Generation of data from randomized controlled trials is relatively straightforward, but I imagine this is not the case for observational studies, confounding and all. DAGs represent a way to formalize our assumptions around causal mechanisms, i.e., X causes Y, and this relationship is confounded by Z, for example. As the adoption of causal inference grows (look at any statistical conference programme), it makes sense that we would want to assess novel methods using data generated according to realistic mechanisms. Simulation according to DAGs allows us to do this, or at least attempt to.

There are two things I want to comment on regarding my previous blogpost. First, I’d really like to see some more sophisticated ways of generating data according to DAGs. The dagR library is limited to continuous and binary exposures, which may be good enough for now but won’t be enough in the long run (survival endpoints are very common in medicine, so too are ordinal exposures or outcomes, etc). I find that the way direct effects are specified and defined also makes it challenging to know the “ground truth” in some cases. I’d like to see the packages do more heavy lifting.

I’m starting to see more materials around DAG simulation, like the DagSim framework for Python or papers like Illustrating How to Simulate Data From Directed Acyclic Graphs to Understand Epidemiologic Concepts and DagSim: Combining DAG-based model structure with unconstrained data types and relations for flexible, transparent, and modularized data simulation. There might be some promising nuggets in here already, and movement in this area suggests we should see something promising soon. Also, thank you to the people working on this!

Second, the default diagrams produced by dagR aren’t bad, but they aren’t great. Sorry (see also: thank you above). Writing this blog post, I also noticed that an article on the use of dagR has also been published by the package authors in 2022. Funny enough, they seem to recognize it too – suggesting that the ggdag library in R can be used to improve the graphical layout of DAGs. Hey, wouldn’t you know it – that’s exactly what this blog post is about!! A great feature of the dagR library is that you can convert a dagR object to dagitty object with a single command. So, let’s make your already programmed DAG work for you!

TL;DR

This blog post is a quick tutorial on how to transform dagR objects into dagitty objects so you can construct publication-worthy visualizations of your DAG using the ggdag library.

library(dagR)
library(dagitty)
library(ggdag)
library(tidyverse)

Continue reading Analyze and visualize DAGs in R by integrating the dagR, daggity, and ggdag libraries

Using a DAG to simulate data with the dagR library

Directed acyclic graphs (DAGs), and causal graphs in general, provide a framework for making assumptions explicit and identifying confounders or mediators of the relationship between the exposure of interest and outcome that need to be adjusted for in analysis. Recently, I ran into the need to generate data from a DAG for a paper I am writing with my peers Kevin McIntyre and Joshua Wiener. After a quick Google search, I was pleasantly surprised to see there were several options to do so. In particular, the dagR library provides “functions to draw, manipulate, [and] evaluate directed acyclic graphs and simulate corresponding data”.

Besides dagR‘s reference manual, a short letter published in Epidemiology, and a limited collection of examples, I couldn’t find too many resources regarding how to use the functionality provided by dagR. The goal of this blog post is to provide an expository example of how to create a DAG and generate data from it using the dagR library.

To simulate data from a DAG with dagR, we need to:

  1. Create the DAG of interest using the dag.init function by specifying its nodes (exposure, outcome, and covariates) and their directed arcs (directed arrows to/from nodes).
  2. Pass the DAG from (1) to the dag.sim function and specify the number of observations to be generated, arc coefficients, node types (binary or continuous), and parameters of the node distributions (Normal or Bernoulli).

For this tutorial, we are going to try to replicate the simple confounding/common cause DAG presented in Figure 1b as well as the more complex DAG in Figure 2a of Shier and Platt’s (2008) paper, Reducing bias through directed acyclic graphs.

library(dagR)
set.seed(12345)

Continue reading Using a DAG to simulate data with the dagR library