Directed acyclic graphs (DAGs), and causal graphs in general, provide a framework for making assumptions explicit and identifying confounders or mediators of the relationship between the exposure of interest and outcome that need to be adjusted for in analysis. Recently, I ran into the need to generate data from a DAG for a paper I am writing with my peers Kevin McIntyre and Joshua Wiener. After a quick Google search, I was pleasantly surprised to see there were several options to do so. In particular, the dagR library provides “functions to draw, manipulate, [and] evaluate directed acyclic graphs and simulate corresponding data”.
dagR‘s reference manual, a short letter published in Epidemiology, and a limited collection of examples, I couldn’t find too many resources regarding how to use the functionality provided by
dagR. The goal of this blog post is to provide an expository example of how to create a DAG and generate data from it using the
To simulate data from a DAG with
dagR, we need to:
- Create the DAG of interest using the
dag.initfunction by specifying its nodes (exposure, outcome, and covariates) and their directed arcs (directed arrows to/from nodes).
- Pass the DAG from (1) to the
dag.simfunction and specify the number of observations to be generated, arc coefficients, node types (binary or continuous), and parameters of the node distributions (Normal or Bernoulli).
For this tutorial, we are going to try to replicate the simple confounding/common cause DAG presented in Figure 1b as well as the more complex DAG in Figure 2a of Shier and Platt’s (2008) paper, Reducing bias through directed acyclic graphs.