How can observational data be used to improve the design and analysis of randomized controlled trials (RCTs)? We first consider how to develop estimators that merge causal effect estimates obtained from observational and experimental datasets when the two data sources measure the same treatment. To do so, we extend results from the Stein shrinkage literature. We propose a generic “recipe” for deriving shrinkage estimators, making use of a generalized unbiased risk estimate. Using this procedure, we develop two new estimators and prove finite-sample conditions under which they have lower risk than an estimator using only experimental data. Next, we consider how these estimators might contribute to more efficient designs for prospective randomized trials. We show that the risk of a shrinkage estimator can be computed efficiently via numerical integration. We then propose algorithms for determining the experimental design (that is, the best allocation of units to strata) by optimizing over this computable shrinker risk.
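
As a rough illustration of the kind of shrinkage involved, the sketch below implements a classical positive-part James-Stein-type shrinker that pulls stratum-level experimental estimates toward observational ones. It is not one of the two estimators proposed in the talk. The function name, the single known sampling variance `sigma2` shared by all strata, and the requirement of at least three strata are assumptions made for this example.

```python
import numpy as np


def shrink_toward_observational(tau_rct, tau_obs, sigma2):
    """Positive-part James-Stein-type shrinkage of RCT estimates toward
    observational estimates (illustrative sketch, not the talk's method).

    tau_rct : per-stratum effect estimates from the experiment (assumed unbiased)
    tau_obs : per-stratum effect estimates from observational data (possibly biased)
    sigma2  : assumed known, common sampling variance of each RCT estimate
    """
    tau_rct = np.asarray(tau_rct, dtype=float)
    tau_obs = np.asarray(tau_obs, dtype=float)
    k = tau_rct.size
    if k < 3:
        raise ValueError("this shrinker needs at least 3 strata")

    diff = tau_rct - tau_obs
    dist2 = float(diff @ diff)  # squared distance between the two estimates
    if dist2 == 0.0:
        # The two sources agree exactly, so there is nothing to shrink.
        return tau_obs.copy()

    # Weight on the RCT estimate: close to 1 when the sources disagree
    # strongly, truncated at 0 when they are nearly indistinguishable.
    weight = max(0.0, 1.0 - (k - 2) * sigma2 / dist2)
    return tau_obs + weight * diff
```

When the two sources disagree by much more than the experimental noise, the output stays close to the experimental estimates; when they roughly agree, it moves toward the observational estimates.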

Paper Links: Inference and Design

Bio: Evan is a postdoctoral fellow at the Harvard Data Science Initiative, where he is affiliated with the Department of Statistics and the Institute for Quantitative Social Science. His research focuses on problems at the interface of statistics, public health, and social science. His primary methodological work is in causal inference, centering on questions of causal “data fusion,” in which observational and experimental data sources are merged. He also works on problems in political methodology, including voter score calibration, ecological inference, and race imputation. Lastly, Evan conducts applied research on gender-based violence prevention.

Evan earned his PhD in Statistics at Stanford University. His dissertation was advised by Art Owen and Mike Baiocchi, and his postdoc advisers are Luke Miratrix and Kosuke Imai. Between his PhD and postdoc, Evan worked as a Data Scientist on the Biden for President campaign, where he built the models used to predict individual-level voter turnout in battleground states.