Models for Imputing Missing Data, Including Methods for Assessing Sensitivity of Conclusions to Them

DONALD B. RUBIN

Abstract

There are two relatively standard approaches for imputing missing data, one based on “selection” models and one based on “pattern-mixture” models.  The former focuses on formulating a model for the complete data, that is, the combined missing and observed data, and then effectively imputing the missing data so that, when combined with the observed data, the result looks as if it could have arisen from the formulated complete-data model.  In contrast, the latter effectively fits a different model for each pattern of missing and observed data, thereby directly revealing the sensitivity of conclusions to assumptions about distributions for which no observed data are available to aid estimation.  A third class of models, which has remained mostly recondite, is based on “pigs” (Potentially Incompatible GibbS) factorizations; although such models are in general mathematically unappealing, they have enjoyed some success in applications because of their flexible implementation in computer software for multiple imputation, such as SAS, STATA, IVEware, SOLAS, and MICE.  Considering the sensitivity of conclusions to assumptions that cannot be checked against observed data, whether those assumptions are explicit, as with pattern-mixture models, or implicit, as with selection models, is a critical ingredient of satisfactory analyses of incomplete data sets, especially those used to make important policy decisions.  Graphical displays, such as “enhanced tipping point” ones, implemented using modern computing environments, are a major component of such sensitivity analyses.  This presentation will address both issues, the choice of imputation model and the assessment of sensitivity, in the context of modeling possibly non-ignorable missing data.
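
To make the pattern-mixture idea concrete, the following is a minimal sketch of a delta-adjustment sensitivity sweep, one simple way a basic tipping-point analysis might be set up. It is not the "enhanced tipping point" display mentioned above, and it is not taken from any of the software packages named. The function names `delta_adjusted_means` and `tipping_point`, the normal imputation model, and the shift parameter `delta` are all assumptions made for illustration. The sketch also ignores uncertainty in the fitted parameters, which a proper multiple-imputation procedure would propagate.

```python
import random
import statistics


def delta_adjusted_means(observed, n_missing, deltas, n_imputations=20, seed=0):
    """Illustrative pattern-mixture sweep (hypothetical helper).

    Assumes the missing values follow the normal model fitted to the
    observed values, but with the mean shifted by `delta`. delta = 0
    corresponds to ignorable missingness under this simple model.
    Returns {delta: completed-data mean averaged over imputations}.
    """
    mu = statistics.mean(observed)
    sd = statistics.stdev(observed)
    results = {}
    for delta in deltas:
        # Re-seed for every delta so all shifts share the same random
        # draws; the resulting curve then varies smoothly with delta.
        rng = random.Random(seed)
        completed_means = []
        for _ in range(n_imputations):
            imputed = [rng.gauss(mu + delta, sd) for _ in range(n_missing)]
            completed_means.append(statistics.mean(observed + imputed))
        results[delta] = statistics.mean(completed_means)
    return results


def tipping_point(results, threshold):
    """Return the smallest-magnitude delta at which the estimate falls to
    or below `threshold`, or None if no examined delta reaches it."""
    for delta in sorted(results, key=abs):
        if results[delta] <= threshold:
            return delta
    return None
```

Under these assumptions, one would call `delta_adjusted_means` over a grid of shifts and then ask `tipping_point` how far the missing-data distribution must depart from the observed-data distribution before a substantive conclusion changes. Whether a departure of that size is plausible is a subject-matter judgment that the observed data cannot settle.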