A Study of a Two-Step Estimation Approach for Two-Phase Studies
ALEX WONG – THE HONG KONG POLYTECHNIC UNIVERSITY
ABSTRACT
Two-phase sampling is commonly adopted to reduce cost and improve estimation efficiency. In many two-phase studies, the outcome and some inexpensive covariates are observed for a large sample in Phase I, and expensive covariates are obtained for a selected subset of that sample in Phase II. As a result, the analysis of the association between the outcome and the covariates faces a missing-data problem. Complete-case analysis, which relies solely on the Phase II sample, is generally inefficient. In this presentation, we explore a two-step estimation approach that first obtains an estimator from the complete data (the Phase II sample) and then updates it with an asymptotically mean-zero estimator derived from a working model relating the outcome to the inexpensive covariates in the full data (the Phase I sample). The resulting two-step estimator is asymptotically at least as efficient as the complete-data estimator and is robust to misspecification of the working model. We study the application of this approach to high-dimensional regression and semiparametric survival modeling. We also propose methods to improve the efficiency of existing two-step approaches. We demonstrate the advantages of the proposed methods through simulation studies and an application to a major cancer genomics study.
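As a rough illustration of the two-step idea, the following Python sketch works through the simplest case we could think of. It is not the method presented in the talk. It assumes a linear outcome model fitted by ordinary least squares, a linear working model of the outcome on the inexpensive covariate, and Phase II selection by simple random sampling, so that the complete-data fit is consistent. All names (`ols`, `two_step`, `in_phase2`) and the simulated data are hypothetical. The mean-zero quantity is the difference between the working-model fit on the Phase II sample and on the full Phase I sample, and the update subtracts its projection from the complete-data estimator.

```python
import numpy as np


def ols(X, y):
    """Return OLS coefficients and per-observation influence-function values."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    bread = np.linalg.inv(X.T @ X / len(y))
    # Row i is (bread @ x_i * e_i); bread is symmetric, so right-multiplying works.
    infl = (X * resid[:, None]) @ bread
    return coef, infl


def two_step(y, z, x, in_phase2):
    """Two-step update of the complete-data OLS estimator (illustrative only).

    y, z      : outcome and inexpensive covariate, observed for all n units
    x         : expensive covariate, only used where in_phase2 is True
    in_phase2 : boolean mask of the Phase II subsample (assumed a simple
                random subsample of Phase I)
    """
    n, n2 = len(y), int(in_phase2.sum())
    W = np.column_stack([np.ones(n), z])                       # working model
    D = np.column_stack([np.ones(n2), x[in_phase2], z[in_phase2]])  # target model

    # Step 1: complete-data estimator from the Phase II sample.
    beta_cc, phi = ols(D, y[in_phase2])

    # Working model fitted on Phase II and on the full Phase I sample;
    # their difference is asymptotically mean-zero under random subsampling.
    gamma_cc, psi = ols(W[in_phase2], y[in_phase2])
    gamma_full, _ = ols(W, y)

    # Step 2: subtract the projection of beta_cc onto the mean-zero difference.
    S12 = phi.T @ psi / n2
    S22 = psi.T @ psi / n2
    beta_2s = beta_cc - S12 @ np.linalg.solve(S22, gamma_cc - gamma_full)

    # Plug-in variance estimates; the reduction term is positive semidefinite.
    V_cc = phi.T @ phi / n2**2
    V_2s = V_cc - (1.0 / n2 - 1.0 / n) * S12 @ np.linalg.solve(S22, S12.T)
    return beta_cc, beta_2s, V_cc, V_2s


# Simulated example (all values are made up for illustration).
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)
x = 0.6 * z + rng.normal(size=n)
y = 1.0 + 0.5 * x + 1.0 * z + rng.normal(size=n)
in_phase2 = rng.random(n) < 0.2
x_obs = np.where(in_phase2, x, np.nan)  # x is unobserved outside Phase II

beta_cc, beta_2s, V_cc, V_2s = two_step(y, z, x_obs, in_phase2)
print("complete-data estimate:", beta_cc)
print("two-step estimate:     ", beta_2s)
print("estimated SE ratio (two-step / complete-data):",
      np.sqrt(np.diag(V_2s) / np.diag(V_cc)))
```

In this sketch the estimated variance of the two-step estimator can never exceed that of the complete-data estimator, because the subtracted term is positive semidefinite. The working model is used only through a mean-zero difference, so misspecifying it changes the size of the gain but not the consistency of the estimator. Outcome-dependent Phase II selection, high-dimensional regression, and survival models would each need changes that this sketch does not attempt.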