407 Academic Research Building
265 South 37th Street
Philadelphia, PA 19104
Research Interests: Semiparametric theory, nonparametric statistics, causal inference, missing data, and epidemiologic methods.
Ph.D., 2006, Harvard University
B.S., 1999, Yale University
My primary area of interest is semiparametric efficiency theory, with applications to causal inference, missing data problems, statistical genetics and mixed model theory. In general, I work on the development of statistical and epidemiologic methods that make efficient use of the information in data collected by scientific investigators, while avoiding unnecessary assumptions about the underlying data-generating mechanism.
Yonghoon Lee, Eric Tchetgen Tchetgen, Edgar Dobriban, Batch Predictive Inference.
Yonghoon Lee, Edgar Dobriban, Eric Tchetgen Tchetgen, Simultaneous Conformal Prediction of Missing Outcomes with Propensity Score ε-Discretization.
Hongxiang Qiu, Eric Tchetgen Tchetgen, Edgar Dobriban, Efficient and Multiply Robust Risk Estimation under General Forms of Dataset Shift.
Hongxiang Qiu, Xu Shi, Wang Miao, Edgar Dobriban, Eric Tchetgen Tchetgen, Doubly Robust Proximal Synthetic Controls.
Description: https://arxiv.org/abs/2210.02014
Hongxiang Qiu, Edgar Dobriban, Eric Tchetgen Tchetgen (Draft), Distribution-free Prediction Sets Adaptive to Unknown Covariate Shift.
Abstract: Predicting sets of outcomes -- instead of unique outcomes -- is a promising solution to uncertainty quantification in statistical learning. Despite a rich literature on constructing prediction sets with statistical guarantees, adapting to unknown covariate shift -- a prevalent issue in practice -- poses a serious challenge and has yet to be solved. In the framework of semiparametric statistics, we can view the covariate shift as a nuisance parameter. In this paper, we propose a novel flexible distribution-free method, PredSet-1Step, to construct prediction sets that can efficiently adapt to unknown covariate shift. PredSet-1Step relies on a one-step correction of the plug-in estimator of coverage error. We theoretically show that our methods are asymptotically probably approximately correct (PAC), having low coverage error with high confidence for large samples. PredSet-1Step may also be used to construct asymptotically risk-controlling prediction sets. We illustrate that our method has good coverage in a number of experiments and by analyzing a data set concerning HIV risk prediction in a South African cohort study. In experiments without covariate shift, PredSet-1Step performs similarly to inductive conformal prediction, which has finite-sample PAC properties. Thus, PredSet-1Step may be used in the common scenario where the user suspects -- but may not be certain -- that covariate shift is present, and does not know the form of the shift. Our theory hinges on a new bound for the convergence rate of Wald confidence interval coverage for general asymptotically linear estimators. This is a technical tool of independent interest.
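The abstract benchmarks PredSet-1Step against inductive (split) conformal prediction. For reference only, here is a minimal sketch of the standard split conformal interval for a scalar outcome. It is not PredSet-1Step and makes no adjustment for covariate shift. The absolute-residual score, the function name `split_conformal_interval`, and the `predict` callback (a model assumed to be fitted on a separate training split) are illustrative assumptions.

```python
import math
from typing import Callable, Sequence, Tuple


def split_conformal_interval(
    predict: Callable[[float], float],
    x_cal: Sequence[float],
    y_cal: Sequence[float],
    x_new: float,
    alpha: float = 0.1,
) -> Tuple[float, float]:
    """Split (inductive) conformal prediction interval for a scalar outcome.

    `predict` is assumed to be a model already fitted on a separate
    training split; the calibration data must not have been used to fit it.
    """
    n = len(x_cal)
    if n == 0 or n != len(y_cal):
        raise ValueError("calibration data must be non-empty and of equal length")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    # Nonconformity scores: absolute residuals on the calibration split.
    scores = sorted(abs(y - predict(x)) for x, y in zip(x_cal, y_cal))
    # Rank of the finite-sample-corrected (1 - alpha) empirical quantile.
    # The tiny offset guards against floating-point error such as 8.000000000000002.
    k = max(1, math.ceil((n + 1) * (1 - alpha) - 1e-12))
    if k > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return (-math.inf, math.inf)
    q = scores[k - 1]
    center = predict(x_new)
    return (center - q, center + q)
```

When calibration and test points are exchangeable, this interval covers a new outcome with probability at least 1 - alpha (a marginal guarantee). Under covariate shift that exchangeability fails, which is the setting the abstract addresses.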
Yifan Cui and Eric Tchetgen Tchetgen (Working), Selective machine learning of doubly robust functionals.
Yifan Cui and Eric Tchetgen Tchetgen (2021), A semiparametric instrumental variable approach to optimal treatment regimes under endogeneity, Journal of the American Statistical Association, 116 (533), pp. 162-173.
Wey Wen Lim, Nancy H L Leung, Sheena G. Sullivan, Eric Tchetgen Tchetgen, Benjamin J. Cowling (2020), Distinguishing Causation from Correlation in the Use of Correlates of Protection to Evaluate and Develop Influenza Vaccines, American Journal of Epidemiology (to appear).
Tom Chen, Eric Tchetgen Tchetgen, Rui Wang (2020), A Stochastic Second-Order Generalized Estimating Equations Approach for Estimating Association Parameters, Journal of Computational and Graphical Statistics (to appear).
Haben Michael, Yifan Cui, Scott A. Lorch, Eric Tchetgen Tchetgen (Working), Instrumental Variable Estimation of Marginal Structural Mean Models for Time-Varying Treatment.
This course will provide an in-depth investigation of statistical methods for drawing causal inferences from complex observational studies and imperfect randomized experiments. Formalization will be given for key concepts at the foundation of causal inference, including: confounding, comparability, positivity, interference, intermediate variables, total effects, controlled direct effects, natural direct and indirect effects for mediation analysis, generalizability, transportability, selection bias, etc. These concepts will be formally defined within the context of a counterfactual causal model. Methods for estimating total causal effects in the context of both point and time-varying exposures will be discussed, including regression-based methods, propensity score techniques and instrumental variable techniques for continuous, discrete, binary and time-to-event outcomes. Mediation analysis will be discussed from a counterfactual perspective. Causal directed acyclic graphs (DAGs) and associated nonparametric structural equations models (NPSEMs) will be used to formalize identification of causal effects for static and dynamic longitudinal treatment regimes under unconfoundedness and unmeasured confounding settings. This formalization will be used to define, identify and make inferences about the joint effects of time-varying exposures in the presence of (possibly hidden) time-dependent covariates that are simultaneously confounders and intermediate variables. These methods include g-estimation of structural nested models, inverse probability weighted estimators of marginal structural models, and g-computation algorithm estimators. Credible quasi-experimental causal inference methods will be described, leveraging auxiliary variables such as instrumental variables, negative control variables, or more broadly confounding proxy variables.
Quasi-experimental methods discussed will include the control outcome calibration approach, proximal causal inference, difference-in-differences and related generalizations of these methods. Semiparametric efficiency and the prospects for doubly robust inference will feature prominently throughout the course, including methods that combine modern semiparametric theory and machine learning techniques.
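The course description emphasizes doubly robust inference. As an illustration only (not drawn from the course materials), a textbook example is the augmented inverse probability weighted (AIPW) estimator of the average treatment effect. It combines a propensity score model with outcome regressions and, under no unmeasured confounding and positivity, remains consistent if either one is correctly specified. The sketch below assumes the nuisance functions were fitted elsewhere and are passed in as callables. The names `aipw_ate`, `propensity`, `mu1` and `mu0` are hypothetical.

```python
from typing import Callable, Sequence


def aipw_ate(
    x: Sequence[float],
    a: Sequence[int],
    y: Sequence[float],
    propensity: Callable[[float], float],
    mu1: Callable[[float], float],
    mu0: Callable[[float], float],
) -> float:
    """Augmented IPW (doubly robust) estimate of E[Y(1)] - E[Y(0)].

    propensity(x) estimates P(A = 1 | X = x); mu1(x) and mu0(x) estimate
    E[Y | A = 1, X = x] and E[Y | A = 0, X = x]. All three are assumed to be
    fitted beforehand (ideally on separate folds, i.e. cross-fitting).
    """
    n = len(y)
    if n == 0 or not (len(x) == len(a) == n):
        raise ValueError("x, a and y must be non-empty and of equal length")
    total = 0.0
    for xi, ai, yi in zip(x, a, y):
        e = propensity(xi)
        if not 0.0 < e < 1.0:
            raise ValueError("positivity violated: propensity must lie in (0, 1)")
        m1, m0 = mu1(xi), mu0(xi)
        # Outcome-regression prediction plus an inverse-probability-weighted
        # residual; the residual term corrects bias in the outcome model.
        psi1 = m1 + ai * (yi - m1) / e
        psi0 = m0 + (1 - ai) * (yi - m0) / (1 - e)
        total += psi1 - psi0
    return total / n
```

If the outcome regressions are exactly right, the residual terms vanish and the estimate reduces to the average of mu1(x) - mu0(x), whatever propensity model is used. Conversely, with a correct propensity model, the weighted residuals correct a misspecified outcome model in expectation.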
STAT9220001 (Syllabus)
Student lab rotation.
Ph.D. students enroll in this course after passing their candidacy exam. They work on their dissertation full-time under the guidance of their dissertation supervisor and other members of their dissertation committee.
An applied graduate-level course for students who have completed an undergraduate course in basic statistical methods. Covers two unrelated topics: loglinear and logit models for discrete data and nonparametric methods for nonnormal data. Emphasis is on practical methods of data analysis and their interpretation. Primarily for doctoral students in the managerial, behavioral, social and health sciences. Permission of instructor required to enroll.
Written permission of instructor and the department course coordinator required to enroll in this course.
This course will cover statistical methods for the design and analysis of observational studies. Topics will include the potential outcomes framework for causal inference; randomized experiments; matching and propensity score methods for controlling confounding in observational studies; tests of hidden bias; sensitivity analysis; and instrumental variables.
This course is designed for Ph.D. students in statistics and will cover various advanced methods and models that are useful in applied statistics. Topics for the course will include missing data, measurement error, nonlinear and generalized linear regression models, survival analysis, experimental design, longitudinal studies, building R packages and reproducible research.
This seminar will be taken by doctoral candidates after the completion of most of their coursework. Topics vary from year to year and are chosen from advanced probability, statistical inference, robust methods, and decision theory with principal emphasis on applications.
Dissertation
For the paper, “Assessment and indirect adjustment for confounding by smoking in cohort studies using relative hazards model” with David Richardson, Steve Cole and Dominique Laurier.
For the paper, “The use of negative controls to detect confounding and other sources of error in experimental and observational science” with Marc Lipsitch and Ted Cohen.
Scientists tested a costly approach to curbing the AIDS epidemic: Test everyone in the community, and treat anyone who is infected.
New York Times - 07/17/2019