Eric J. Tchetgen Tchetgen

University Professor
Professor of Biostatistics in Biostatistics and Epidemiology
Professor of Statistics and Data Science

Contact Information

Primary Email:
ett@wharton.upenn.edu
Office Phone:
215-746-4328

Office Address:
407 Academic Research Building
265 South 37th Street
Philadelphia, PA 19104

Research Interests: Semiparametric theory, nonparametric statistics, causal inference, missing data, and epidemiologic methods.

Links: CV

Overview

Education

Ph.D., 2006, Harvard University
B.S., 1999, Yale University

Research

My primary area of interest is in semi-parametric efficiency theory with application to causal inference, missing data problems, statistical genetics and mixed model theory. In general, I work on the development of statistical and epidemiologic methods that make efficient use of the information in data collected by scientific investigators, while avoiding unnecessary assumptions about the underlying data generating mechanism.

Yonghoon Lee, Eric Tchetgen Tchetgen, Edgar Dobriban (Working), Batch Predictive Inference.
Yonghoon Lee, Edgar Dobriban, Eric Tchetgen Tchetgen (Working), Finding Distributions that Differ, with False Discovery Rate Control.
Yonghoon Lee, Edgar Dobriban, Eric Tchetgen Tchetgen, Simultaneous Conformal Prediction of Missing Outcomes with Propensity Score ε-Discretization.
Hongxiang Qiu, Eric Tchetgen Tchetgen, Edgar Dobriban, Efficient and Multiply Robust Risk Estimation under General Forms of Dataset Shift.
Hongxiang Qiu, Xu Shi, Wang Miao, Edgar Dobriban, Eric Tchetgen Tchetgen, Doubly Robust Proximal Synthetic Controls.
Description: https://arxiv.org/abs/2210.02014
Hongxiang Qiu, Edgar Dobriban, Eric Tchetgen Tchetgen (Draft), Distribution-free Prediction Sets Adaptive to Unknown Covariate Shift.
Abstract: Predicting sets of outcomes -- instead of unique outcomes -- is a promising solution to uncertainty quantification in statistical learning. Despite a rich literature on constructing prediction sets with statistical guarantees, adapting to unknown covariate shift -- a prevalent issue in practice -- poses a serious challenge and has yet to be solved. In the framework of semiparametric statistics, we can view the covariate shift as a nuisance parameter. In this paper, we propose a novel flexible distribution-free method, PredSet-1Step, to construct prediction sets that can efficiently adapt to unknown covariate shift. PredSet-1Step relies on a one-step correction of the plug-in estimator of coverage error. We theoretically show that our methods are asymptotically probably approximately correct (PAC), having low coverage error with high confidence for large samples. PredSet-1Step may also be used to construct asymptotically risk-controlling prediction sets. We illustrate that our method has good coverage in a number of experiments and by analyzing a data set concerning HIV risk prediction in a South African cohort study. In experiments without covariate shift, PredSet-1Step performs similarly to inductive conformal prediction, which has finite-sample PAC properties. Thus, PredSet-1Step may be used in the common scenario if the user suspects -- but may not be certain -- that covariate shift is present, and does not know the form of the shift. Our theory hinges on a new bound for the convergence rate of Wald confidence interval coverage for general asymptotically linear estimators. This is a technical tool of independent interest.
Yifan Cui and Eric Tchetgen Tchetgen (Working), Selective machine learning of doubly robust functionals.
Yifan Cui and Eric Tchetgen Tchetgen (2021), A semiparametric instrumental variable approach to optimal treatment regimes under endogeneity, Journal of the American Statistical Association, 116 (533), pp. 162-173.
Wey Wen Lim, Nancy H L Leung, Sheena G. Sullivan, Eric Tchetgen Tchetgen, Benjamin J. Cowling (2020), Distinguishing Causation from Correlation in the Use of Correlates of Protection to Evaluate and Develop Influenza Vaccines, American Journal of Epidemiology (to appear).
Tom Chen, Eric Tchetgen Tchetgen, Rui Wang (2020), A Stochastic Second-Order Generalized Estimating Equations Approach for Estimating Association Parameters, Journal of Computational and Graphical Statistics (to appear).

Teaching

All Courses

BSTA6990 - Lab Rotation
Student lab rotation.
BSTA8990 - Pre-Dissertation Lab Rot
BSTA9950 - Dissertation
Ph.D. students enroll in this course after passing their candidacy exam. They work on their dissertation full-time under the guidance of their dissertation supervisor and other members of their dissertation committee.
PSYC6120 - Int To Nonp & Loglin Mod
An applied graduate level course for students who have completed an undergraduate course in basic statistical methods. Covers two unrelated topics: loglinear and logit models for discrete data and nonparametric methods for nonnormal data. Emphasis is on practical methods of data analysis and their interpretation. Primarily for doctoral students in the managerial, behavioral, social and health sciences. Permission of instructor required to enroll.
STAT3990 - Independent Study
Written permission of instructor and the department course coordinator required to enroll in this course.
STAT5010 - Int To Nonp & Loglin Mod
An applied graduate level course for students who have completed an undergraduate course in basic statistical methods. Covers two unrelated topics: loglinear and logit models for discrete data and nonparametric methods for nonnormal data. Emphasis is on practical methods of data analysis and their interpretation. Primarily for doctoral students in the managerial, behavioral, social and health sciences. Permission of instructor required to enroll.
STAT9210 - Observational Studies
This course will cover statistical methods for the design and analysis of observational studies. Topics will include the potential outcomes framework for causal inference; randomized experiments; matching and propensity score methods for controlling confounding in observational studies; tests of hidden bias; sensitivity analysis; and instrumental variables.
STAT9220 - Advanced Causal Inference
This course will provide an in-depth investigation of statistical methods for drawing causal inferences from complex observational studies and imperfect randomized experiments. Formalization will be given for key concepts at the foundation of causal inference, including confounding, comparability, positivity, interference, intermediate variables, total effects, controlled direct effects, natural direct and indirect effects for mediation analysis, generalizability, transportability, and selection bias. These concepts will be formally defined within the context of a counterfactual causal model. Methods for estimating total causal effects in the context of both point and time-varying exposure will be discussed, including regression-based methods, propensity score techniques and instrumental variable techniques for continuous, discrete, binary and time-to-event outcomes. Mediation analysis will be discussed from a counterfactual perspective. Causal directed acyclic graphs (DAGs) and associated nonparametric structural equations models (NPSEMs) will be used to formalize identification of causal effects for static and dynamic longitudinal treatment regimes under unconfoundedness and unmeasured confounding settings. This formalization will be used to define, identify and make inferences about the joint effects of time-varying exposures in the presence of (possibly hidden) time-dependent covariates that are simultaneously confounders and intermediate variables. These methods include g-estimation of structural nested models, inverse probability weighted estimators of marginal structural models, and g-computation algorithm estimators. Credible quasi-experimental causal inference methods will be described, leveraging auxiliary variables such as instrumental variables, negative control variables, or more broadly confounding proxy variables. Quasi-experimental methods discussed will include the control outcome calibration approach, proximal causal inference, difference-in-differences and related generalizations of these methods. Semiparametric efficiency and the prospects for doubly robust inference will feature prominently throughout the course, including methods that combine modern semiparametric theory and machine learning techniques.
STAT9620 - Adv Methods Applied Stat
This course is designed for Ph.D. students in statistics and will cover various advanced methods and models that are useful in applied statistics. Topics for the course will include missing data, measurement error, nonlinear and generalized linear regression models, survival analysis, experimental design, longitudinal studies, building R packages and reproducible research.
STAT9910 - Sem in Adv Appl of Stat
This seminar is for graduate students who wish to learn about current research frontiers. It covers advanced topics in probability, statistical theory and methods, applied statistics, data science and artificial intelligence. Specific topics vary from year to year and emphasize both theoretical foundations and applications.
STAT9950 - Dissertation
Dissertation
STAT9999 - Independent Study
Written permission of instructor and the department course coordinator required to enroll.

Awards and Honors

Charles L. Odoroff Memorial Lecture, University of Rochester, 2026
The Kirk Public Lecture, Isaac Newton Institute for Mathematical Sciences, University of Cambridge, 2026
The Bernard G. Greenberg Distinguished Lecture Series, University of North Carolina, 2025
The Morris H DeGroot Memorial Lecture, Carnegie Mellon University, 2025
David Cox Medal, 2025
The David Blackwell Lecture, Department of Statistics, Oxford University, 2024
Marshall Joffe Epidemiologic Methods Research Award, Society for Epidemiologic Research, 2024
The Challis Lecture, University of Florida, Gainesville, FL, 2023
The Fuller Lecture, Iowa State University, 2023
Co-winner of the Rousseeuw Prize for Statistics, 2022
Myrto Lefkopoulou Distinguished Lectureship, 2020
Co-winner of the Society for Epidemiologic Research and American Journal of Epidemiology Article of the Year, 2014
For the paper, “Assessment and indirect adjustment for confounding by smoking in cohort studies using relative hazards model,” with David Richardson, Steve Cole and Dominique Laurier.
Career Incubator Award, Harvard School of Public Health, 2013-2014
Co-winner of the Kenneth Rothman Epidemiology Prize, 2011
For the paper, “The use of negative controls to detect confounding and other sources of error in experimental and observational science,” with Marc Lipsitch and Ted Cohen.
Best Poster Award: Gene Environment Initiative Symposium, Boston, MA, 2008
Yerby Fellowship, Harvard School of Public Health, 2006-2008
Mars Scholar, Yale University, 1995-1996

Activity

In the News

Intensive Anti-H.I.V. Efforts Meet With Mixed Success in Africa

Scientists tested a costly approach to curbing the AIDS epidemic: Test everyone in the community, and treat anyone who is infected.

New York Times - 07/17/2019
