417 Academic Research Building
265 South 37th Street
Philadelphia, PA 19104
Research Interests: applications of statistics to public health, design and analysis of experiments and observational studies for comparing treatments, longitudinal data, measurement error, medicine and economics
PhD, Stanford University, 2002
BA, Harvard University, 1997
Wharton: 2002-present
Yu Gui, Dylan Small, Zhimei Ren, Adaptive discovery of effect modification in matched observational studies.
Dane Isenberg, Edward Kennedy, Richard Landis, Nandita Mitra, James M. Robins, Jason Roy, Alisa J. Stephens-Shields, Wei Yang, Dylan Small (2024), Marshall Joffe’s Contributions to Causal Inference, Biostatistics, and Epidemiology, American Journal of Epidemiology, 193 (4), pp. 563-576.
Kwonsang Lee, Bhaswar B. Bhattacharya, Jing Qin, Dylan Small (2023), A Nonparametric Likelihood Approach for Inference in Instrumental Variable Models, Journal of the Korean Statistical Society, 52, pp. 1055-1077.
Jeffrey Zhang and Dylan Small (2023), Sensitivity Analysis for Observational Studies with Recurrent Events, Lifetime Data Analysis, 30, pp. 237-261.
Xinran Li and Dylan Small (2023), Randomization-Based Test for Censored Outcomes: A New Look at the Logrank Test, Statistical Science, 38 (1), pp. 92-107.
Jeffrey H. Silber, Paul R. Rosenbaum, Joseph G. Reiter, Alexander Hill, Siddarth Jain, David Wolk, Dylan Small, Sean Hashemi, Lee A. Fleisher, Bijan A. Niknam, Mark D. Neuman, Roderic Eckenhoff (2022), Alzheimer’s dementia after exposure to anesthesia and surgery in the elderly: a matched natural experiment using appendicitis, Annals of Surgery, 11. 10.1097/SLA.0000000000004632
Abstract: Objective: The aim of this study was to determine whether surgery and anesthesia in the elderly may promote Alzheimer disease and related dementias (ADRD). Background: There is a substantial conflicting literature concerning the hypothesis that surgery and anesthesia promotes ADRD. Much of the literature is confounded by indications for surgery or has small sample size. This study examines elderly patients with appendicitis, a common condition that strikes mostly at random after controlling for some known associations. Methods: A matched natural experiment of patients undergoing appendectomy for appendicitis versus control patients without appendicitis using Medicare data from 2002 to 2017, examining 54,996 patients without previous diagnoses of ADRD, cognitive impairment, or neurological degeneration, who developed appendicitis between ages 68 through 77 years and underwent an appendectomy (the “Appendectomy” treated group), matching them 5:1 to 274,980 controls, examining the subsequent hazard for developing ADRD. Results: The hazard ratio (HR) for developing ADRD or death was lower in the Appendectomy group than controls: HR = 0.96 [95% confidence interval (CI) 0.94–0.98], P < 0.0001, (28.2% in Appendectomy vs 29.1% in controls, at 7.5 years). The HR for death was 0.97 (95% CI 0.95–0.99), P = 0.002, (22.7% vs 23.1% at 7.5 years). The HR for developing ADRD alone was 0.89 (95% CI 0.86–0.92), P < 0.0001, (7.6% in Appendectomy vs 8.6% in controls, at 7.5 years). No subgroup analyses found significantly elevated rates of ADRD in the Appendectomy group. Conclusion: In this natural experiment involving 329,976 elderly patients, exposure to appendectomy surgery and anesthesia did not increase the subsequent rate of ADRD.
Katherine Brumberg, Dylan Small, Paul R. Rosenbaum (2022), Using randomized rounding of linear programs to obtain unweighted natural strata that balance many covariates, Journal of the Royal Statistical Society Series A: Statistics in Society, 21. 10.1111/rssa.12848
Abstract: In causal inference, natural strata are a new compromise between conventional strata and matching in a fixed ratio, say pair matching or matching two controls to each treated individual. Like matching in a fixed ratio, natural strata: (a) do not require weights, (b) balance many measured covariates beyond those that define the strata and (c) provide closer balance for a measured continuous covariate coarsely cut to form strata. Unlike matching in a fixed ratio, the ratio of controls to treated individuals need not be an integer, so if the data permit a fixed ratio comparison of 1-to-2.5 or even 1-to-0.75, then these ratios are possible using natural strata. Optimal natural strata are defined by a moderate number of fixed strata plus an integer program that minimizes the imbalance in many other measured covariates that are not used to specify the strata. Solving large integer programs is computationally difficult. A tool in the theory of approximation algorithms is ‘randomized rounding of a linear program’ to produce an integer solution: a fractional solution to a linear program defines a probability distribution for an integer-valued random variable which is sampled. We apply this tool in a new way to produce natural strata and develop new properties of randomized rounding in this context. When proportional strata are impractical, we approximate them by minimizing the earthmover distance to proportionality. The method is applied to study birth outcomes for older and younger mothers in the United States in 2018. An R package natstrat is available at CRAN.
Ruoqi Yu, Dylan Small, Paul R. Rosenbaum (2021), The Information in Covariate Imbalance in Studies of Hormone Replacement Therapy, Annals of Applied Statistics, 15 (4), pp. 2023-2042.
Jing Cheng and Dylan Small (2021), Semiparametric Models and Inference for the Effect of a Treatment When the Outcome is Nonnegative with Clumping at Zero, Biometrics, 77 (4), pp. 1187-1201.
Siyu Heng and Dylan Small (2021), Sharpening the Rosenbaum Sensitivity Bounds to Address Concerns about Interactions Between Observed and Unobserved Covariates, Statistica Sinica, 31.
Independent Study allows students to pursue academic interests not available in regularly offered courses. Students must consult with their academic advisor to formulate a project directly related to the student’s research interests. All independent study courses are subject to the approval of the AMCS Graduate Group Chair.
Allows for a PhD student to be enrolled full-time to work exclusively on research, writing and preparing his/her doctoral thesis and defense. All required coursework (20 CUs) must be completed, and the student must have passed his/her thesis proposal/oral candidacy examination prior to being enrolled.
For students writing a Master's Thesis to fulfill the program's requirements. All required coursework (8 CUs) must be completed prior to being enrolled.
Study under the direction of a faculty member.
Student lab rotation.
Study under the direction of a faculty member. Intended for a limited number of mathematics majors.
Written permission of instructor and the department course coordinator required to enroll in this course.
This course provides students with the opportunity to hone their data science skills and gain practical experience by working with a community organization on a data science problem of interest to the organization. Students will gain skills in problem formulation, collaboration with community organizations and communication of data science results. Students will work in groups of 3-5 on a data science problem of interest to a community organization. This is an Academically Based Community Service (ABCS) course. Prerequisites: The course presumes that students have taken a sequence of introductory statistics courses such as STAT 1010/1020, or 4300/4310 and that they have taken a course that has exposed them to more advanced techniques such as STAT 4220, 4230, 4420, 4710 or 4730. It will be assumed that students have knowledge of a statistical programming language such as R or Python. Classes such as STAT 4050, 4700, 4770 or 4800 would meet this requirement.
Questions about cause are at the heart of many everyday decisions and public policies. Does eating an egg every day cause people to live longer or shorter or have no effect? Do gun control laws cause more or fewer murders or have no effect? Causal inference is the subfield of statistics that considers how we should make inferences about such questions. This course will cover the key concepts and methods of causal inference rigorously. Background in probability and statistics; some knowledge of R is recommended.
This course provides students with the opportunity to hone their data science skills and gain practical experience by working with a community organization on a data science problem of interest to the organization. Students will gain skills in problem formulation, collaboration with community organizations and communication of data science results. Students will work in groups of 3-5 on a data science problem of interest to a community organization. This is an Academically Based Community Service (ABCS) course. Prerequisite: The course presumes that students have taken a sequence of introductory statistics courses such as STAT 1010/1020, or 4300/4310 and that they have taken a course that has exposed them to more advanced techniques such as STAT 4220, 4230, 4420, 4710 or 4730. It will be assumed that students have knowledge of a statistical programming language such as R or Python. Classes such as STAT 4050, 4700, 4770 or 4800 would meet this requirement.
This course will cover statistical methods for the design and analysis of observational studies. Topics will include the potential outcomes framework for causal inference; randomized experiments; matching and propensity score methods for controlling confounding in observational studies; tests of hidden bias; sensitivity analysis; and instrumental variables.
This course is designed for Ph.D. students in statistics and will cover various advanced methods and models that are useful in applied statistics. Topics for the course will include missing data, measurement error, nonlinear and generalized linear regression models, survival analysis, experimental design, longitudinal studies, building R packages and reproducible research.
Theory of the Gaussian Linear Model, with applications to illustrate and complement the theory. Distribution theory of standard tests and estimates in multiple regression and ANOVA models. Model selection and its consequences. Random effects, Bayes, empirical Bayes and minimax estimation for such models. Generalized (Log-linear) models for specific non-Gaussian settings.
This seminar is for graduate students who wish to learn about current research frontiers. It covers advanced topics in probability, statistical theory and methods, applied statistics, data science and artificial intelligence. Specific topics vary from year to year and emphasize both theoretical foundations and applications.
This seminar-based course provides students with the opportunity to hone their data science skills and gain practical experience by working with a community organization on a data science problem of interest to the organization. Students will gain skills in problem formulation, collaboration with community organizations and communication of data science results. Students will work in groups on a data science problem of interest to a community organization.
Dissertation
Written permission of instructor and the department course coordinator required to enroll.
New Wharton research examines the long-term impact of playing high school or college football.
Knowledge at Wharton - 7/21/2017
How Wharton’s research programs prepare undergraduates for careers in academia and the private sector.
Wharton Magazine - 01/01/2011
Uniting Great Minds, Wharton’s Stat Bridge MA Program Takes Flight
A new program in Wharton’s Department of Statistics and Data Science offers advanced coursework and research experience for students who hope to earn a PhD but need additional preparation for admission to a statistics doctoral program. The Bridge to a Doctorate Program in Statistics and Data Science is a two-year…
Wharton Stories - 09/13/2023