Research Interests: applications of statistics to public health, design and analysis of experiments and observational studies for comparing treatments, longitudinal data, measurement error, medicine and economics
Links: Personal Website
PhD, Stanford University, 2002
BA, Harvard University, 1997
Bikram Karmakar, Dylan Small, Paul R. Rosenbaum (2021), Reinforced Designs: Multiple Instruments Plus Control Groups as Evidence Factors in an Observational Study of the Effectiveness of Catholic Schools, Journal of the American Statistical Association, 116 (533), pp. 82-92. 10.1080/01621459.2020.1745811
Abstract: Absent randomization, causal conclusions gain strength if several independent evidence factors concur. We develop a method for constructing evidence factors from several instruments plus a direct comparison of treated and control groups, and we evaluate the method's performance in terms of design sensitivity and simulation. In the application, we consider the effectiveness of Catholic versus public high schools, constructing three evidence factors from three past strategies for studying this question, namely: (i) having nearby access to a Catholic school as an instrument, (ii) being Catholic as an instrument for attending Catholic school, and (iii) a direct comparison of students in Catholic and public high schools. Although these three analyses use the same data, we: (i) construct three essentially independent statistical tests of no effect that require very different assumptions, (ii) study the sensitivity of each test to the assumptions underlying that test, (iii) examine the degree to which independent tests dependent upon different assumptions concur, and (iv) pool evidence across independent factors. In the application, we conclude that the ostensible benefit of Catholic education depends critically on the validity of one instrument, and is therefore quite fragile.
Qingyuan Zhao, Jingshu Wang, Gibran Hemani, Jack Bowden, Dylan Small (2020), Statistical Inference in Two-sample Summary-data Mendelian Randomization using Robust Adjusted Profile Score, Annals of Statistics, (in press).
Edward H. Kennedy and Dylan Small (2020), Paradoxes in Instrumental Variable Studies with Missing Data and One-sided Noncompliance, Journal of the French Statistical Society, (in press).
Hyunseung Kang, Tony Cai, Dylan Small (Under Review), Robust Confidence Intervals for Causal Effects with Possibly Invalid Instruments.
Bo Zhang, Jordan Weiss, Dylan Small, Qingyuan Zhao (2020), Selecting and Ranking Individualized Treatment Rules With Unmeasured Confounding, Journal of the American Statistical Association, (to appear).
Timothy G. Gaulton, Sameer K. Deshpande, Dylan Small, Mark D. Neuman (2020), Observational Study of the Association between Participation in High School Football and Self-Rated Health, Obesity, and Pain in Adulthood, American Journal of Epidemiology, (to appear).
Bikram Karmakar, Chyke A. Doubeni, Dylan Small (2020), Evidence Factors in a Case-control Study with Application to the Effect of Flexible Sigmoidoscopy Screening on Colorectal Cancer, Annals of Applied Statistics, (to appear).
Bikram Karmakar and Dylan Small (2020), Assessment of the Extent of Corroboration of an Elaborate Theory of a Causal Hypothesis Using Partial Conjunctions of Evidence Factors, Annals of Statistics, (to appear).
Kwonsang Lee, Bhaswar B. Bhattacharya, Jing Qin, Dylan Small (Working), A Nonparametric Likelihood Approach for Inference in Instrumental Variable Models.
Study under the direction of a faculty member.
This course covers elements of (non-measure-theoretic) probability necessary for the further study of statistics and biostatistics. Topics include set theory, axioms of probability, counting arguments, conditional probability, random variables and distributions, expectations, generating functions, families of distributions, joint and marginal distributions, hierarchical models, covariance and correlation, random sampling, sampling properties of statistics, modes of convergence, and random number generation. Prerequisites: two semesters of calculus (through multivariate calculus) and linear algebra, or permission of the instructor.
This class will cover the fundamental concepts of statistical inference. Topics include sufficiency, consistency, finding and evaluating point estimators, finding and evaluating interval estimators, hypothesis testing, and asymptotic evaluations for point and interval estimation. Prerequisite: if course requirements are not met, permission of the instructor.
This class will cover the fundamental concepts of statistical inference. Topics include sufficiency, consistency, finding and evaluating point estimators, finding and evaluating interval estimators, hypothesis testing, and asymptotic evaluations for point and interval estimation.
Data summaries and descriptive statistics; introduction to a statistical computer package; probability: distributions, expectation, variance, covariance, portfolios, central limit theorem; statistical inference for univariate data; statistical inference for bivariate data: inference for intrinsically linear simple regression models. This course has a business focus but is also suitable for students in the College. This course may be taken concurrently with the prerequisite with instructor permission.
Continuation of STAT 101. A thorough treatment of multiple regression, model selection, analysis of variance, linear logistic regression; introduction to time series. Business applications. This course may be taken concurrently with the prerequisite with instructor permission.
Further development of the material in STAT 111, in particular the analysis of variance, multiple regression, non-parametric procedures and the analysis of categorical data. Data analysis via statistical packages. This course may be taken concurrently with the prerequisite with instructor permission.
Written permission of the instructor and the department course coordinator is required to enroll in this course.
This course will cover the design and analysis of sample surveys. Topics include simple random sampling, stratified sampling, cluster sampling, graphics, regression analysis using complex surveys, and methods for handling nonresponse bias. This course may be taken concurrently with the prerequisite with instructor permission.
Questions about cause are at the heart of many everyday decisions and public policies. Does eating an egg every day cause people to live longer, live shorter, or have no effect? Do gun control laws cause more murders, fewer murders, or have no effect? Causal inference is the subfield of statistics that considers how we should make inferences about such questions. This course will rigorously cover the key concepts and methods of causal inference. The course is intended for statistics concentrators and minors. Knowledge of R such as that covered in STAT 405 or STAT 470 is recommended.
Elements of matrix algebra. Discrete and continuous random variables and their distributions. Moments and moment generating functions. Joint distributions. Functions and transformations of random variables. Law of large numbers and the central limit theorem. Point estimation: sufficiency, maximum likelihood, minimum variance. Confidence intervals. A one-year course in calculus is recommended.
An introduction to the mathematical theory of statistics. Estimation, with a focus on properties of sufficient statistics and maximum likelihood estimators. Hypothesis testing, with a focus on likelihood ratio tests and the consequent development of "t" tests and hypothesis tests in regression and ANOVA. Nonparametric procedures.
This is a course in econometrics for graduate students. The goal is to prepare students for empirical research by studying econometric methodology and its theoretical foundations. Students taking the course should be familiar with elementary statistical methodology and basic linear algebra, and should have some programming experience. Topics include conditional expectation and linear projection, asymptotic statistical theory, ordinary least squares estimation, the bootstrap and jackknife, instrumental variables and two-stage least squares, specification tests, systems of equations, generalized least squares, and an introduction to linear panel data models.
Topics include system estimation with instrumental variables, fixed effects and random effects estimation, M-estimation, nonlinear regression, quantile regression, maximum likelihood estimation, generalized method of moments estimation, minimum distance estimation, and binary and multinomial response models. Both theory and applications will be stressed.
Questions about cause are at the heart of many everyday decisions and public policies. Does eating an egg every day cause people to live longer, live shorter, or have no effect? Do gun control laws cause more murders, fewer murders, or have no effect? Causal inference is the subfield of statistics that considers how we should make inferences about such questions. This course will rigorously cover the key concepts and methods of causal inference. Background in probability and statistics is expected; some knowledge of R is recommended.
This course will cover the design and analysis of sample surveys. Topics include simple random sampling, stratified sampling, cluster sampling, graphics, regression analysis using complex surveys and methods for handling nonresponse bias.
This course will cover statistical methods for the design and analysis of observational studies. Topics will include the potential outcomes framework for causal inference; randomized experiments; matching and propensity score methods for controlling confounding in observational studies; tests of hidden bias; sensitivity analysis; and instrumental variables.
This course is designed for Ph.D. students in statistics and will cover various advanced methods and models that are useful in applied statistics. Topics for the course will include missing data, measurement error, nonlinear and generalized linear regression models, survival analysis, experimental design, longitudinal studies, building R packages and reproducible research.
Decision theory and statistical optimality criteria, sufficiency, point estimation and hypothesis testing methods and theory.
Theory of the Gaussian Linear Model, with applications to illustrate and complement the theory. Distribution theory of standard tests and estimates in multiple regression and ANOVA models. Model selection and its consequences. Random effects, Bayes, empirical Bayes and minimax estimation for such models. Generalized (Log-linear) models for specific non-Gaussian settings.
This seminar is taken by doctoral candidates after the completion of most of their coursework. Topics vary from year to year and are chosen from advanced probability, statistical inference, robust methods, and decision theory, with principal emphasis on applications.
Written permission of the instructor and the department course coordinator is required to enroll.
New Wharton research examines the long-term impact of playing high school or college football. Knowledge @ Wharton - 7/21/2017
How Wharton’s research programs prepare undergraduates for careers in academia and the private sector. Wharton Magazine - 01/01/2011
How can we use publicly available data to understand what makes a city neighborhood safe? To answer this question, Shane Jensen and Dylan Small, Professors in Wharton’s Statistics Department, are using their skills in big data to reveal patterns of crime and safety in the city. Jonathan Wood, a WSII… Wharton Stories - 04/09/2018