STAT9150 - Nonparametric Inference (Course Syllabus)
Statistical inference when the functional form of the distribution is not specified. Nonparametric function estimation, density estimation, survival analysis, contingency tables, association, and efficiency.
Prerequisites: STAT 5200
STAT9200 - Sample Survey Methods (Course Syllabus)
This course will cover the design and analysis of sample surveys. Topics include simple random sampling, stratified sampling, cluster sampling, graphics, regression analysis using complex surveys and methods for handling nonresponse bias.
Prerequisites: STAT 5200 OR STAT 9610 OR STAT 9700
STAT9210 - Observational Studies (Course Syllabus)
This course will cover statistical methods for the design and analysis of observational studies. Topics will include the potential outcomes framework for causal inference; randomized experiments; matching and propensity score methods for controlling confounding in observational studies; tests of hidden bias; sensitivity analysis; and instrumental variables.
Prerequisites: STAT 5200 OR STAT 9610 OR STAT 9700
STAT9220 - Advanced Causal Inference (Course Syllabus)
This course will provide an in-depth investigation of statistical methods for drawing causal inferences from complex observational studies and imperfect randomized experiments. Formalization will be given for key concepts at the foundation of causal inference, including: confounding, comparability, positivity, interference, intermediate variables, total effects, controlled direct effects, natural direct and indirect effects for mediation analysis, generalizability, transportability, selection bias, etc. These concepts will be formally defined within the context of a counterfactual causal model. Methods for estimating total causal effects in the context of both point and time-varying exposure will be discussed, including regression-based methods, propensity score techniques and instrumental variable techniques for continuous, discrete, binary and time-to-event outcomes. Mediation analysis will be discussed from a counterfactual perspective. Causal directed acyclic graphs (DAGs) and associated nonparametric structural equation models (NPSEMs) will be used to formalize identification of causal effects for static and dynamic longitudinal treatment regimes under unconfoundedness and unmeasured confounding settings. This formalization will be used to define, identify and make inferences about the joint effects of time-varying exposures in the presence of (possibly hidden) time-dependent covariates that are simultaneously confounders and intermediate variables. These methods include g-estimation of structural nested models, inverse probability weighted estimators of marginal structural models, and g-computation algorithm estimators. Credible quasi-experimental causal inference methods will be described, leveraging auxiliary variables such as instrumental variables, negative control variables, or more broadly confounding proxy variables. Quasi-experimental methods discussed will include the control outcome calibration approach, proximal causal inference, difference-in-differences and related generalizations of these methods. Semiparametric efficiency and the prospects for doubly robust inference will feature prominently throughout the course, including methods that combine modern semiparametric theory and machine learning techniques.
Prerequisites: STAT 9210 OR BSTA 7900
STAT9250 - Multivariate Analy: Theo (Course Syllabus)
This is a course that prepares PhD students in statistics for research in multivariate statistics and high dimensional statistical inference. Topics from classical multivariate statistics include the multivariate normal distribution and the Wishart distribution; estimation and hypothesis testing of mean vectors and covariance matrices; principal component analysis, canonical correlation analysis and discriminant analysis; etc. Topics from modern multivariate statistics include the Marcenko-Pastur law, the Tracy-Widom law, nonparametric estimation and hypothesis testing of high-dimensional covariance matrices, high-dimensional principal component analysis, etc.
Prerequisites: STAT 9300 OR STAT 9700 OR STAT 9720
STAT9260 - Multivariate Analy: Meth (Course Syllabus)
This is a course that prepares PhD students in statistics for research in multivariate statistics and data visualization. The emphasis will be on a deep conceptual understanding of multivariate methods to the point where students will propose variations and extensions to existing methods or whole new approaches to problems previously solved by classical methods. Topics include: principal component analysis, canonical correlation analysis, generalized canonical analysis; nonlinear extensions of multivariate methods based on optimal transformations of quantitative variables and optimal scaling of categorical variables; shrinkage- and sparsity-based extensions to classical methods; clustering methods of the k-means and hierarchical varieties; multidimensional scaling, graph drawing, and manifold estimation.
Prerequisites: STAT 9610
STAT9270 - Bayesian Statistics (Course Syllabus)
This graduate course will cover the modeling and computation required to perform advanced data analysis from the Bayesian perspective. We will cover fundamental topics in Bayesian probability modeling and implementation, including recent advances in both optimization and simulation-based estimation strategies. Key topics covered in the course include hierarchical and mixture models, Markov Chain Monte Carlo, hidden Markov and dynamic linear models, tree models, Gaussian processes and nonparametric Bayesian strategies.
Prerequisites: STAT 4300 OR STAT 5100
STAT9280 - Stat Learning Theory (Course Syllabus)
Statistical learning theory studies the statistical aspects of machine learning and automated reasoning, through the use of (sampled) data. In particular, the focus is on characterizing the generalization ability of learning algorithms in terms of how well they perform on "new" data when trained on some given data set. The focus of the course is on: providing the fundamental tools used in this analysis; understanding the performance of widely used learning algorithms; understanding the "art" of designing good algorithms, both in terms of statistical and computational properties. Potential topics include: empirical process theory; online learning; stochastic optimization; margin-based algorithms; feature selection; concentration of measure. Background in probability and linear algebra recommended.
STAT9300 - Probability Theory (Course Syllabus)
Measure theoretic foundations, laws of large numbers, large deviations, distributional limit theorems, Poisson processes, random walks, stopping times.
Prerequisites: STAT 4300 OR STAT 5100 OR MATH 6080
STAT9310 - Stochastic Processes (Course Syllabus)
Continuation of MATH 6480/STAT 9300, the second part of Probability Theory for PhD students in the mathematics or statistics departments. The main topics include Brownian motion, martingales, Ito's formula, and their applications to random walk and PDE.
Prerequisites: MATH 5460 OR STAT 9300
STAT9550 - Stoch Cal & Fin Appl (Course Syllabus)
Selected topics in the theory of probability and stochastic processes.
Prerequisites: STAT 9300
STAT9600 - Stat Algorithms & Comp (Course Syllabus)
This course aims to prepare students for graduate work in the design, analysis, and implementation of statistical algorithms. The target audience is Ph.D. students in statistics or in adjacent fields, such as computer science, mathematics, electrical engineering, computational biology, economics, and marketing. We will take a fundamental approach and focus on classes of algorithms of primary importance in statistics and statistical machine learning. Some meta-classes of algorithms that may receive significant attention are optimization, sampling, and numerical linear algebra. I aim to make the content complementary rather than overlapping with other courses at Penn, such as ESE6050, CIS6770, and the CIS7000 series. While there may be some overlap in the portions of the course that cover optimization, the sampling (Monte Carlo and related) aspects of the course are, to my knowledge, hard to find elsewhere at Penn. The course is fast-paced and I expect a certain degree of mathematical preparation. Most students in the above-mentioned programs will have the requisite mathematics background. I also expect familiarity with an appropriate programming language such as R, Python, or MATLAB. The course will be mostly language agnostic. However, I may at times give example code in one of these languages, and you will be expected to be able to read the code even if it is not in your "primary" language. We may make use of various open-source toolboxes and packages for these environments, such as the Stan probabilistic programming language (best used with R) and the CVX toolbox for convex programming (available for multiple platforms but perhaps best used with MATLAB).
STAT9610 - Statistical Methodology (Course Syllabus)
This is a course that prepares first-year PhD students in statistics for a research career. This is not an applied statistics course. Topics covered include: linear models and their high-dimensional geometry, statistical inference illustrated with linear models, diagnostics for linear models, bootstrap and permutation inference, principal component analysis, smoothing and cross-validation.
Prerequisites: STAT 4310 OR STAT 5200
STAT9620 - Adv Methods Applied Stat (Course Syllabus)
This course is designed for Ph.D. students in statistics and will cover various advanced methods and models that are useful in applied statistics. Topics for the course will include missing data, measurement error, nonlinear and generalized linear regression models, survival analysis, experimental design, longitudinal studies, building R packages and reproducible research.
Prerequisites: STAT 9610
STAT9700 - Mathematical Statistics (Course Syllabus)
Decision theory and statistical optimality criteria, sufficiency, point estimation and hypothesis testing methods and theory.
Prerequisites: STAT 4310 OR STAT 5200
STAT9710 - Intro To Linear Stat Mod (Course Syllabus)
Theory of the Gaussian Linear Model, with applications to illustrate and complement the theory. Distribution theory of standard tests and estimates in multiple regression and ANOVA models. Model selection and its consequences. Random effects, Bayes, empirical Bayes and minimax estimation for such models. Generalized (Log-linear) models for specific non-Gaussian settings.
Prerequisites: STAT 9700
STAT9720 - Adv Topics in Math Stat (Course Syllabus)
A continuation of STAT 9700.
Prerequisites: STAT 9700 AND STAT 9710
STAT9740 - Modern Regression (Course Syllabus)
Function estimation and data exploration using extensions of regression analysis: smoothers, semiparametric and nonparametric regression, and supervised machine learning. Conceptual foundations are addressed as well as hands-on use for data analysis.
Prerequisites: STAT 1020 OR STAT 1120
STAT9800 - Intro to Biomed Data Science (Course Syllabus)
This course offers a comprehensive introduction to biomedical data science research, tailored for graduate students from Statistics and various interdisciplinary domains. Aimed at facilitating end-to-end data science research capabilities, this course covers the development and application of computational methods and statistical techniques for analyzing voluminous datasets, particularly in biology, healthcare, and medicine. Students will gain insights into various data types prevalent in biomedical research, emerging large-scale data resources, and the art of formulating scientific questions. The course encompasses methodology research, scientific research, collaborative research, computing tools, software development, as well as scientific writing, including both research papers and grant proposals. By the end of the course, students will be equipped with the foundational skills and knowledge required to excel as statisticians and research scientists, whether they choose to pursue a career in industry or academia. Prerequisite: For students from the STAT department, this course is tailored for those who have successfully completed the qualifying exam and are ready to embark on their research journey. Exceptions for first-year students will be considered on an individual basis. For master's or Ph.D. students from other departments or programs, such as AMCS, the prerequisites will differ based on their specific curriculum. At a minimum, students should have master-level expertise in one or more of the following areas: applied mathematics and probability, computing and software development, web development, bioinformatics, biostatistics, epidemiology, computational biology, genetics/genomics, neuroscience, radiology, and medical imaging.
STAT9910 - Sem in Adv Appl of Stat (Course Syllabus)
This seminar will be taken by doctoral candidates after the completion of most of their coursework. Topics vary from year to year and are chosen from advanced probability, statistical inference, robust methods, and decision theory with principal emphasis on applications.
STAT9911 - Sem in Adv Appl of Stat (ML) (Course Syllabus)
This seminar will be taken by doctoral candidates after the completion of most of their coursework. Topics vary from year to year and are chosen from advanced probability, statistical inference, robust methods, and decision theory with principal emphasis on applications.
STAT9912 - Sem in Adv Appl of Stat (OPT) (Course Syllabus)
This seminar will be taken by doctoral candidates after the completion of most of their coursework. Topics vary from year to year and are chosen from advanced probability, statistical inference, robust methods, and decision theory with principal emphasis on applications.
STAT9913 - Sem in Adv Appl of Stat (Prob) (Course Syllabus)
This seminar will be taken by doctoral candidates after the completion of most of their coursework. Topics vary from year to year and are chosen from advanced probability, statistical inference, robust methods, and decision theory with principal emphasis on applications.
STAT9914 - Sem in Adv Appl of Stat (MAST) (Course Syllabus)
This seminar will be taken by doctoral candidates after the completion of most of their coursework. Topics vary from year to year and are chosen from advanced probability, statistical inference, robust methods, and decision theory with principal emphasis on applications.
STAT9915 - Sem in Adv Appl of Stat (Course Syllabus)
This semester-long course explores the forefront of biomedical data science, focusing on the computational challenges in analyzing single-cell and spatial genomic data. Structured into six in-depth modules, the course offers a blend of lectures and journal club discussions to cover a range of topics in the field. Students will engage with current research topics in the area, develop critical data modeling skills, and learn to critique and enhance existing methodologies. Designed for both computational and biomedical backgrounds, the course provides a springboard into research topics in single cell and spatial genomics, equipping students with the tools to frame scientific problems computationally and to rigorously evaluate computational methods.
STAT9916 - Sem in Adv Appl of Stat (Course Syllabus)
This seminar will be taken by doctoral candidates after the completion of most of their coursework. Topics vary from year to year and are chosen from advanced probability, statistical inference, robust methods, and decision theory with principal emphasis on applications.
STAT9917 - Sem in Adv Appl of Stat (Course Syllabus)
This seminar will be taken by doctoral candidates after the completion of most of their coursework. Topics vary from year to year and are chosen from advanced probability, statistical inference, robust methods, and decision theory with principal emphasis on applications.
STAT9918 - Sem in Adv Appl of Stat (Course Syllabus)
This seminar will be taken by doctoral candidates after the completion of most of their coursework. Topics vary from year to year and are chosen from advanced probability, statistical inference, robust methods, and decision theory with principal emphasis on applications.
STAT9950 - Dissertation (Course Syllabus)
Dissertation
STAT9999 - Independent Study (Course Syllabus)
Written permission of instructor and the department course coordinator required to enroll.
Department of Statistics and Data Science
The Wharton School,
University of Pennsylvania
Academic Research Building
265 South 37th Street, 3rd & 4th Floors
Philadelphia, PA 19104-1686
Phone: (215) 898-8222
PhD Program
Students
- William Bekerman, PhD Student
- Jinho Bok, PhD Student
- Abhinav Chakraborty, PhD Student
- Anirban Chatterjee, PhD Student
- Sayak Chatterjee, PhD Student
- Abhinandan Dalal, PhD Student
- Mauricio Daros Andrade, PhD Student
- Joseph Deutsch, PhD Student
- Zhehang Du, PhD Student
- Wei Fan, PhD Student
- Zirui Fan, PhD Student
- Ryan Gross, PhD Student
- Yihui He, PhD Student
- Iris Horng, PhD Student
- Yu Huang, PhD Student
- Zhihan Huang, PhD Student
- Kevin Jiang, PhD Student
- Dongwoo Kim, PhD Student
- Junu Lee, PhD Student
- Chris Lin, PhD Student
- Yuxuan Lin, PhD Student
- Jiuyao Lu, PhD Student
- Wanteng Ma, PhD Student
- Soham Mallick, PhD Student
- Kaishu Mason, PhD Student
- Ziang Niu, PhD Student
- Manit Paul, PhD Student
- Jonathan Pipping, PhD Student
- Joseph Rudoler, PhD Student
- Henry Shugart, PhD Student
- Kevin Tan, PhD Student
- Hwai-Liang Tung, PhD Student
- Xiaomeng Wang, PhD Student
- Tao Wang, PhD Student
- Yangxinyu Xie, PhD Student
- Ziqing Xu, PhD Student
- Eddie Yang, PhD Student
- Jeffrey Zhang, PhD Student
- Zhaojun Zhang, PhD Student
- Fred Zhang, PhD Student
- Lei Zhao, PhD Student
- Zihan Zhu, PhD Student
- Zijie Zhuang, PhD Student