Undergraduate Course Descriptions

STAT101 - INTRO BUSINESS STAT (Course Syllabus)

Data summaries and descriptive statistics; introduction to a statistical computer package; probability: distributions, expectation, variance, covariance, portfolios, central limit theorem; statistical inference for univariate data; statistical inference for bivariate data: inference for intrinsically linear simple regression models. This course has a business focus but is also appropriate for students in the College.

Prerequisites: MATH 104 or MATH 110 or equivalent; successful completion of STAT 101 is prerequisite to STAT 102

STAT102 - INTRO BUSINESS STAT (Course Syllabus)

Continuation of STAT 101. A thorough treatment of multiple regression, model selection, analysis of variance, linear logistic regression; introduction to time series. Business applications.

Prerequisites: STAT 101

STAT111 - INTRODUCTORY STATISTICS (Course Syllabus)

Introduction to concepts in probability. Basic statistical inference procedures of estimation, confidence intervals and hypothesis testing directed towards applications in science and medicine. The use of the JMP statistical package.

Prerequisites: High school algebra.

STAT112 - INTRODUCTORY STATISTICS (Course Syllabus)

Further development of the material in STAT 111, in particular the analysis of variance, multiple regression, non-parametric procedures and the analysis of categorical data. Data analysis via statistical packages.

Prerequisites: STAT 111

STAT399 - INDEPENDENT STUDY

Prerequisites: Written permission of instructor and the department course coordinator.

STAT405 - STAT COMPUTING WITH R (Course Syllabus)

The goal of this course is to introduce students to the R programming language and its related ecosystem. The course provides a skill set that is in demand in both research and business environments. In addition, R is used and required in other advanced classes taught at Wharton, so this class prepares students for those higher-level classes and electives.

Prerequisites: STAT 102 or STAT 112 or STAT 430

STAT422 - PREDICTIVE ANALYTICS (Course Syllabus)

This course follows from the introductory regression classes, STAT 102, STAT 112, and STAT 431 for undergraduates and STAT 613 for MBAs. It extends the ideas from regression modeling, focusing on the core business task of predictive analytics as applied to realistic business related data sets. In particular it introduces automated model selection tools, such as stepwise regression and various current model selection criteria such as AIC and BIC. It delves into classification methodologies such as logistic regression. It also introduces classification and regression trees (CART) and the popular predictive methodology known as the random forest. By the end of the course the student will be familiar with and have applied all these tools and will be ready to use them in a work setting. The methodologies can all be implemented in either the JMP or R software packages.

Prerequisites: STAT 102 or STAT 112 or STAT 431

STAT424 - TEXT ANALYTICS

This course introduces methods for the analysis of unstructured data, focusing on statistical models for text. Techniques include those for sentiment analysis, topic models, and predictive analytics. Course includes topics from natural language processing (NLP), such as identifying parts of speech, parsing sentences (e.g., subject and predicate), and named entity recognition (people and places). Unsupervised techniques suited to feature creation provide variables suited to traditional statistical models (regression) and more recent approaches (regression trees). Examples that span the course illustrate the success of text analytics. Hierarchical generating models often associated with nonparametric Bayesian analysis supply theoretical foundations.

Prerequisites: Students should be familiar with regression models at the level of STAT 102 and the R statistics language at the level of STAT 405. Familiarity with the RStudio development environment is presumed, as well as common R packages such as stringr, dplyr and ggplot2. Those with more knowledge of statistics, such as from STAT 422, or stronger computing skills will benefit. The predominant software used in the course is R, with bits of JMP when helpful for interactive illustration. Familiarity with basic probability models is helpful but not presumed.

STAT430 - PROBABILITY (Course Syllabus)

Discrete and continuous sample spaces and probability; random variables, distributions, independence; expectation and generating functions; Markov chains and recurrence theory.

Prerequisites: MATH 114 or MATH 115 or equivalent

STAT431 - STATISTICAL INFERENCE (Course Syllabus)

Graphical displays; one- and two-sample confidence intervals; one- and two-sample hypothesis tests; one- and two-way ANOVA; simple and multiple linear least-squares regression; nonlinear regression; variable selection; logistic regression; categorical data analysis; goodness-of-fit tests. A methodology course. This course does not have business applications but has significant overlap with STAT 101 and 102.

Prerequisites: STAT 430

STAT432 - MATHEMATICAL STATISTICS (Course Syllabus)

An introduction to the mathematical theory of statistics. Estimation, with a focus on properties of sufficient statistics and maximum likelihood estimators. Hypothesis testing, with a focus on likelihood ratio tests and the consequent development of "t" tests and hypothesis tests in regression and ANOVA. Nonparametric procedures.

Prerequisites: STAT 430 or 510 or equivalent

STAT433 - STOCHASTIC PROCESSES (Course Syllabus)

An introduction to Stochastic Processes. The primary focus is on Markov Chains, Martingales and Gaussian Processes. We will discuss many interesting applications from physics to economics. Topics may include: simulations of path functions, game theory and linear programming, stochastic optimization, Brownian Motion and Black-Scholes.

Prerequisites: STAT 430, or permission of instructor

STAT435 - FORECASTING METHODS MGMT (Course Syllabus)

This course provides an introduction to the wide range of techniques available for statistical forecasting. Qualitative techniques, smoothing and decomposition of time series, regression, adaptive methods, autoregressive-moving average modeling, and ARCH and GARCH formulations will be surveyed. The emphasis will be on applications, rather than technical foundations and derivations. The techniques will be studied critically, with examination of their usefulness and limitations.

Prerequisites: STAT 102 or 112 or 431

STAT436 - LARGE-SCALE DATA SCIENCE

The course will focus on computational approaches to large-scale data analysis. The lectures will introduce the relevant concepts, and students will be asked to work on projects, implementing the methods and experimenting with large-scale datasets. The course will cover various techniques for updating models in an online fashion, as well as subsampling and dimensionality-reduction techniques. The students will experiment with neural network architectures and learn to build predictive models for modern machine learning tasks.

Prerequisites: Linear Algebra and basic R programming

STAT442 - INTRO BAYES DATA ANALYS

The course will introduce data analysis from the Bayesian perspective to undergraduate students. We will cover important concepts in Bayesian probability modeling as well as estimation using both optimization and simulation-based strategies. Key topics covered in the course include hierarchical models, mixture models, hidden Markov models and Markov Chain Monte Carlo.

Prerequisites: A course in probability (STAT 430 or equivalent); a course in statistical inference (STAT 102, STAT 112, STAT 431 or equivalent); and experience with the statistical software R (at the level of STAT 405 or STAT 470)

STAT451 - FUND OF ACTUARIAL SCI I (Course Syllabus)

This course is the usual entry point in the actuarial science program. It is required for students who plan to concentrate or minor in actuarial science. It can also be taken by others interested in the mathematics of personal finance and the use of mortality tables. For future actuaries, it provides the necessary knowledge of compound interest and its applications, and the basic life contingencies definitions to be used throughout their studies. Non-actuaries will be introduced to practical applications of finance mathematics, such as loan amortization and bond pricing, and premium calculation of typical life insurance contracts. Main topics include annuities, loans and bonds; basic principles of life contingencies and determination of annuity and insurance benefits and premiums.

Prerequisites: MATH 104, STAT 430. STAT 430 can be taken concurrently with BEPP 451 or STAT 451

STAT452 - FUND OF ACTUARIAL SCI II

This specialized course is usually taken only by Wharton students who plan to concentrate in actuarial science and Penn students who plan to minor in actuarial mathematics. It provides a comprehensive analysis of advanced life contingencies problems such as reserving, multiple life functions, and multiple decrement theory, with application to the valuation of pension plans.

Prerequisites: BEPP 451 or STAT 451

STAT453 - ACTUARIAL STATISTICS (Course Syllabus)

This course covers models for insurers' losses and applications of Markov chains. Poisson processes, including extensions such as non-homogeneous, compound, and mixed Poisson processes, are studied in detail. The compound model is then used to establish the distribution of losses. An extensive section on Markov chains provides the theory to forecast future states of the process, as well as numerous applications of Markov chains to insurance, finance, and genetics. The course is abundantly illustrated by examples from the insurance and finance literature. While most of the students taking the course are future actuaries, other students interested in applications of statistics may discover in class many fascinating applications of stochastic processes and Markov chains.

Prerequisites: STAT 430

STAT454 - APPL STAT METHD FOR ACTU

One half of the course is devoted to the study of time series, including ARIMA modeling and forecasting. The other half studies modifications in random variables due to deductibles, co-payments, policy limits, and elements of simulation. This course is a possible entry point into the actuarial science program. The Society of Actuaries has approved STAT 854 for VEE credit on the topic of time series.

Prerequisites: STAT 430, STAT 431

STAT470 - DATA ANALY & STAT COMP (Course Syllabus)

This course will introduce a high-level programming language, called R, that is widely used for statistical data analysis. Using R, we will study and practice the following methodologies: data cleaning, feature extraction; web scraping, text analysis; data visualization; fitting statistical models; simulation of probability distributions and statistical models; statistical inference methods that use simulations (bootstrap, permutation tests).

Prerequisites: STAT 101 and 102 or STAT 111 and 112 or STAT 431 or ECON 103 and ECON 104

STAT471 - MODERN DATA MINING (Course Syllabus)

Modern Data Mining: Statistics or Data Science has been evolving rapidly to keep up with the modern world. While classical multiple regression and logistic regression techniques continue to be the major tools, we go beyond them to include methods built on top of linear models, such as LASSO and Ridge regression. Contemporary methods such as KNN (K nearest neighbor), Random Forest, Support Vector Machines, Principal Component Analysis (PCA), the bootstrap and others are also covered. Text mining, especially through PCA, is another topic of the course. While learning all the techniques, we keep in mind that our goal is to tackle real problems. Not only do we go through a large collection of interesting, challenging real-life data sets, but we also learn how to use the free, powerful software "R" in connection with each of the methods covered in the class.

Prerequisites: STAT 102 or 112 or 431

STAT474 - MODERN REGRESSION (Course Syllabus)

Function estimation and data exploration using extensions of regression analysis: smoothers, semiparametric and nonparametric regression, and supervised machine learning. Conceptual foundations are addressed as well as hands-on use for data analysis.

Prerequisites: STAT 102 or 112 or equivalent

STAT475 - SAMPLE SURVEY DESIGN

This course will cover the design and analysis of sample surveys. Topics include simple sampling, stratified sampling, cluster sampling, graphics, regression analysis using complex surveys and methods for handling nonresponse bias.

Prerequisites: STAT 102 or 112 or 431

STAT476 - APPL PROB MODELS MKTG (Course Syllabus)

This course will expose students to the theoretical and empirical "building blocks" that will allow them to construct, estimate, and interpret powerful models of customer behavior. Over the years, researchers and practitioners have used these models for a wide variety of applications, such as new product sales, forecasting, analyses of media usage, and targeted marketing programs. Other disciplines have seen equally broad utilization of these techniques. The course will be entirely lecture-based with a strong emphasis on real-time problem solving. Most sessions will feature sophisticated numerical investigations using Microsoft Excel. Much of the material is highly technical.

Prerequisites: A high comfort level with basic integral calculus and recent exposure to a formal course in probability and statistics such as STAT 430 are strongly recommended.

STAT480 - ADV STAT COMPUTING

This course will build on the fundamental concepts introduced in the prerequisite courses to allow students to acquire knowledge and programming skills in large-scale data analysis, data visualization, and stochastic simulation.

Prerequisites: STAT 470 or STAT 405 or equivalent background acquired through a combination of online courses that teach the R language and practical experience.

STAT490 - CAUSAL INFERENCE

Questions about cause are at the heart of many everyday decisions and public policies. Does eating an egg every day cause people to live longer or shorter or have no effect? Do gun control laws cause more or fewer murders or have no effect? Causal inference is the subfield of statistics that considers how we should make inferences about such questions. This course will cover the key concepts and methods of causal inference rigorously. The course is intended for statistics concentrators and minors.

Prerequisites: STAT 430 is required for this class. One of STAT 102, STAT 112 or STAT 431 is also required. Knowledge of R, such as that covered in STAT 405 or STAT 470, is also expected.

STAT500 - APPLIED REG & ANALY VAR (Course Syllabus)

An applied graduate-level course in multiple regression and analysis of variance for students who have completed an undergraduate course in basic statistical methods. Emphasis is on practical methods of data analysis and their interpretation. Covers model building, general linear hypothesis, residual analysis, leverage and influence, one-way ANOVA, two-way ANOVA, factorial ANOVA. Primarily for doctoral students in the managerial, behavioral, social and health sciences.

Prerequisites: STAT 102 or 112 or equivalent

STAT501 - INT TO NONP & LOGLIN MOD (Course Syllabus)

An applied graduate level course for students who have completed an undergraduate course in basic statistical methods. Covers two unrelated topics: loglinear and logit models for discrete data and nonparametric methods for nonnormal data. Emphasis is on practical methods of data analysis and their interpretation. Primarily for doctoral students in the managerial, behavioral, social and health sciences. May be taken before STAT 500 with permission of instructor.

Prerequisites: STAT 102 or 112 or equivalent

STAT503 - DATA ANALY & STAT COMP (Course Syllabus)

This course will introduce a high-level programming language, called R, that is widely used for statistical data analysis. Using R, we will study and practice the following methodologies: data cleaning, feature extraction; web scraping, text analysis; data visualization; fitting statistical models; simulation of probability distributions and statistical models; statistical inference methods that use simulations (bootstrap, permutation tests).

Prerequisites: Two courses at the statistics 400 or 500 level.

STAT510 - PROBABILITY (Course Syllabus)

Elements of matrix algebra. Discrete and continuous random variables and their distributions. Moments and moment generating functions. Joint distributions. Functions and transformations of random variables. Law of large numbers and the central limit theorem. Point estimation: sufficiency, maximum likelihood, minimum variance. Confidence intervals.

Prerequisites: A one year course in calculus

STAT511 - STATISTICAL INFERENCE (Course Syllabus)

Graphical displays; one- and two-sample confidence intervals; one- and two-sample hypothesis tests; one- and two-way ANOVA; simple and multiple linear least-squares regression; nonlinear regression; variable selection; logistic regression; categorical data analysis; goodness-of-fit tests. A methodology course.

Prerequisites: STAT 510 or equivalent

STAT512 - MATHEMATICAL STATISTICS (Course Syllabus)

An introduction to the mathematical theory of statistics. Estimation, with a focus on properties of sufficient statistics and maximum likelihood estimators. Hypothesis testing, with a focus on likelihood ratio tests and the consequent development of "t" tests and hypothesis tests in regression and ANOVA. Nonparametric procedures.

Prerequisites: STAT 430 or 510 or equivalent

STAT515 - ADV STAT INFERENCE I

STAT 515 is aimed at first-year Ph.D. students and builds a good foundation in statistical inference from the first principles of probability.

Prerequisites: STAT 430 and STAT 431 and MATH 114 and MATH 240 or equivalent

STAT516 - ADV STAT INFERENCE II

STAT 516 is a natural continuation of STAT 515, and the main focus is on asymptotic evaluations and regression models. Time permitting, it also discusses some basic nonparametric statistical methods.

Prerequisites: STAT 515

STAT520 - APPLIED ECONOMETRICS I (Course Syllabus)

This is a course in econometrics for graduate students. The goal is to prepare students for empirical research by studying econometric methodology and its theoretical foundations. Students taking the course should be familiar with elementary statistical methodology and basic linear algebra, and should have some programming experience. Topics include conditional expectation and linear projection, asymptotic statistical theory, ordinary least squares estimation, the bootstrap and jackknife, instrumental variables and two-stage least squares, specification tests, systems of equations, generalized least squares, and an introduction to the use of linear panel data models.

Prerequisites: MATH 114 and MATH 312 or equivalents, and an undergraduate introduction to probability and statistics

STAT521 - APPLIED ECONOMETRICS II (Course Syllabus)

Topics include system estimation with instrumental variables, fixed effects and random effects estimation, M-estimation, nonlinear regression, quantile regression, maximum likelihood estimation, generalized method of moments estimation, minimum distance estimation, and binary and multinomial response models. Both theory and applications will be stressed.

Prerequisites: STAT 520. This is a continuation of STAT 520

STAT533 - STOCHASTIC PROCESSES (Course Syllabus)

An introduction to Stochastic Processes. The primary focus is on Markov Chains, Martingales and Gaussian Processes. We will discuss many interesting applications from physics to economics. Topics may include: simulations of path functions, game theory and linear programming, stochastic optimization, Brownian Motion and Black-Scholes.

Prerequisites: STAT 510 or equivalent

STAT542 - BAYESIAN METH & COMP (Course Syllabus)

Sophisticated tools for probability modeling and data analysis from the Bayesian perspective. Hierarchical models, mixture models and Monte Carlo simulation techniques.

Prerequisites: STAT 430 or 510 or equivalent or permission of instructor

STAT571 - MODERN DATA MINING (Course Syllabus)

Modern Data Mining: Statistics or Data Science has been evolving rapidly to keep up with the modern world. While classical multiple regression and logistic regression techniques continue to be the major tools, we go beyond them to include methods built on top of linear models, such as LASSO and Ridge regression. Contemporary methods such as KNN (K nearest neighbor), Random Forest, Support Vector Machines, Principal Component Analysis (PCA), the bootstrap and others are also covered. Text mining, especially through PCA, is another topic of the course. While learning all the techniques, we keep in mind that our goal is to tackle real problems. Not only do we go through a large collection of interesting, challenging real-life data sets, but we also learn how to use the free, powerful software "R" in connection with each of the methods covered in the class.

Prerequisites: Two courses at the statistics 400 or 500 level or permission from instructor

STAT580 - ADV STAT COMPUTING

This course will build on the fundamental concepts introduced in the prerequisite courses to allow students to acquire knowledge and programming skills in large-scale data analysis, data visualization, and stochastic simulation.

Prerequisites: STAT 503 or equivalent background acquired through a combination of online courses that teach the R language and practical experience.

STAT590 - CAUSAL INFERENCE

Questions about cause are at the heart of many everyday decisions and public policies. Does eating an egg every day cause people to live longer or shorter or have no effect? Do gun control laws cause more or fewer murders or have no effect? Causal inference is the subfield of statistics that considers how we should make inferences about such questions. This course will cover the key concepts and methods of causal inference rigorously.

Prerequisites: Background in probability and statistics; some knowledge of R.