Richard Paul Waterman
  • Practice Professor of Statistics and Data Science

Contact Information

  • Office Address:

    315 Academic Research Building
    265 South 37th Street
    Philadelphia, PA 19104

Research Interests: categorical data analysis, environmental statistics, likelihood methods

Links: CV

Teaching

Current Courses (Spring 2023)

  • STAT7100 - Data Collect & Acquisit

    This course gives students a solid grasp of different data collection strategies and of when and how they can be applied in practice. Important current issues such as data confidentiality and ethical considerations are also addressed. By the end of the course, students will have designed and fielded a sample survey and an online experiment (A/B test). Students will also collect data through web scraping and/or an API. They will summarize their collected data and the resulting inferences, culminating in an in-class presentation. The course is structured in two parts. The first part, "Strategies", addresses different data collection strategies, covering sample designs, experimentation, and observational studies. The second part, "Platforms", covers the practicalities of implementing those strategies. Given the data science perspective of this course, it focuses on web-enabled approaches. Familiarity with either R or Python is expected, specifically with the RStudio or Jupyter notebook platforms. Courses such as STAT 7050 or STAT 7770 would meet this requirement. Statistics through the level of multiple regression is required. This requirement may be fulfilled with MBA courses such as STAT 6130/6210, or by waiving MBA statistics.

    STAT7100002

  • STAT7770 - Intro To Python Data Sci

    The goal of this course is to introduce the Python programming language within the context of the closely related areas of statistics and data science. Students will develop a solid grasp of Python programming basics as they are exposed to the entire data science workflow, from interacting with SQL databases to query and retrieve data, through data wrangling, reshaping, summarizing, and analyzing, to ultimately reporting their results. Competency in Python is a critical skill for students interested in data science. Prerequisites: No prior programming experience is expected, but statistics through the level of multiple regression is required. This requirement may be fulfilled with MBA courses such as STAT 6130/6210, or by waiving MBA statistics.

    STAT7770401

  • OIDD7770 - Intro To Python Data Sci

    The goal of this course is to introduce the Python programming language within the context of the closely related areas of statistics and data science. Students will develop a solid grasp of Python programming basics as they are exposed to the entire data science workflow, from interacting with SQL databases to query and retrieve data, through data wrangling, reshaping, summarizing, and analyzing, to ultimately reporting their results. Competency in Python is a critical skill for students interested in data science. Prerequisites: No prior programming experience is expected, but statistics through the level of multiple regression is required. This requirement may be fulfilled with MBA courses such as STAT 6130/6210, or by waiving MBA statistics.

    OIDD7770401

  • STAT4100 - Data Collect & Acquisit

    This course gives students a solid grasp of different data collection strategies and of when and how they can be applied in practice. Important current issues such as data confidentiality and ethical considerations are also addressed. By the end of the course, students will have designed and fielded a sample survey and an online experiment (A/B test). Students will also collect data through web scraping and/or an API. They will summarize their collected data and the resulting inferences, culminating in an in-class presentation. The course is structured in two parts. The first part, "Strategies", addresses different data collection strategies, covering sample designs, experimentation, and observational studies. The second part, "Platforms", covers the practicalities of implementing those strategies. Given the data science perspective of this course, it focuses on web-enabled approaches. Familiarity with either R or Python is expected, specifically with the RStudio or Jupyter notebook platforms. Courses such as STAT 4050 or STAT 4770 would meet this requirement. Statistics through the level of multiple regression is required. This requirement may be fulfilled with undergraduate courses such as STAT 1020 or STAT 1120.

    STAT4100002

  • STAT4020 - Communicating Quant. Analyses

    This seminar-based capstone course provides an opportunity for students to hone their data science and statistical modeling skills, with an emphasis on communicating quantitative results. This is not a “theoretical class” but an experiential one. It allows students to bring their existing knowledge from different disciplines to bear on new problems. Four real-life datasets will be analyzed during the quarter, and students will be expected to create and deliver in-class presentations for each analysis. The course is suitable for anyone who wants more opportunities to analyze data, to continue developing programming skills, or to gain experience and confidence in presenting results and conclusions to an audience. Prerequisites: The course presumes that students have taken a sequence of statistics courses such as STAT 1010/1020 or STAT 4300/4310 and so are familiar with multiple regression analysis. In addition, they should have been exposed to more advanced techniques such as logistic regression and tree-based methods, as taught in classes like STAT 4220/4230/4710. Finally, it is assumed that students know a programming language such as R or Python and an IDE such as RStudio or Jupyter notebooks. Classes such as STAT 4050/4700 would meet this requirement.

    STAT4020001

All Courses

  • ECON4999 - Independent Study

    Individual study and research under the direction of a member of the Economics Department faculty. At a minimum, the student must write a major paper summarizing, unifying, and interpreting the results of the study. This is a one semester, one c.u. course.

  • OIDD4770 - Intro To Python Data Sci

    The goal of this course is to introduce the Python programming language within the context of the closely related areas of statistics and data science. Students will develop a solid grasp of Python programming basics as they are exposed to the entire data science workflow, from interacting with SQL databases to query and retrieve data, through data wrangling, reshaping, summarizing, and analyzing, to ultimately reporting their results. Competency in Python is a critical skill for students interested in data science. Prerequisites: No prior programming experience is expected, but statistics through the level of multiple regression is required. This requirement may be fulfilled with undergraduate courses such as STAT 1020 or STAT 1120.

  • OIDD7770 - Intro To Python Data Sci

    The goal of this course is to introduce the Python programming language within the context of the closely related areas of statistics and data science. Students will develop a solid grasp of Python programming basics as they are exposed to the entire data science workflow, from interacting with SQL databases to query and retrieve data, through data wrangling, reshaping, summarizing, and analyzing, to ultimately reporting their results. Competency in Python is a critical skill for students interested in data science. Prerequisites: No prior programming experience is expected, but statistics through the level of multiple regression is required. This requirement may be fulfilled with MBA courses such as STAT 6130/6210, or by waiving MBA statistics.

  • STAT1010 - Intro Business Stat

    Data summaries and descriptive statistics; introduction to a statistical computer package; probability: distributions, expectation, variance, covariance, portfolios, central limit theorem; statistical inference for univariate data; statistical inference for bivariate data: inference for intrinsically linear simple regression models. This course has a business focus but is also appropriate for students in the college. This course may be taken concurrently with the prerequisite with instructor permission.

  • STAT1020 - Intro Business Stat

    Continuation of STAT 1010 or STAT 1018. A thorough treatment of multiple regression, model selection, analysis of variance, linear logistic regression; introduction to time series. Business applications. This course may be taken concurrently with the prerequisite with instructor permission.

  • STAT3990 - Independent Study

    Written permission of instructor and the department course coordinator required to enroll in this course.

  • STAT4020 - Communicating Quant. Analyses

    This seminar-based capstone course provides an opportunity for students to hone their data science and statistical modeling skills, with an emphasis on communicating quantitative results. This is not a “theoretical class” but an experiential one. It allows students to bring their existing knowledge from different disciplines to bear on new problems. Four real-life datasets will be analyzed during the quarter, and students will be expected to create and deliver in-class presentations for each analysis. The course is suitable for anyone who wants more opportunities to analyze data, to continue developing programming skills, or to gain experience and confidence in presenting results and conclusions to an audience. Prerequisites: The course presumes that students have taken a sequence of statistics courses such as STAT 1010/1020 or STAT 4300/4310 and so are familiar with multiple regression analysis. In addition, they should have been exposed to more advanced techniques such as logistic regression and tree-based methods, as taught in classes like STAT 4220/4230/4710. Finally, it is assumed that students know a programming language such as R or Python and an IDE such as RStudio or Jupyter notebooks. Classes such as STAT 4050/4700 would meet this requirement.

  • STAT4050 - Stat Computing with R

    The goal of this course is to introduce students to the R programming language and its related ecosystem. The course provides a skill set that is in demand in both research and business environments. In addition, R is a platform that is used and required in other advanced classes taught at Wharton, so this class will prepare students for these higher-level classes and electives.

  • STAT4100 - Data Collect & Acquisit

    This course gives students a solid grasp of different data collection strategies and of when and how they can be applied in practice. Important current issues such as data confidentiality and ethical considerations are also addressed. By the end of the course, students will have designed and fielded a sample survey and an online experiment (A/B test). Students will also collect data through web scraping and/or an API. They will summarize their collected data and the resulting inferences, culminating in an in-class presentation. The course is structured in two parts. The first part, "Strategies", addresses different data collection strategies, covering sample designs, experimentation, and observational studies. The second part, "Platforms", covers the practicalities of implementing those strategies. Given the data science perspective of this course, it focuses on web-enabled approaches. Familiarity with either R or Python is expected, specifically with the RStudio or Jupyter notebook platforms. Courses such as STAT 4050 or STAT 4770 would meet this requirement. Statistics through the level of multiple regression is required. This requirement may be fulfilled with undergraduate courses such as STAT 1020 or STAT 1120.

  • STAT4220 - Predictive Analytics

    This course follows from the introductory regression classes: STAT 1020, STAT 1120, and STAT 4310 for undergraduates and STAT 6130 for MBAs. It extends the ideas from regression modeling, focusing on the core business task of predictive analytics as applied to realistic business-related data sets. In particular, it introduces automated model selection tools such as stepwise regression and current model selection criteria such as AIC and BIC. It delves into classification methodologies such as logistic regression. It also introduces classification and regression trees (CART) and the popular predictive methodologies known as random forests and boosted trees. By the end of the course, students will be familiar with and have applied these concepts and will be ready to use them in a work setting. The methodologies are implemented in a variety of software packages. Applications in JMP emphasize concepts and key modeling decisions. This course may be taken concurrently with the prerequisite with instructor permission.

  • STAT4770 - Intro To Python Data Sci

    The goal of this course is to introduce the Python programming language within the context of the closely related areas of statistics and data science. Students will develop a solid grasp of Python programming basics as they are exposed to the entire data science workflow, from interacting with SQL databases to query and retrieve data, through data wrangling, reshaping, summarizing, and analyzing, to ultimately reporting their results. Competency in Python is a critical skill for students interested in data science. Prerequisites: No prior programming experience is expected, but statistics through the level of multiple regression is required. This requirement may be fulfilled with undergraduate courses such as STAT 1020 or STAT 1120.

  • STAT6130 - Regr Analysis For Bus

    This course provides the fundamental methods of statistical analysis, the art and science of extracting information from data. The course will begin with a focus on the basic elements of exploratory data analysis, probability theory and statistical inference. With this as a foundation, it will proceed to explore the use of the key statistical methodology known as regression analysis for solving business problems, such as the prediction of future sales and the response of the market to price changes. The use of regression diagnostics and various graphical displays supplements the basic numerical summaries and provides insight into the validity of the models. Specific important topics covered include least squares estimation, residuals and outliers, tests and confidence intervals, correlation and autocorrelation, collinearity, and randomization. The presentation relies upon computer software for most of the needed calculations, and the resulting style focuses on construction of models, interpretation of results, and critical evaluation of assumptions.

  • STAT6210 - Acc Regression Analysis

    STAT 6210 is intended for students with recent, practical knowledge of the use of regression analysis in the context of business applications. This course covers the material of STAT 6130, but omits the foundations to focus on regression modeling. The course reviews statistical hypothesis testing and confidence intervals for the sake of standardizing terminology and introducing software, and then moves into regression modeling. The pace presumes recent exposure to both the theory and practice of regression and will not be accommodating to students who have not seen or used these methods previously. The interpretation of regression models within the context of applications will be stressed, presuming knowledge of the underlying assumptions and derivations. The scope of regression modeling that is covered includes multiple regression analysis with categorical effects, regression diagnostic procedures, interactions, and time series structure. The presentation of the course relies on computer software that will be introduced in the initial lectures. Recent exposure to the theory and practice of regression modeling is recommended.

  • STAT7010 - Modern Data Mining

    Statistics, or data science, has been evolving rapidly to keep up with the modern world. While classical multiple regression and logistic regression techniques continue to be the major tools, we go beyond them to include methods built on top of linear models, such as LASSO and ridge regression. Contemporary methods such as KNN (k-nearest neighbors), random forests, support vector machines, principal component analysis (PCA), the bootstrap, and others are also covered. Text mining, especially through PCA, is another topic of the course. While learning all the techniques, we keep in mind that our goal is to tackle real problems. Not only do we go through a large collection of interesting, challenging real-life data sets, but we also learn how to use the free, powerful software "R" in connection with each of the methods presented in the class. Prerequisite: two courses at the statistics 4000 or 5000 level or permission from the instructor.

  • STAT7050 - Stat Computing with R

    The goal of this course is to introduce students to the R programming language and its related ecosystem. The course provides a skill set that is in demand in both research and business environments. In addition, R is a platform that is used and required in other advanced classes taught at Wharton, so this class will prepare students for these higher-level classes and electives.

  • STAT7100 - Data Collect & Acquisit

    This course gives students a solid grasp of different data collection strategies and of when and how they can be applied in practice. Important current issues such as data confidentiality and ethical considerations are also addressed. By the end of the course, students will have designed and fielded a sample survey and an online experiment (A/B test). Students will also collect data through web scraping and/or an API. They will summarize their collected data and the resulting inferences, culminating in an in-class presentation. The course is structured in two parts. The first part, "Strategies", addresses different data collection strategies, covering sample designs, experimentation, and observational studies. The second part, "Platforms", covers the practicalities of implementing those strategies. Given the data science perspective of this course, it focuses on web-enabled approaches. Familiarity with either R or Python is expected, specifically with the RStudio or Jupyter notebook platforms. Courses such as STAT 7050 or STAT 7770 would meet this requirement. Statistics through the level of multiple regression is required. This requirement may be fulfilled with MBA courses such as STAT 6130/6210, or by waiving MBA statistics.

  • STAT7220 - Predictive Analytics

    This course follows from the introductory regression classes: STAT 1020, STAT 1120, and STAT 4310 for undergraduates and STAT 6130 for MBAs. It extends the ideas from regression modeling, focusing on the core business task of predictive analytics as applied to realistic business-related data sets. In particular, it introduces automated model selection tools such as stepwise regression and current model selection criteria such as AIC and BIC. It delves into classification methodologies such as logistic regression. It also introduces classification and regression trees (CART) and the popular predictive methodologies known as random forests and boosted trees. By the end of the course, students will be familiar with and have applied these concepts and will be ready to use them in a work setting. The methodologies are implemented in a variety of software packages. Applications in JMP emphasize concepts and key modeling decisions. This course was formerly STAT 6220.

  • STAT7770 - Intro To Python Data Sci

    The goal of this course is to introduce the Python programming language within the context of the closely related areas of statistics and data science. Students will develop a solid grasp of Python programming basics as they are exposed to the entire data science workflow, from interacting with SQL databases to query and retrieve data, through data wrangling, reshaping, summarizing, and analyzing, to ultimately reporting their results. Competency in Python is a critical skill for students interested in data science. Prerequisites: No prior programming experience is expected, but statistics through the level of multiple regression is required. This requirement may be fulfilled with MBA courses such as STAT 6130/6210, or by waiving MBA statistics.

  • STAT8990 - Independent Study

    Written permission of instructor, the department MBA advisor and course coordinator required to enroll.
