Research Interests: credit scoring, model selection, pattern recognition and classification, statistical computing and graphics, time series analysis and forecasting
PhD, Princeton University, 1982
MA, Princeton University, 1979
BS, University of South Carolina, 1977
Fraud detection in loan applications; validating models in use for consumer credit default.
Miller-Sherrerd MBA Core Teaching Award, 2003, 2006, 2007, 2010
David W. Hauck Award for Outstanding Teaching, 2001
Excellence in Teaching Award (Undergraduate Division), 2001, 2004
Wharton: 1979-present (Research Associate, Analysis Center for Evaluation of Energy Modeling and Statistics, 1979-83, Director of Computing Analysis Center, 1979-83).
Previous appointments: Princeton University; University of Michigan; University of South Carolina. Visiting appointment: University of Michigan
Summer Intern, Office of Energy Information Administration, U.S. Department of Energy, 1978
For more information, go to My Personal Page
Sivan Aldor-Noiman, Lawrence D. Brown, Emily Fox, Robert A. Stine (2017), Spatio-temporal Low Count Processes with Application to Violent Crime Events, Statistica Sinica, 26, pp. 1587-1610.
Robert A. Stine (2017), Explaining Normal Quantile-Quantile Plots Through Animation: The Water-Filling Analogy, The American Statistician, 71 (2), pp. 145-147.
Robert A. Stine and Dean P. Foster, Statistics for Business: Decision Making and Analysis (Boston: Pearson, 2017)
Kory Johnson, Dean P. Foster, Robert A. Stine (Working), Impartial Predictive Modeling: Ensuring Fairness in Arbitrary Models.
Kory Johnson, Robert A. Stine, Dean P. Foster (Working), Submodularity in statistics: Comparing the success of model selection methods.
Kory Johnson, Dean P. Foster, Robert A. Stine (Working), Revisiting alpha investing: conditionally valid stepwise regression.
Dean P. Foster and Robert A. Stine (2015), Risk Inflation of Sequential Tests Controlled by Alpha Investing, Journal of Statistical Computation and Simulation, 85, pp. 3613-3627.
Dean P. Foster, Mark Liberman, Robert A. Stine (Working), Featurizing text: Converting text into predictors for regression analysis.
Dean P. Foster and Robert A. Stine (2014), Risk Inflation of Sequential Tests Controlled by Alpha Investing, Journal of Statistical Computation and Simulation.
Sivan Aldor-Noiman, Lawrence D. Brown, Andreas Buja, Wolfgang Rolke, Robert A. Stine (2013), The Power to See: A New Graphical Test of Normality, The American Statistician, 67 (4), pp. 249-260.
Abstract: Many statistical procedures assume that the underlying data-generating process involves Gaussian errors. Among the popular tests for normality, only the Kolmogorov–Smirnov test has a graphical representation. Alternative tests, such as the Shapiro–Wilk test, offer little insight as to how the observed data deviate from normality. In this article, we discuss a simple new graphical procedure which provides simultaneous confidence bands for a normal quantile–quantile plot. These bands define a test of normality and are narrower in the tails than those related to the Kolmogorov–Smirnov test. Correspondingly, the new procedure has greater power to detect deviations from normality in the tails. Supplementary materials for this article are available online.
This course follows from the introductory regression classes, STAT 1020, STAT 1120, and STAT 4310 for undergraduates and STAT 6130 for MBAs. It extends the ideas from regression modeling, focusing on the core business task of predictive analytics as applied to realistic business-related data sets. In particular, it introduces automated model selection tools, such as stepwise regression, and current model selection criteria such as AIC and BIC. It delves into classification methodologies such as logistic regression. It also introduces classification and regression trees (CART) and the popular predictive methodologies known as random forests and boosted trees. By the end of the course, the student will be familiar with and have applied these concepts and will be ready to use them in a work setting. The methodologies are implemented in a variety of software packages. Applications in JMP emphasize concepts and key modeling decisions. This course may be taken concurrently with the prerequisite with instructor permission.
STAT4220001 ( Syllabus )
This course follows from the introductory regression classes, STAT 1020, STAT 1120, and STAT 4310 for undergraduates and STAT 6130 for MBAs. It extends the ideas from regression modeling, focusing on the core business task of predictive analytics as applied to realistic business-related data sets. In particular, it introduces automated model selection tools, such as stepwise regression, and current model selection criteria such as AIC and BIC. It delves into classification methodologies such as logistic regression. It also introduces classification and regression trees (CART) and the popular predictive methodologies known as random forests and boosted trees. By the end of the course, the student will be familiar with and have applied these concepts and will be ready to use them in a work setting. The methodologies are implemented in a variety of software packages. Applications in JMP emphasize concepts and key modeling decisions. This course was formerly STAT 6220.
STAT7220001 ( Syllabus )
Continuation of STAT 1010 or STAT 1018. A thorough treatment of multiple regression, model selection, analysis of variance, linear logistic regression; introduction to time series. Business applications. This course may be taken concurrently with the prerequisite with instructor permission.
Written permission of instructor and the department course coordinator required to enroll in this course.
The goal of this course is to introduce students to the R programming language and its related ecosystem. This course will provide a skill set that is in demand in both research and business environments. In addition, R is a platform that is used and required in other advanced classes taught at Wharton, so this class will prepare students for these higher-level classes and electives.
This course provides an introduction to the wide range of techniques available for statistical modelling and forecasting of time series. Regression methods for decomposition models, trends and seasonality, spectral analysis, distributed lag models, autoregressive-moving average modeling, forecasting, exponential smoothing, and ARCH and GARCH models will be surveyed. The emphasis will be on applications, rather than technical foundations and derivations. The techniques will be studied critically, with examination of their usefulness and limitations.
This course provides an introduction to the wide range of techniques available for statistical modelling and forecasting of time series. Regression methods for decomposition models, trends and seasonality, spectral analysis, distributed lag models, autoregressive-moving average modeling, forecasting, exponential smoothing, and ARCH and GARCH models will be surveyed. The emphasis will be on applications, rather than technical foundations and derivations. The techniques will be studied critically, with examination of their usefulness and limitations. This course may be taken concurrently with the prerequisite with instructor permission.
This course introduces modern text analytics, and the tools of natural language processing. Text and language are powerful repositories of knowledge and information, but the semi-structured nature of language makes deriving insights from text challenging. Modern analytic techniques introduced in this course make it significantly easier even for non-specialists to use text and language data to drive deep insights. The course will use several examples from real world applications in different industries such as ecommerce, healthcare and finance to illustrate these techniques. Students should be familiar with regression models at the level of Stat 6130 or Stat 1020, and the Python language at the level of Stat 4770 or Stat 7770. Familiarity with the Jupyter notebook development environment is presumed, as well as common Python packages such as pandas, NLTK and SpaCy. Those with more knowledge of Statistics, such as from Stat 7220/4220, or computing skills will benefit. The predominant software used in the course is Jupyter notebooks that use a Python interpreter. Familiarity with basic probability models is helpful but not presumed.
Written permission of instructor, the department MBA advisor and course coordinator required to enroll.
Dissertation
The harsh economic downturn that has chastened credit-happy consumers, along with increased scrutiny by regulators, will force card issuers to rethink their business models as the economy begins to recover, according to Wharton faculty and credit industry analysts.…
Knowledge at Wharton - 7/8/2009
Sivan Aldor-Noiman, GR’12, came to Wharton Doctoral Programs from a small statistics program in Israel, and found herself in a dynamic center of intellectual life….
Wharton Stories - 11/13/2014