Andreas Buja

  • The Liem Sioe Liong/First Pacific Company Professor
  • Professor of Statistics

Contact Information

  • Office Address:

    471 Jon M. Huntsman Hall
    3730 Walnut Street
    Philadelphia, PA 19104

Research Interests: data visualization, multivariate statistics, nonparametric statistics

Links: CV

Overview

Education

PhD, Swiss Federal Institute of Technology (ETHZ), 1980

Academic Positions Held

Wharton: 2002-present (named Liem Sioe Liong/First Pacific Company Professor, 2003).
Previous appointment: University of Washington, Seattle. Visiting appointment: Stanford University

Other Positions

Member, Technical Staff, Bellcore/Telcordia, 1987-94
Member, Technical Staff, AT&T Bell Labs, 1994-96
Technology Consultant, AT&T Labs, 1996-2001

Professional Leadership

Editor, Journal of Computational and Graphical Statistics, 1997-2001
Advisory Editor, Journal of Computational and Graphical Statistics, 2001-present

For more information, go to My Personal Page

Research

Teaching

Current Courses

  • STAT470 - Data Analytics And Statistical Computing

    This course will introduce a high-level programming language, called R, that is widely used for statistical data analysis. Using R, we will study and practice the following methodologies: data cleaning, feature extraction; web scraping, text analysis; data visualization; fitting statistical models; simulation of probability distributions and statistical models; statistical inference methods that use simulations (bootstrap, permutation tests).

    STAT470401 ( Syllabus )

    STAT470402 ( Syllabus )

  • STAT503 - Data Analytics And Statistical Computing

    This course will introduce a high-level programming language, called R, that is widely used for statistical data analysis. Using R, we will study and practice the following methodologies: data cleaning, feature extraction; web scraping, text analysis; data visualization; fitting statistical models; simulation of probability distributions and statistical models; statistical inference methods that use simulations (bootstrap, permutation tests).

    STAT503401 ( Syllabus )

    STAT503402 ( Syllabus )

  • STAT770 - Data Analytics And Statistical Computing

    This course will introduce a high-level programming language, called R, that is widely used for statistical data analysis. Using R, we will study and practice the following methodologies: data cleaning, feature extraction; web scraping, text analysis; data visualization; fitting statistical models; simulation of probability distributions and statistical models; statistical inference methods that use simulations (bootstrap, permutation tests).

    STAT770401 ( Syllabus )

    STAT770402 ( Syllabus )

  • STAT961 - Statistical Methodology

    This course prepares first-year PhD students in statistics for a research career. It is not an applied statistics course. Topics covered include: linear models and their high-dimensional geometry, statistical inference illustrated with linear models, diagnostics for linear models, bootstrap and permutation inference, principal component analysis, smoothing and cross-validation.

    STAT961001

Past Courses

  • STAT101 - Introductory Business Statistics

    Data summaries and descriptive statistics; introduction to a statistical computer package; probability: distributions, expectation, variance, covariance, portfolios, central limit theorem; statistical inference for univariate data; statistical inference for bivariate data: inference for intrinsically linear simple regression models. This course has a business focus but is also appropriate for students in the College.

  • STAT102 - Introductory Business Statistics

    Continuation of STAT 101. A thorough treatment of multiple regression, model selection, analysis of variance, linear logistic regression; introduction to time series. Business applications.

  • STAT470 - Data Analytics and Statistical Computing

    This course will introduce a high-level programming language, called R, that is widely used for statistical data analysis. Using R, we will study and practice the following methodologies: data cleaning, feature extraction; web scraping, text analysis; data visualization; fitting statistical models; simulation of probability distributions and statistical models; statistical inference methods that use simulations (bootstrap, permutation tests).

  • STAT503 - Data Analytics and Statistical Computing

    This course will introduce a high-level programming language, called R, that is widely used for statistical data analysis. Using R, we will study and practice the following methodologies: data cleaning, feature extraction; web scraping, text analysis; data visualization; fitting statistical models; simulation of probability distributions and statistical models; statistical inference methods that use simulations (bootstrap, permutation tests).

  • STAT621 - Accelerated Regression Analysis for Business

    STAT 621 is intended for students with recent, practical knowledge of the use of regression analysis in the context of business applications. This course covers the material of STAT 613, but omits the foundations to focus on regression modeling. The course reviews statistical hypothesis testing and confidence intervals for the sake of standardizing terminology and introducing software, and then moves into regression modeling. The pace presumes recent exposure to both the theory and practice of regression and will not be accommodating to students who have not seen or used these methods previously. The interpretation of regression models within the context of applications will be stressed, presuming knowledge of the underlying assumptions and derivations. The scope of regression modeling that is covered includes multiple regression analysis with categorical effects, regression diagnostic procedures, interactions, and time series structure. The presentation of the course relies on computer software that will be introduced in the initial lectures.

  • STAT770 - Data Analytics and Statistical Computing

    This course will introduce a high-level programming language, called R, that is widely used for statistical data analysis. Using R, we will study and practice the following methodologies: data cleaning, feature extraction; web scraping, text analysis; data visualization; fitting statistical models; simulation of probability distributions and statistical models; statistical inference methods that use simulations (bootstrap, permutation tests).

  • STAT926 - Multivariate Analysis: Methodology

    This is a course that prepares PhD students in statistics for research in multivariate statistics and data visualization. The emphasis will be on a deep conceptual understanding of multivariate methods to the point where students will propose variations and extensions to existing methods or whole new approaches to problems previously solved by classical methods. Topics include: principal component analysis, canonical correlation analysis, generalized canonical analysis; nonlinear extensions of multivariate methods based on optimal transformations of quantitative variables and optimal scaling of categorical variables; shrinkage- and sparsity-based extensions to classical methods; clustering methods of the k-means and hierarchical varieties; multidimensional scaling, graph drawing, and manifold estimation.

  • STAT961 - Statistical Methodology

    This course prepares first-year PhD students in statistics for a research career. It is not an applied statistics course. Topics covered include: linear models and their high-dimensional geometry, statistical inference illustrated with linear models, diagnostics for linear models, bootstrap and permutation inference, principal component analysis, smoothing and cross-validation.

  • STAT995 - Dissertation

  • STAT999 - Independent Study

Awards and Honors

  • Keynote speaker, Classification Society Conference, Milwaukee, WI, USA, 2013
  • InfoVis Best Paper Award for the article “Graphical inference for infovis” by Wickham, H., Cook, D., Hofmann, H., and Buja, A., IEEE Transactions on Visualization and Computer Graphics (Proc. InfoVis ’10), 2010
  • Journal of Marketing, finalist for the Harold H. Maynard Award and featured blog article of the October Issue, 2007
  • Keynote speaker, SIAM Conference on Data Mining (SDM06), Bethesda, MD, USA, 2006
  • Fellow, Institute of Mathematical Statistics, 2006
  • IMS Medallion lecture, Joint Statistical Meetings, New York, 2002
  • Keynote speaker, European Meeting of the Psychometric Society, Leiden, 1995
  • Fellow, American Statistical Association, 1994
  • Award Medal for diploma thesis in mathematics, Swiss Federal Institute of Technology, 1975

In the News

Knowledge @ Wharton

Activity

Latest Research

Andreas Buja and Wolfgang Rolke (Work In Progress), Calibration for Simultaneity: (Re)sampling Methods for Simultaneous Inference with Applications to Function Estimation and Functional Data.

In the News

Different Worlds: Do Recommender Systems Fragment Consumers’ Interests?

The rise of computer-driven recommendation systems designed to help consumers navigate a growing ocean of choice is prompting concerns that the hyperpersonalization of information sources will lead to harmful divisions throughout society. A new study on consumer purchasing patterns in the music industry suggests the opposite. The paper, by Wharton researchers Kartik Hosanagar, Andreas Buja and Daniel M. Fleder, is titled, "Will the Global Village Fracture into Tribes: Recommender Systems and their Effects on the Consumer."

Knowledge @ Wharton - 2011/08/31