Linda Zhao

Linda Zhao
  • Professor of Statistics

Contact Information

  • office Address:

    470 Jon M. Huntsman Hall
    3730 Walnut Street
    Philadelphia, PA 19104

Overview

After getting her Ph.D in Mathematics/Statistics from Cornell University , Linda taught in UCLA, Los Angeles for one year. She joined the Wharton School in 1994. She obtained a BS degree from the Mathematics department of Nankai University, China.

Linda’s research area covers from Beysian analysis, Nonparametric analysis and Numerical computation. She mainly publishes in international leading journals. Current on going projects include forecasting house prices, inference for high dimensional data, data with measurement errors and post model selection inferences. Linda also enjoys teaching very much.

Selected Publications

Zhao, L. H. (2000) Bayesian aspects of some nonparametric problems, The Annals of Statistics, 28, 532–552

Mao, V. and Zhao, L. H. (2003) Free knot polynomial splines with confidence intervals, Journal of the Royal Statistical Society, Series B, 65, 901-919

Brown, L. D., Wang, Y. and Zhao, L. H. (2003) On the statistical equivalence at suitable frequencies of GARCH and stochastic volatility models with the corresponding diffusion model, Statistica Sinica, 993-1013

Brown, L. D., Mandelbaum, A., Sakov, A., Shen, H., Zeltyn, S. and Zhao, L. H. (2005) Statistical analysis of a telephone call center: A queueing-science perspective, Journal of the American Statistical Association, 100, 36-50

Cai, T., Low, M. and Zhao, L.H. (2007) Trade-offs between global and local risks in nonparametric function estimation, Bernoulli, 13, 1-19

Berk, R., Brown, L.B. and Zhao, L. (2010) Statistical inference after model selection, Journal of Quantitative Criminology, 26, 217-236

Raykar, V., Yu, S., Zhao, L., .Valadez, G., Florin, C., Bogoni, L. and Moy, L. (2010) Learning from crowds, Journal of Machine Learning Research, 11, 1297–1322

Brown, L. D., Cai, T., Zhang, R., Zhao, L. H. and Zhou, H. (2010) The root-unroot algorithm for density estimation as implemented via wavelet block thresholding, Probability Theory and Related Field, 146, 401-433

Raykar, V. and Zhao, L. (2010) Nonparametric prior for adaptive sparsity, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS), JMLR: 629-636

Nagaraja, C. H., Brown, L.D. and Zhao, L. (2010) An autoregressive approach to house price modeling, to appear The Annals of Applied Statistics

Continue Reading

Research

Teaching

Current Courses

  • STAT471 - Modern Data Mining

    Modern Data Mining: Statistics or Data Science has been evolving rapidly to keep up with the modern world. While classical multiple regression and logistic regression technique continue to be the major tools we go beyond to include methods built on top of linear models such as LASSO and Ridge regression. Contemporary methods such as KNN (K nearest neighbor), Random Forest, Support Vector Machines, Principal Component Analyses (PCA), the bootstrap and others are also covered. Text mining especially through PCA is another topic of the course. While learning all the techniques, we keep in mind that our goal is to tackle real problems. Not only do we go through a large collection of interesting, challenging real-life data sets but we also learn how to use the free, powerful software "R" in connection with each of the methods exposed in the class.

    STAT471401

    STAT471402

  • STAT571 - Modern Data Mining

    Modern Data Mining: Statistics or Data Science has been evolving rapidly to keep up with the modern world. While classical multiple regression and logistic regression technique continue to be the major tools we go beyond to include methods built on top of linear models such as LASSO and Ridge regression. Contemporary methods such as KNN (K nearest neighbor), Random Forest, Support Vector Machines, Principal Component Analyses (PCA), the bootstrap and others are also covered. Text mining especially through PCA is another topic of the course. While learning all the techniques, we keep in mind that our goal is to tackle real problems. Not only do we go through a large collection of interesting, challenging real-life data sets but we also learn how to use the free, powerful software "R" in connection with each of the methods exposed in the class.

    STAT571401

    STAT571402

  • STAT701 - Modern Data Mining

    Modern Data Mining: Statistics or Data Science has been evolving rapidly to keep up with the modern world. While classical multiple regression and logistic regression technique continue to be the major tools we go beyond to include methods built on top of linear models such as LASSO and Ridge regression. Contemporary methods such as KNN (K nearest neighbor), Random Forest, Support Vector Machines, Principal Component Analyses (PCA), the bootstrap and others are also covered. Text mining especially through PCA is another topic of the course. While learning all the techniques, we keep in mind that our goal is to tackle real problems. Not only do we go through a large collection of interesting, challenging real-life data sets but we also learn how to use the free, powerful software "R" in connection with each of the methods exposed in the class.

    STAT701401

    STAT701402

Past Courses

  • STAT101 - INTRO BUSINESS STAT

    Data summaries and descriptive statistics; introduction to a statistical computer package; Probability: distributions, expectation, variance, covariance, portfolios, central limit theorem; statistical inference of univariate data; Statistical inference for bivariate data: inference for intrinsically linear simple regression models. This course will have a business focus, but is not inappropriate for students in the college.

  • STAT102 - INTRO BUSINESS STAT

    Continuation of STAT 101. A thorough treatment of multiple regression, model selection, analysis of variance, linear logistic regression; introduction to time series. Business applications.

  • STAT111 - INTRODUCTORY STATISTICS

    Introduction to concepts in probability. Basic statistical inference procedures of estimation, confidence intervals and hypothesis testing directed towards applications in science and medicine. The use of the JMP statistical package.

  • STAT112 - INTRODUCTORY STATISTICS

    Further development of the material in STAT 111, in particular the analysis of variance, multiple regression, non-parametric procedures and the analysis of categorical data. Data analysis via statistical packages.

  • STAT431 - STATISTICAL INFERENCE

    Graphical displays; one- and two-sample confidence intervals; one- and two-sample hypothesis tests; one- and two-way ANOVA; simple and multiple linear least-squares regression; nonlinear regression; variable selection; logistic regression; categorical data analysis; goodness-of-fit tests. A methodology course. This course does not have business applications but has significant overlap with STAT 101 and 102.

  • STAT471 - MODERN DATA MINING

    Modern Data Mining: Statistics or Data Science has been evolving rapidly to keep up with the modern world. While classical multiple regression and logistic regression technique continue to be the major tools we go beyond to include methods built on top of linear models such as LASSO and Ridge regression. Contemporary methods such as KNN (K nearest neighbor), Random Forest, Support Vector Machines, Principal Component Analyses (PCA), the bootstrap and others are also covered. Text mining especially through PCA is another topic of the course. While learning all the techniques, we keep in mind that our goal is to tackle real problems. Not only do we go through a large collection of interesting, challenging real-life data sets but we also learn how to use the free, powerful software "R" in connection with each of the methods exposed in the class.

  • STAT511 - STATISTICAL INFERENCE

    Graphical displays; one- and two-sample confidence intervals; one- and two-sample hypothesis tests; one- and two-way ANOVA; simple and multiple linear least-squares regression; nonlinear regression; variable selection; logistic regression; categorical data analysis; goodness-of-fit tests. A methodology course.

  • STAT520 - APPLIED ECONOMETRICS I

    This is a course in econometrics for graduate students. The goal is to prepare students for empirical research by studying econometric methodology and its theoretical foundations. Students taking the course should be familiar with elementary statistical methodology and basic linear algebra, and should have some programming experience. Topics include conditional expectation and linear projection, asymptotic statistical theory, ordinary least squares estimation, the bootstrap and jackknife, instrumental variables and two-stage least squares, specification tests, systems of equations, generalized least squares, and introduction to use of linear panel data models.

  • STAT571 - MODERN DATA MINING

    Modern Data Mining: Statistics or Data Science has been evolving rapidly to keep up with the modern world. While classical multiple regression and logistic regression technique continue to be the major tools we go beyond to include methods built on top of linear models such as LASSO and Ridge regression. Contemporary methods such as KNN (K nearest neighbor), Random Forest, Support Vector Machines, Principal Component Analyses (PCA), the bootstrap and others are also covered. Text mining especially through PCA is another topic of the course. While learning all the techniques, we keep in mind that our goal is to tackle real problems. Not only do we go through a large collection of interesting, challenging real-life data sets but we also learn how to use the free, powerful software "R" in connection with each of the methods exposed in the class.

  • STAT701 - MODERN DATA MINING

    Modern Data Mining: Statistics or Data Science has been evolving rapidly to keep up with the modern world. While classical multiple regression and logistic regression technique continue to be the major tools we go beyond to include methods built on top of linear models such as LASSO and Ridge regression. Contemporary methods such as KNN (K nearest neighbor), Random Forest, Support Vector Machines, Principal Component Analyses (PCA), the bootstrap and others are also covered. Text mining especially through PCA is another topic of the course. While learning all the techniques, we keep in mind that our goal is to tackle real problems. Not only do we go through a large collection of interesting, challenging real-life data sets but we also learn how to use the free, powerful software "R" in connection with each of the methods exposed in the class.

  • STAT899 - INDEPENDENT STUDY

  • STAT927 - BAYESIAN STATISTICS

    This graduate course will cover the modeling and computation required to perform advanced data analysis from the Bayesian perspective. We will cover fundamental topics in Bayesian probability modeling and implementation, including recent advances in both optimization and simulation-based estimation strategies. Key topics covered in the course include hierarchical and mixture models, Markov Chain Monte Carlo, hidden Markov and dynamic linear models, tree models, Gaussian processes and nonparametric Bayesian strategies.

  • STAT995 - DISSERTATION

  • STAT999 - INDEPENDENT STUDY