Linda Zhao

  • Professor of Statistics and Data Science
  • Academic Director of the Dual Master's Degree in Statistics

Contact Information

  • Office Address:

    403 Academic Research Building
    265 South 37th Street
    Philadelphia, PA 19104

Research Interests: Statistical machine learning, data-driven decision-making, crowdsourcing, post-selection inference, network analysis, nonparametric Bayes, equity ownership, education in data science

Overview

After receiving her Ph.D. in Mathematics/Statistics from Cornell University, Linda taught at UCLA for one year. She joined the Wharton School in 1994. She obtained a BS degree from the Department of Mathematics at Nankai University, China.

Linda’s research covers statistical machine learning, data-driven decision-making, crowdsourcing, post-selection inference, network analysis, nonparametric Bayes, equity ownership, and education in data science. Ongoing projects include equity networks, inference for high-dimensional data, data with measurement errors, and post-model-selection inference. Linda also enjoys teaching very much.

Selected Publications

Zhao, L. H. (2000) Bayesian aspects of some nonparametric problems, The Annals of Statistics, 28, 532–552

Brown, L. D., Mandelbaum, A., Sakov, A., Shen, H., Zeltyn, S. and Zhao, L. H. (2005) Statistical analysis of a telephone call center: A queueing-science perspective, Journal of the American Statistical Association, 100, 36-50

Cai, T., Low, M. and Zhao, L.H. (2007) Trade-offs between global and local risks in nonparametric function estimation, Bernoulli, 13, 1-19

Berk, R., Brown, L.D. and Zhao, L. (2010) Statistical inference after model selection, Journal of Quantitative Criminology, 26, 217-236

Raykar, V., Yu, S., Zhao, L., Valadez, G., Florin, C., Bogoni, L. and Moy, L. (2010) Learning from crowds, Journal of Machine Learning Research, 11, 1297–1322

Brown, L. D., Cai, T., Zhang, R., Zhao, L. H. and Zhou, H. (2010) The root-unroot algorithm for density estimation as implemented via wavelet block thresholding, Probability Theory and Related Fields, 146, 401-433

Raykar, V. and Zhao, L. (2010) Nonparametric prior for adaptive sparsity, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS), JMLR: 629-636

Nagaraja, C. H., Brown, L.D. and Zhao, L. (2010) An autoregressive approach to house price modeling, The Annals of Applied Statistics, 5, 124-149

Berk, R., Brown, L.D., Buja, A., Zhang, K. and Zhao, L. H. (2013) Valid post-selection inference, The Annals of Statistics, 41, 802-837

Harrison, A., Meyer, M., Wang, P., Zhao, L. and Zhao, M. (2018) Can a Tiger Change Its Stripes? Reform of Chinese State-Owned Enterprises in the Penumbra of the State, a Vox article

Buja, A., Brown, L.D., Berk, R., George, E., Pitkin, E., Traskin, M., Zhang, K. and Zhao, L. (2019). Models as Approximations I: Consequences Illustrated with Linear Regression, Statistical Science, 34(4), 523-544.

Buja, A., Brown, L.D., Berk, R., Kuchibhotla, A., George, E. and Zhao, L. (2019). Models as Approximations II: A Model-Free Theory of Parametric Regression, Statistical Science, 34(4), 545-565.

Buja, A., Kuchibhotla, A., Berk, R., Tchetgen Tchetgen, E., George, E. and Zhao, L. (2019). Models as Approximations – Rejoinder, Statistical Science, 34(4), 606-620.

Cai, J., Mandelbaum, A., Nagaraja, C., Shen, H. and Zhao, L. (2019) Statistical Theory Powering Data Science, Statistical Science, 669-691

Kuchibhotla, A., Buja, A., Brown, L.D., Cai, J., George, E. and Zhao, L. (2019) Valid Post-selection Inference in Model-free Linear Regression, Annals of Statistics, 48(5), 2953–2981.

Azriel, D., Brown, L., Sklar, M., Berk, R., Buja, A. and Zhao, L. (2021) Semi-supervised linear regression, Journal of the American Statistical Association

Teaching

Past Courses

  • FNCE899 - INDEPENDENT STUDY

    Independent Study Projects (ISPs) require extensive independent work and a considerable amount of writing. ISPs in Finance are intended to give students the opportunity to study a particular topic in Finance in greater depth than is covered in the curriculum. The application for an ISP should outline a plan of study that requires at least as much work as a typical course in the Finance Department that meets twice a week. Applications for FNCE 899 ISPs will not be accepted after the third week of the semester. ISPs must be supervised by a Standing Faculty member of the Finance Department.

  • STAT102 - INTRO BUSINESS STAT

    Continuation of STAT 101. A thorough treatment of multiple regression, model selection, analysis of variance, linear logistic regression; introduction to time series. Business applications.

  • STAT112 - INTRODUCTORY STATISTICS

    Further development of the material in STAT 111, in particular the analysis of variance, multiple regression, non-parametric procedures and the analysis of categorical data. Data analysis via statistical packages.

  • STAT399 - INDEPENDENT STUDY

    Written permission of instructor and the department course coordinator required to enroll in this course.

  • STAT471 - MODERN DATA MINING

    Modern Data Mining: Statistics, or Data Science, has been evolving rapidly to keep up with the modern world. While classical multiple regression and logistic regression techniques continue to be the major tools, we go beyond them to include methods built on top of linear models, such as LASSO and Ridge regression. Contemporary methods such as KNN (K nearest neighbors), Random Forest, Support Vector Machines, Principal Component Analysis (PCA), the bootstrap, and others are also covered. Text mining, especially through PCA, is another topic of the course. While learning all the techniques, we keep in mind that our goal is to tackle real problems. Not only do we go through a large collection of interesting, challenging real-life data sets, but we also learn how to use the free, powerful software "R" in connection with each of the methods presented in the class.

  • STAT571 - MODERN DATA MINING

    Modern Data Mining: Statistics, or Data Science, has been evolving rapidly to keep up with the modern world. While classical multiple regression and logistic regression techniques continue to be the major tools, we go beyond them to include methods built on top of linear models, such as LASSO and Ridge regression. Contemporary methods such as KNN (K nearest neighbors), Random Forest, Support Vector Machines, Principal Component Analysis (PCA), the bootstrap, and others are also covered. Text mining, especially through PCA, is another topic of the course. While learning all the techniques, we keep in mind that our goal is to tackle real problems. Not only do we go through a large collection of interesting, challenging real-life data sets, but we also learn how to use the free, powerful software "R" in connection with each of the methods presented in the class. Prerequisite: two courses at the statistics 400 or 500 level, or permission of the instructor.

  • STAT701 - MODERN DATA MINING

    Modern Data Mining: Statistics, or Data Science, has been evolving rapidly to keep up with the modern world. While classical multiple regression and logistic regression techniques continue to be the major tools, we go beyond them to include methods built on top of linear models, such as LASSO and Ridge regression. Contemporary methods such as KNN (K nearest neighbors), Random Forest, Support Vector Machines, Principal Component Analysis (PCA), the bootstrap, and others are also covered. Text mining, especially through PCA, is another topic of the course. While learning all the techniques, we keep in mind that our goal is to tackle real problems. Not only do we go through a large collection of interesting, challenging real-life data sets, but we also learn how to use the free, powerful software "R" in connection with each of the methods presented in the class. Prerequisite: two courses at the statistics 400 or 500 level, or permission of the instructor.

  • STAT899 - INDEPENDENT STUDY

    Written permission of instructor, the department MBA advisor and course coordinator required to enroll.

  • STAT991 - SEM IN ADV APPL OF STAT

    This seminar will be taken by doctoral candidates after the completion of most of their coursework. Topics vary from year to year and are chosen from advanced probability, statistical inference, robust methods, and decision theory, with principal emphasis on applications.

  • STAT995 - DISSERTATION

  • STAT9950 - Dissertation

  • STAT999 - INDEPENDENT STUDY

    Written permission of instructor and the department course coordinator required to enroll.

Activity

Wharton Magazine

Data: Voting Wait Times, Pre-IPO Confidentiality, and More
Wharton Magazine - 10/16/2020

Wharton Stories

Predicting Random Forest Fires in California

On February 14, Analytics at Wharton, Wharton Customer Analytics, Penn Engineering, and Wharton Statistics collaborated to host the first Women in Data Science Conference (WiDS) at Penn. Among the impressive roster of PhD students, industry professionals, and professors that presented on a variety of topics were three Wharton undergrads who…

Wharton Stories - 03/06/2020