Ying Jin

  • Assistant Professor of Statistics and Data Science

Contact Information

  • Office Address:

    441 Academic Research Building
    265 South 37th Street
    Philadelphia, PA 19104

Research Interests: Uncertainty quantification, Distribution-free inference, Causal inference, Selective inference, Generalizability.

Links: Personal Website

Teaching

Current Courses (Fall 2025)

  • STAT4710 - Modern Data Mining

    With the advent of the internet age, data are being collected at unprecedented scale in almost all realms of life, including business, science, politics, and healthcare. Data mining (the automated extraction of actionable insights from data) has revolutionized each of these realms in the 21st century. The objective of the course is to teach students the core data mining skills of exploratory data analysis, selecting an appropriate statistical methodology, applying the methodology to the data, and interpreting the results. The course will cover a variety of data mining methods including linear and logistic regression, penalized regression (including lasso and ridge regression), tree-based methods (including random forests and boosting), and deep learning. Students will learn the conceptual basis of these methods as well as how to apply them to real data using the programming language R. This course may be taken concurrently with the prerequisite with instructor permission.

    STAT4710401 (Syllabus)

    STAT4710402 (Syllabus)

  • STAT5710 - Modern Data Mining

    Modern Data Mining: Statistics, or data science, has been evolving rapidly to keep up with the modern world. While classical multiple regression and logistic regression techniques continue to be the major tools, we go beyond them to include methods built on top of linear models, such as LASSO and ridge regression. Contemporary methods such as KNN (K-nearest neighbors), random forests, support vector machines, principal component analysis (PCA), the bootstrap, and others are also covered. Text mining, especially through PCA, is another topic of the course. While learning all the techniques, we keep in mind that our goal is to tackle real problems. Not only do we go through a large collection of interesting, challenging real-life data sets, but we also learn how to use the free, powerful software R in connection with each of the methods covered in the class. Prerequisite: two courses at the statistics 4000 or 5000 level, or permission from the instructor.

    STAT5710401 (Syllabus)

    STAT5710402 (Syllabus)

All Courses

  • AMCS9999 - Ind Study & Research

    Study under the direction of a faculty member.

  • STAT4710 - Modern Data Mining

  • STAT5710 - Modern Data Mining

Awards and Honors

IMS Lawrence D. Brown PhD Student Award, Institute of Mathematical Statistics, 2025

Jack Youden Prize, American Society for Quality’s Chemical and Process Industries Division, 2024

Ingram Olkin Interdisciplinary Dissertation Award, Department of Statistics, Stanford University, 2024

Rising Star in Data Science, University of Chicago, 2023

Student Paper Award, ICSA Applied Statistics Symposium, 2022

Tom Ten Have Award Runner-up, American Causal Inference Conference, 2022

Latest Research

Kexin Huang, Ying Jin, Ryan Li, Michael Li, Emmanuel J. Candès, Jure Leskovec (2025), Automated hypothesis validation with agentic sequential falsifications, International Conference on Machine Learning (ICML).