Nancy R. Zhang

  • Professor of Statistics

Contact Information

  • Office Address:

    456 Jon M. Huntsman Hall
    3730 Walnut Street
    Philadelphia, PA 19104

Research Interests: genomics, change-point methods, empirical Bayes estimation, model and variable selection, scan statistics, statistical modeling

Links: CV

Overview

Dr. Zhang is Professor of Statistics in The Wharton School at the University of Pennsylvania. Her current research focuses primarily on developing statistical and computational approaches for the analysis of genetic, genomic, and transcriptomic data. In genomics, she has developed methods to improve the accuracy of copy number variant and structural variant detection, methods for improved false discovery rate (FDR) control, and methods for the analysis of single-cell RNA sequencing data. In statistics, she has developed new models and methods for change-point analysis, variable selection, and model selection. Dr. Zhang has also contributed to tumor genomics, where she has developed analysis methods to improve our understanding of intra-tumor clonal heterogeneity.

Here are some of Dr. Zhang’s representative publications, categorized by topic (ǂ alphabetical author ordering; * corresponding author):

  • Change-point detection and scan statistics
    1. Zhang NR, Siegmund DO (2007) A modified Bayes information criterion with applications to the analysis of comparative genomic hybridization data, Biometrics 63, 22.
    2. Chan HP, Zhang NRǂ (2007) Scan statistics with weighted observations, Journal of the American Statistical Association 102, 595.
    3. Zhang NR, Siegmund DO, Ji H, Li J (2010) Detecting simultaneous changepoints in multiple sequences, Biometrika 97, 631.
    4. Siegmund DO, Zhang NR, Yakir B (2011) False discovery rate for scanning statistics, Biometrika 98, 979.
    5. Chen H, Zhang NRǂ (2015) Graph-based change-point detection, The Annals of Statistics 43, 139.
    6. Zhang NR, Siegmund DO (2012) Model selection for high dimensional, multi-sequence change-point problems, Statistica Sinica 22, 1507.
  • General multiple testing control, high-dimensional inference
    1. Li F, Zhang NRǂ (2010) Bayesian variable selection in structured high-dimensional covariate spaces with applications in genomics, Journal of the American Statistical Association 105, 1202.
    2. Bickel PJ, Boley N, Brown JB, Huang H, Zhang NRǂ (2010) Subsampling methods for genomic inference, Annals of Applied Statistics 4, 1660.
    3. Sun Y, Zhang NR, Owen A* (2012) Multiple hypothesis testing, adjusted for latent variables, with an application to the AGEMAP gene expression data, Annals of Applied Statistics 6, 1664.
  • DNA copy number estimation, variant detection and inference (see also the first bullet point, which focuses more on theory and methods)
    1. Zhang NR, Senbabaoglu Y, Li J* (2010) Joint estimation of DNA copy number from multiple platforms, Bioinformatics 26, 153.
    2. Chen H, Xing H, Zhang NR* (2011) Estimation of parent specific DNA copy number in tumors using high-density genotyping arrays, PLoS Computational Biology 7, e1001060.
    3. Siegmund DO, Yakir B, Zhang NR* (2011) Detecting simultaneous variant intervals in aligned sequences, Annals of Applied Statistics 5, 645.
    4. Shen J, Zhang NR* (2012) Change-point model on nonhomogeneous Poisson processes with application in copy number profiling by next-generation DNA sequencing, Annals of Applied Statistics 6, 476.
    5. Chen H, Bell JM, Zavala NA, Ji HP, Zhang NR* (2015) Allele-specific copy number profiling by next-generation DNA sequencing, Nucleic Acids Research 43, e23.
    6. Jiang Y, Oldridge DA, Diskin SJ, Zhang NR* (2015) CODEX: a normalization and copy number variation detection method for whole exome sequencing, Nucleic Acids Research 43, e39.
    7. Xia LC, Sakshuwong S, Hopmans ES, Bell JM, Grimes SM, Siegmund DO, Ji HP, Zhang NR* (2016) A genome-wide approach for detecting novel insertion-deletion variants of mid-range size, Nucleic Acids Research 44, e126.
  • Intra-tumor heterogeneity and cancer genomics (see also #2, #5, and #7 under “DNA copy number estimation”)
    1. Jiang Y, Qiu Y, Minn AJ, Zhang NR* (2016) Assessing intratumor heterogeneity and tracking longitudinal and spatial clonal evolutionary history by next-generation sequencing, Proceedings of the National Academy of Sciences 113, E5528.
    2. Muralidharan O, Natsoulis G, Bell J, Ji H, Zhang NR* (2012) Detecting mutations in mixed sample sequencing data using empirical Bayes, Annals of Applied Statistics 6, 1047.
    3. Xia LC, Bell JM, Wood-Bouwens C, Chen JJ, Zhang NR*, Ji HP* (2017) Single molecule-based discovery of complex genomic rearrangements, Nucleic Acids Research 46, e19.
  • Single cell genomics
    1. Jiang Y, Zhang NR*, Li M* (2017) SCALE: modeling allele-specific gene expression by single-cell RNA-sequencing, Genome Biology 18, 74.
    2. Jia C, Hu Y, Kelly D, Kim J, Li M*, Zhang NR* (2017) Accounting for technical noise in differential expression analysis of single-cell RNA sequencing data, Nucleic Acids Research 45, 10978.
    3. Huang M, Wang J, Torre E, Dueck H, Shaffer S, Bonasio R, Murray J, Raj A, Li M, Zhang NR* (2018) SAVER: Gene expression recovery for single cell RNA sequencing, Nature Methods 15, 539.
    4. Wang J, Huang M, Torre E, Dueck H, Shaffer S, Murray J, Raj A, Li M, Zhang NR* (2018) Gene expression distribution deconvolution in single cell RNA sequencing, accepted by Proceedings of the National Academy of Sciences.

For a complete overview of Dr. Zhang’s publications, funded grants, and teaching, mentoring, and service work, see her CV above.

For more on the single cell research done in Dr. Zhang’s group since 2016, see the website of the Laboratory for Single Cell Data Science @ Penn, a joint effort with Dr. Mingyao Li of the Penn School of Medicine: http://singlecell.wharton.upenn.edu/

Dr. Zhang obtained her Ph.D. in Statistics from Stanford University in 2005. After one year of postdoctoral training at the University of California, Berkeley, she returned to the Department of Statistics at Stanford University as Assistant Professor in 2006. She received the Sloan Fellowship in 2011, before formally moving to the University of Pennsylvania in 2012 as tenured Associate Professor. She is the Principal Investigator on multiple independent research awards funded by the National Institutes of Health and the National Science Foundation. At Penn, she is a member of the Graduate Group in Genomics and Computational Biology and of the Penn Neurodegeneration Genomics Center.

 


Research

Much of my recent effort has been devoted to tackling the challenges of single cell genomic research. For more on this aspect of my research, you are welcome to browse this website:

http://singlecell.wharton.upenn.edu/

which features the joint work of my lab and the lab of Dr. Mingyao Li of the Penn School of Medicine over the last three years.

For a complete list of my publications and funded grants, the most trustworthy source is my CV (see link above). The searchable publication list below is only updated once per year.

Here’s a list of my currently funded projects:

  1. Statistical Methods for Single-Cell Transcriptomics (R01 from NIGMS, role: PI in MPI team)
  2. Genomic and Cellular Variation from Single Molecules to Single Cells (R01 from NHGRI, role: PI)
  3. Statistical Methods for High-Resolution Multiscale Analysis of 3D Genome (NSF-NIGMS Award, role: Co-PI)
  4. Radiation and Checkpoint Blockade for Cancer Immune Therapy (P01 from NCI, role: Co-Investigator)
  5. The NIA Genetics of Alzheimer’s Disease Data Storage Site (U24 from NIA, role: Co-Investigator)
  6. Identifying Genes and Pathways that Impact Tau Toxicity in FTD (U54, role: Co-Investigator)
  7. Coordinating Center for Genetics and Genomics of Alzheimer’s Disease (U54 from NIA, role: Co-Investigator)

Teaching

Past Courses

  • STAT102 - INTRO BUSINESS STAT

    Continuation of STAT 101. A thorough treatment of multiple regression, model selection, analysis of variance, linear logistic regression; introduction to time series. Business applications.

  • STAT405 - STAT COMPUTING WITH R

    The goal of this course is to introduce students to the R programming language and its related ecosystem. This course provides a skill set that is in demand in both research and business environments. In addition, R is a platform that is used and required in other advanced classes taught at Wharton, so this class prepares students for those higher-level classes and electives.

  • STAT431 - STATISTICAL INFERENCE

    Graphical displays; one- and two-sample confidence intervals; one- and two-sample hypothesis tests; one- and two-way ANOVA; simple and multiple linear least-squares regression; nonlinear regression; variable selection; logistic regression; categorical data analysis; goodness-of-fit tests. This is a methodology course: it does not cover business applications but overlaps significantly with STAT 101 and 102.

  • STAT471 - MODERN DATA MINING

    Modern Data Mining: Statistics, or data science, has been evolving rapidly to keep up with the modern world. While classical multiple regression and logistic regression techniques continue to be the major tools, we go beyond them to include methods built on top of linear models, such as LASSO and ridge regression. Contemporary methods such as k-nearest neighbors (KNN), random forests, support vector machines, principal component analysis (PCA), the bootstrap, and others are also covered. Text mining, especially through PCA, is another topic of the course. While learning these techniques, we keep in mind that our goal is to tackle real problems. Not only do we work through a large collection of interesting, challenging real-life data sets, but we also learn how to use the free, powerful software R with each of the methods introduced in class.

  • STAT701 - MODERN DATA MINING

    Modern Data Mining: Statistics, or data science, has been evolving rapidly to keep up with the modern world. While classical multiple regression and logistic regression techniques continue to be the major tools, we go beyond them to include methods built on top of linear models, such as LASSO and ridge regression. Contemporary methods such as k-nearest neighbors (KNN), random forests, support vector machines, principal component analysis (PCA), the bootstrap, and others are also covered. Text mining, especially through PCA, is another topic of the course. While learning these techniques, we keep in mind that our goal is to tackle real problems. Not only do we work through a large collection of interesting, challenging real-life data sets, but we also learn how to use the free, powerful software R with each of the methods introduced in class.

  • STAT705 - STAT COMPUTING WITH R

    The goal of this course is to introduce students to the R programming language and its related ecosystem. This course provides a skill set that is in demand in both research and business environments. In addition, R is a platform that is used and required in other advanced classes taught at Wharton, so this class prepares students for those higher-level classes and electives.

  • STAT991 - SEM IN ADV APPL OF STAT

    This seminar is taken by doctoral candidates after they have completed most of their coursework. Topics vary from year to year and are chosen from advanced probability, statistical inference, robust methods, and decision theory, with principal emphasis on applications.

Awards and Honors

  • Sloan Fellowship, 2011
  • New World Silver Medal for Best PhD Thesis in Mathematical Sciences, 2007
