Edgar Dobriban

  • Associate Professor of Statistics and Data Science, with a secondary appointment in Computer and Information Science

Contact Information

  • Office address:

    305 Academic Research Building
    265 South 37th Street
    Philadelphia, PA 19104

Research Interests: Statistics and machine learning

Overview

Research interests:

  • statistical problems in AI safety
    • uncertainty quantification for machine learning: predictive inference, calibration
    • robustness in theory and in practice: distribution shift, jailbreaking LLMs
  • scalable data analysis
    • randomized algorithms: sketching and random projections
    • distributed learning
  • high-dimensional asymptotics
    • simple models of neural nets, random feature models
    • high-dimensional regression
    • dimension reduction, PCA
  • statistics in algorithmic fairness
  • data augmentation, invariance, and symmetry

The group is always looking to expand. We are recruiting PhD students at Penn to work on problems in statistics and machine learning. PhD applicants interested in working with me should mention this on their application. Please apply through the Department of Statistics and Data Science, the Department of Computer and Information Science, and the AMCS program, as this increases the chances of admission.

Education (cv):

  • PhD in Statistics, Stanford University, 2017.  Advisor: David Donoho
  • BA in Mathematics (with highest honors/summa cum laude), Princeton University, 2012.

Recent news:

Miscellanea:

  • I use Twitter to keep up with new research.
  • I grew up in Romania and speak Hungarian as a first language (the real spelling of my name is Dobribán Edgár). Romania and Hungary have produced many great mathematicians and statisticians, including John von Neumann, Abraham Wald, Paul Erdős, and Dan-Virgil Voiculescu.

 


Research

Talk slides: GitHub. Google Scholar.

Teaching

Awards and Honors

Miscellaneous

This page has links to methods from my papers. Feel free to contact me if you are interested in using them.

ePCA: github

The ePCA method for principal component analysis of exponential family data, e.g. Poisson-modeled count data (with L.T. Liu).

EigenEdge: github

Methods for working with large random data matrices, including

  • Computing eigenvalue distributions of covariance matrices (general Marchenko-Pastur distributions).
  • Optimal statistics for testing in principal component analysis.
  • Tools for spiked covariance models: spike and cosine descriptors, optimal shrinkers.
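
As a point of reference for the first item, here is a minimal sketch of the classical Marchenko-Pastur density, the special case in which the population covariance is the identity. It is not the EigenEdge code or its API; the function names and the parameter `gamma` (the limiting ratio of dimension to sample size, assumed to lie in (0, 1]) are chosen only for illustration. The general distributions mentioned above have no closed form of this kind and must be computed numerically.

```python
import math


def marchenko_pastur_density(x, gamma):
    """Density of the standard Marchenko-Pastur law at x.

    Assumes identity population covariance and aspect ratio
    gamma = p / n with 0 < gamma <= 1 (illustrative special case).
    """
    if not 0 < gamma <= 1:
        raise ValueError("gamma must lie in (0, 1]")
    lower = (1 - math.sqrt(gamma)) ** 2  # left edge of the support
    upper = (1 + math.sqrt(gamma)) ** 2  # right edge of the support
    if x <= lower or x >= upper:
        return 0.0
    return math.sqrt((upper - x) * (x - lower)) / (2 * math.pi * gamma * x)


def integrate_density(gamma, steps=20000):
    """Midpoint-rule integral of the density over its support."""
    lower = (1 - math.sqrt(gamma)) ** 2
    upper = (1 + math.sqrt(gamma)) ** 2
    width = (upper - lower) / steps
    return sum(
        marchenko_pastur_density(lower + (k + 0.5) * width, gamma) * width
        for k in range(steps)
    )


# Sanity check: for gamma <= 1 the density should integrate to about 1.
total_mass = integrate_density(0.5)
print(round(total_mass, 4))
```

The numerical integral is only a sanity check that the formula is a proper density; it is not how eigenvalue distributions are computed for general covariance matrices.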

pweight: github (R), github (Matlab)

P-value weighting techniques for multiple hypothesis testing. These can improve power when prior information about the individual effect sizes is available. Includes the iGWAS method for Genome-Wide Association Studies.
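
To illustrate the general idea of p-value weighting, the sketch below implements the textbook weighted Bonferroni rule: hypothesis i is rejected when p_i <= alpha * w_i / m, with nonnegative weights rescaled to average 1. This is a generic illustration, not the pweight package or the iGWAS method; the function name, the p-values, and the weights are all hypothetical.

```python
def weighted_bonferroni(p_values, weights, alpha=0.05):
    """Return a list of rejection decisions under weighted Bonferroni.

    Weights are rescaled to average 1, which keeps the familywise
    error rate at most alpha when the weights are fixed in advance
    (chosen without looking at the p-values).
    """
    m = len(p_values)
    if m == 0 or len(weights) != m:
        raise ValueError("need one weight per p-value")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    normalized = [w * m / total for w in weights]
    return [p <= alpha * w / m for p, w in zip(p_values, normalized)]


# Hypothetical p-values; prior information favors the second hypothesis.
p_values = [0.004, 0.020, 0.030, 0.600]
weights = [1.0, 3.0, 1.0, 1.0]

unweighted = weighted_bonferroni(p_values, [1.0] * len(p_values))
weighted = weighted_bonferroni(p_values, weights)
print(unweighted)  # [True, False, False, False]
print(weighted)    # [True, True, False, False]
```

In this toy example the up-weighted second hypothesis is rejected only under weighting. The gain is not free: the other hypotheses face stricter thresholds, so poorly chosen weights can reduce power.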