305 Academic Research Building
265 South 37th Street
Philadelphia, PA 19104
Research Interests: Statistics and machine learning
The group is always looking to expand. We are recruiting PhD students at Penn to work on problems in statistics and machine learning. PhD applicants interested in working with me should mention this on their application. Please apply through the departments of Statistics & Data Science, Computer and Information Science, and the AMCS program; applying to several programs increases the chances of admission.
Education (cv):
Recent news:
Miscellanea:
Talk slides: GitHub. Google Scholar.
Yonghoon Lee, Eric Tchetgen Tchetgen, Edgar Dobriban, Batch Predictive Inference.
Yan Sun, Pratik Chaudhari, Ian J. Barnett, Edgar Dobriban, A Confidence Interval for the ℓ2 Expected Calibration Error.
Behrad Moniri, Seyed Hamed Hassani, Edgar Dobriban, Evaluating the Performance of Large Language Models via Debates.
Patrick Chao, Edgar Dobriban, Seyed Hamed Hassani, Watermarking Language Models with Error Correcting Codes.
Xinmeng Huang, Shuo Li, Edgar Dobriban, Osbert Bastani, Seyed Hamed Hassani, Dongsheng Ding, One-Shot Safety Alignment for Large Language Models via Optimal Dualization.
Xinmeng Huang, Shuo Li, Mengxin Yu, Matteo Sesia, Seyed Hamed Hassani, Insup Lee, Osbert Bastani, Edgar Dobriban, Uncertainty in Language Models: Assessment through Rank-Calibration.
Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J. Pappas, Florian Tramer, Seyed Hamed Hassani, Eric Wong, JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models.
Leda Wang, Zhixiang Zhang, Edgar Dobriban, Inference in Randomized Least Squares and PCA via Normality of Quadratic Forms.
Xianli Zeng, Guang Cheng, Edgar Dobriban, Minimax Optimal Fair Classification with Bounded Demographic Disparity.
Yonghoon Lee, Edgar Dobriban, Eric Tchetgen Tchetgen, Simultaneous Conformal Prediction of Missing Outcomes with Propensity Score ε-Discretization.
Graphical displays; one- and two-sample confidence intervals; one- and two-sample hypothesis tests; one- and two-way ANOVA; simple and multiple linear least-squares regression; nonlinear regression; variable selection; logistic regression; categorical data analysis; goodness-of-fit tests. A methodology course. This course does not have business applications but has significant overlap with STAT 1010 and 1020. This course may be taken concurrently with the prerequisite with instructor permission.
STAT 4310-002 (Syllabus)
This seminar will be taken by doctoral candidates after the completion of most of their coursework. Topics vary from year to year and are chosen from advanced probability, statistical inference, robust methods, and decision theory, with principal emphasis on applications.
STAT 9911-301 (Syllabus)
“For deep, fundamental, and wide-ranging contributions to mathematical statistics and statistical machine learning, including high-dimensional asymptotics (ridge regression, PCA), multiple testing, randomization tests, scalable statistical learning via random projections and distributed learning, uncertainty quantification for machine learning (calibration, prediction sets), robustness, fairness, and Covid-19 pooled testing via hypergraph factorization.”
This page has links to methods from my papers. Feel free to contact me if you are interested in using them.
The ePCA method for principal component analysis of exponential family data, e.g., Poisson-modeled count data (with L.T. Liu); a minimal sketch of its key debiasing step appears after this list.
Methods for working with large random data matrices, including
P-value weighting techniques for multiple hypothesis testing. These can improve power in multiple testing when prior information about the individual effect sizes is available. Includes the iGWAS method for Genome-Wide Association Studies; a sketch of the weighted Bonferroni rule that such weights plug into appears below.
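For intuition about ePCA, here is a minimal Python sketch (not the released code) of the diagonal-debiasing idea for Poisson counts; the full method adds further shrinkage and denoising steps described in the paper, and the function name below is illustrative.

import numpy as np

def poisson_debiased_pca(Y, k):
    """Minimal sketch of diagonal debiasing for PCA of Poisson counts.

    Y : (n, p) array of counts whose means form a low-rank signal.
    k : number of principal components to return.
    """
    Y_bar = Y.mean(axis=0)
    # Sample covariance of the observed counts.
    S = np.cov(Y, rowvar=False, bias=True)
    # Poisson noise has variance equal to its mean, which inflates the
    # diagonal of S; subtracting diag(Y_bar) targets the signal covariance.
    S_debiased = S - np.diag(Y_bar)
    # Top-k eigenpairs of the debiased covariance.
    eigvals, eigvecs = np.linalg.eigh(S_debiased)
    order = np.argsort(eigvals)[::-1][:k]
    return eigvals[order], eigvecs[:, order]

# Toy example with a rank-one Poisson mean structure.
rng = np.random.default_rng(0)
n, p = 2000, 50
u = rng.uniform(1.0, 3.0, size=p)
intensity = rng.uniform(0.5, 2.0, size=(n, 1)) * u  # rank-one mean matrix
Y = rng.poisson(intensity)
vals, vecs = poisson_debiased_pca(Y, k=2)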
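To illustrate where p-value weights enter, here is a minimal sketch (not code from the papers) of the generic weighted Bonferroni rule; the papers concern how to choose the weights well from prior information about effect sizes, which is only mimicked by the hand-picked weights in this toy example.

import numpy as np

def weighted_bonferroni(pvals, weights, alpha=0.05):
    """Weighted Bonferroni rule: reject H_i when p_i <= alpha * w_i / m.

    Nonnegative weights averaging to one keep the family-wise error
    rate controlled at level alpha.
    """
    pvals = np.asarray(pvals, dtype=float)
    weights = np.asarray(weights, dtype=float)
    m = len(pvals)
    assert np.all(weights >= 0) and np.isclose(weights.mean(), 1.0)
    return pvals <= alpha * weights / m

# Toy example: prior information suggests the first 10 hypotheses are
# more promising, so they receive larger weights than the rest.
rng = np.random.default_rng(1)
m = 100
pvals = rng.uniform(size=m)
pvals[:10] = rng.uniform(0, 0.001, size=10)  # a few strong signals
weights = np.where(np.arange(m) < 10, 5.0, 0.5)
weights = weights / weights.mean()  # normalize to mean one
rejections = weighted_bonferroni(pvals, weights)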