305 Academic Research Building
265 South 37th Street
Philadelphia, PA 19104
Research Interests: Statistics and machine learning
The group is always looking to expand. We are recruiting PhD students at Penn to work on problems in statistics and machine learning. PhD applicants interested in working with me should mention this on their application. Please apply through the Department of Statistics and Data Science, the Department of Computer and Information Science, or the AMCS program; applying to several of these programs increases the chances of admission.
Education (cv):
Recent news:
Miscellanea:
Talk slides: GitHub. Google Scholar.
Yonghoon Lee, Eric Tchetgen Tchetgen, Edgar Dobriban, Batch Predictive Inference.
Yan Sun, Pratik Chaudhari, Ian J. Barnett, Edgar Dobriban, A Confidence Interval for the ℓ2 Expected Calibration Error.
Behrad Moniri, Seyed Hamed Hassani, Edgar Dobriban, Evaluating the Performance of Large Language Models via Debates.
Patrick Chao, Edgar Dobriban, Seyed Hamed Hassani, Watermarking Language Models with Error Correcting Codes.
Xinmeng Huang, Shuo Li, Edgar Dobriban, Osbert Bastani, Seyed Hamed Hassani, Dongsheng Ding, One-Shot Safety Alignment for Large Language Models via Optimal Dualization.
Xinmeng Huang, Shuo Li, Mengxin Yu, Matteo Sesia, Seyed Hamed Hassani, Insup Lee, Osbert Bastani, Edgar Dobriban, Uncertainty in Language Models: Assessment through Rank-Calibration.
Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J. Pappas, Florian Tramer, Seyed Hamed Hassani, Eric Wong, JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models.
Leda Wang, Zhixiang Zhang, Edgar Dobriban, Inference in Randomized Least Squares and PCA via Normality of Quadratic Forms.
Xianli Zeng, Guang Cheng, Edgar Dobriban, Minimax Optimal Fair Classification with Bounded Demographic Disparity.
Yonghoon Lee, Edgar Dobriban, Eric Tchetgen Tchetgen, Simultaneous Conformal Prediction of Missing Outcomes with Propensity Score ε-Discretization.
“For deep, fundamental, and wide-ranging contributions to mathematical statistics and statistical machine learning, including high-dimensional asymptotics (ridge regression, PCA), multiple testing, randomization tests, scalable statistical learning via random projections and distributed learning, uncertainty quantification for machine learning (calibration, prediction sets), robustness, fairness, and Covid-19 pooled testing via hypergraph factorization.”
This page has links to methods from my papers. Feel free to contact me if you are interested in using them.
The ePCA method for principal component analysis of exponential-family data, e.g., Poisson-modeled count data (with L.T. Liu).
Methods for working with large random data matrices, including
P-value weighting techniques for multiple hypothesis testing. These can improve power in multiple testing when there is prior information about the individual effect sizes. Includes the iGWAS method for Genome-Wide Association Studies.
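As a toy illustration of the diagonal-debiasing idea behind ePCA for Poisson data (a sketch of the core principle only, not the full method; the simulated data and all parameter values below are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5000, 4
v = np.array([1.0, 1.0, 0.0, 0.0])       # latent direction of variation
u = rng.exponential(size=(n, 1))         # latent scores, variance 1
means = 3.0 + u @ v[None, :]             # per-entry Poisson means
Y = rng.poisson(means)                   # observed counts

# For Poisson data, Var(Y_ij) = E(Y_ij), so the sample covariance of the
# counts overestimates the covariance of the latent means by roughly
# diag(column means). Subtracting that diagonal debiases the estimate.
S = np.cov(Y, rowvar=False)
S_debiased = S - np.diag(Y.mean(axis=0))

eigvals, eigvecs = np.linalg.eigh(S_debiased)
top = eigvecs[:, -1]   # recovers the direction v (up to sign and scale)
```

Here the debiased covariance is close to v vᵀ, so its leading eigenvector aligns with the planted direction, whereas PCA on the raw counts would be distorted by the Poisson noise variance on the diagonal.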
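The idea of p-value weighting can be sketched with the classical weighted Bonferroni procedure (a standard baseline, not the specific methods from the papers; the function name, weights, and p-values below are illustrative):

```python
import numpy as np

def weighted_bonferroni(pvals, weights, alpha=0.05):
    """Weighted Bonferroni: reject H_i when p_i <= alpha * w_i / m,
    where the nonnegative weights average to 1 (so family-wise error
    control is preserved by a union bound)."""
    pvals = np.asarray(pvals, dtype=float)
    w = np.asarray(weights, dtype=float)
    m = len(pvals)
    assert np.isclose(w.mean(), 1.0), "weights must average to 1"
    return pvals <= alpha * w / m

# Prior information suggests the first two hypotheses have larger effect
# sizes, so they receive larger weights (larger per-test thresholds).
p = [0.004, 0.011, 0.04, 0.5]
w = [2.0, 1.5, 0.3, 0.2]   # averages to 1
print(weighted_bonferroni(p, w))  # → [ True  True False False]
```

With uniform weights this reduces to ordinary Bonferroni; informative weights shift power toward the hypotheses believed more likely to be non-null.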