305 Academic Research Building
265 South 37th Street
Philadelphia, PA 19104
Research Interests: Statistics and machine learning
A postdoctoral research fellow position is available to work on adaptive and adversarial machine learning, with connections to developmental psychology. Please see the ad and e-mail Edgar.
New class in Spring 2022: Topics in Modern Statistical Learning (STAT-991), surveying uncertainty quantification for machine learning. See the GitHub page.
The group is always looking to expand. We are recruiting PhD students at Penn to work on problems in statistics and machine learning. PhD applicants interested in working with me should mention this in their application. Please apply through the departments of Statistics & Data Science, Computer and Information Science, and the AMCS program, as this increases the chances of admission.
Xianli Zeng, Edgar Dobriban, Guang Cheng, Fair Bayes-Optimal Classifiers Under Predictive Parity.
Evangelos Chatzipantazis, Stefanos Pertigkiozoglou, Edgar Dobriban, Kostas Daniilidis, SE(3)-Equivariant Attention Networks for Shape Reconstruction in Function Space.
Abstract: Predicting sets of outcomes -- instead of unique outcomes -- is a promising solution to uncertainty quantification in statistical learning. Despite a rich literature on constructing prediction sets with statistical guarantees, adapting to unknown covariate shift -- a prevalent issue in practice -- poses a serious challenge and has yet to be solved. In the framework of semiparametric statistics, we can view the covariate shift as a nuisance parameter. In this paper, we propose a novel flexible distribution-free method, PredSet-1Step, to construct prediction sets that can efficiently adapt to unknown covariate shift. PredSet-1Step relies on a one-step correction of the plug-in estimator of coverage error. We theoretically show that our methods are asymptotically probably approximately correct (PAC), having low coverage error with high confidence for large samples. PredSet-1Step may also be used to construct asymptotically risk-controlling prediction sets. We illustrate that our method has good coverage in a number of experiments and by analyzing a data set concerning HIV risk prediction in a South African cohort study. In experiments without covariate shift, PredSet-1Step performs similarly to inductive conformal prediction, which has finite-sample PAC properties. Thus, PredSet-1Step may be used in the common scenario if the user suspects -- but may not be certain -- that covariate shift is present, and does not know the form of the shift. Our theory hinges on a new bound for the convergence rate of Wald confidence interval coverage for general asymptotically linear estimators. This is a technical tool of independent interest.
Donghwan Lee, Xinmeng Huang, Seyed Hamed Hassani, Edgar Dobriban, T-Cal: An optimal test for the calibration of predictive models.
Souradeep Dutta, Kaustubh Sridhar, Osbert Bastani, Edgar Dobriban, James Weimer, Insup Lee, Julia Parish-Morris, Exploring with Sticky Mittens: Reinforcement Learning with Expert Interventions via Option Templates.
Ramneet Kaur, Susmit Jha, Anirban Roy, Sangdon Park, Edgar Dobriban, Oleg Sokolsky, Insup Lee (2022), iDECODe: In-distribution Equivariance for Conformal Out-of-distribution Detection.
Evangelos Chatzipantazis, Stefanos Pertigkiozoglou, Kostas Daniilidis, Edgar Dobriban, Learning Augmentation Distributions using Transformed Risk Minimization.
Lingjiao Chen, Leshang Chen, Hongyi Wang, Susan Davidson, Edgar Dobriban, Solon: Communication-efficient Byzantine-resilient Distributed Training via Redundant Gradients.
Dominic Richards, Edgar Dobriban, Patrick Rebeschini, Comparing Classes of Estimators: When does Gradient Descent Beat Ridge Regression in Linear Models?
This page has links to methods from my papers. Feel free to contact me if you are interested in using them.
The ePCA method for principal component analysis of exponential family data, e.g. Poisson-modeled count data (with L.T. Liu).
Methods for working with large random data matrices, including
P-value weighting techniques for multiple hypothesis testing. These can improve power in multiple testing, if there is prior information about the individual effect sizes. Includes the iGWAS method for Genome-Wide Association Studies.
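As a rough illustration of why p-value weighting can improve power, the sketch below implements the classical weighted Bonferroni procedure: with nonnegative weights that average to one, hypothesis i is rejected when its p-value is at most alpha * w_i / m. This is a generic textbook procedure, not the specific weighting schemes or the iGWAS method from the papers; the function name `weighted_bonferroni`, the p-values, and the weights are all made up for the example.

```python
import numpy as np


def weighted_bonferroni(p_values, weights, alpha=0.05):
    """Weighted Bonferroni: reject H_i when p_i <= alpha * w_i / m.

    Weights are rescaled to average one, which keeps the familywise
    error rate at most alpha. Illustrative sketch only.
    """
    p = np.asarray(p_values, dtype=float)
    w = np.asarray(weights, dtype=float)
    if p.shape != w.shape:
        raise ValueError("p_values and weights must have the same shape")
    if np.any(w < 0) or w.sum() <= 0:
        raise ValueError("weights must be nonnegative with a positive sum")
    m = p.size
    w = w * (m / w.sum())  # rescale so the weights average to one
    return p <= alpha * w / m


# Hypothetical data: prior information suggests the second effect is large.
p_values = [0.004, 0.02, 0.03, 0.5]
weights = [1.0, 3.0, 0.0, 0.0]

unweighted = weighted_bonferroni(p_values, np.ones(4))
weighted = weighted_bonferroni(p_values, weights)
print("unweighted rejections:", unweighted.tolist())
print("weighted rejections:  ", weighted.tolist())
```

In this toy example the unweighted rule rejects only the first hypothesis, while the weighted rule also rejects the second, because the prior information shifts part of the error budget toward it. Poorly chosen weights can just as easily reduce power, which is why the choice of weights is the substantive part of any such method.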