305 Wharton Academic Research Building
265 South 37th Street
Philadelphia, PA 19104
Research Interests: Statistics and machine learning
The main research interests of my group are:
The group is always looking to expand. We are recruiting PhD students at Penn to work on problems in statistics and machine learning. PhD applicants interested in working with me should mention this in their application. Please apply through both the Statistics department and the AMCS program, as this increases the chances of admission.
Seminar class in Fall 2019: Topics in Deep Learning (STAT-991), surveying advanced topics in deep learning research based on student presentations. See the GitHub page for the class materials.
Education (cv):
Recent news:
Miscellanea:
Talk slides: GitHub. Google Scholar.
David Hong, Yue Sheng, Edgar Dobriban, Selecting the number of components in PCA via random signflips.
Xiaoxia Wu, Edgar Dobriban, Tongzheng Ren, Shanshan Wu, Zhiyuan Li, Suriya Gunasekar, Rachel Ward, Qiang Liu (2020), Implicit regularization of normalization methods, Neural Information Processing Systems (NeurIPS) 2020.
Jonathan Lacotte, Sifan Liu, Edgar Dobriban, Mert Pilanci (2020), Limiting spectrum of the randomized Hadamard transform and optimal iterative sketching methods, Neural Information Processing Systems (NeurIPS) 2020.
Shuxiao Chen, Edgar Dobriban, Jane H Lee (2020), A group-theoretic framework for data augmentation, Journal of Machine Learning Research (JMLR) and Neural Information Processing Systems (NeurIPS) 2020.
Michal Derezinski, Zhenyu Liao, Edgar Dobriban, Michael W. Mahoney, Sparse sketches with small inversion bias.
Licong Lin and Edgar Dobriban, What causes the test error? Going beyond bias-variance via ANOVA.
Yinjun Wu, Edgar Dobriban, Susan Davidson (2020), DeltaGrad: Rapid retraining of machine learning models, International Conference on Machine Learning (ICML) 2020.
Fan Yang, Sifan Liu, Edgar Dobriban, David P. Woodruff, How to reduce dimension with PCA and random projections?
Sifan Liu and Edgar Dobriban (2020), Ridge regression: Structure, cross-validation, and sketching, International Conference on Learning Representations (ICLR).
Alnur Ali, Edgar Dobriban, Ryan J. Tibshirani (2020), The implicit regularization of stochastic gradient flow for least squares, International Conference on Machine Learning (ICML) 2020.
This page has links to software implementing methods from my papers. Feel free to contact me if you are interested in using them.
The ePCA method for principal component analysis of exponential family data, e.g., Poisson-modeled count data (with L.T. Liu).
Methods for working with large random data matrices, including:
P-value weighting techniques for multiple hypothesis testing. These can improve power when prior information about the individual effect sizes is available. Includes the iGWAS method for Genome-Wide Association Studies.
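As a generic illustration of the p-value weighting idea (a standard weighted Bonferroni procedure, not the iGWAS method itself), prior weights can redistribute the testing budget toward hypotheses believed to have larger effects; the p-values, weights, and significance level below are made up for the example:

```python
import numpy as np

def weighted_bonferroni(pvals, weights, alpha=0.05):
    """Weighted Bonferroni: reject hypothesis i if p_i <= alpha * w_i / m,
    with the weights normalized to average 1 so the thresholds sum to alpha
    (which preserves family-wise error control)."""
    w = np.asarray(weights, dtype=float)
    w = w * len(w) / w.sum()            # normalize weights to mean 1
    p = np.asarray(pvals, dtype=float)
    return p <= alpha * w / len(w)      # per-hypothesis thresholds

# Three hypotheses; prior information suggests the second has a large effect.
pvals = np.array([0.004, 0.03, 0.30])
weights = np.array([1.0, 4.0, 1.0])
print(weighted_bonferroni(pvals, weights))
```

With these numbers, the weighted procedure rejects the first two hypotheses, while unweighted Bonferroni (threshold 0.05/3) would reject only the first, illustrating the power gain when the prior weights are informative.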
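The covariance-debiasing idea underlying ePCA for Poisson counts can be sketched as follows. This is only a minimal illustration of the diagonal-debiasing step under a Poisson model, on simulated data with made-up dimensions; the full ePCA method involves further steps (e.g., shrinkage of the debiased covariance):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate count data: a low-rank nonnegative mean matrix plus Poisson noise.
n, p, k = 2000, 50, 2
U = rng.uniform(1.0, 2.0, size=(n, k))
V = rng.uniform(1.0, 2.0, size=(k, p))
X = rng.poisson(U @ V)                   # observed counts

# Plain sample covariance of the features.
S = np.cov(X, rowvar=False)

# For Poisson data, Var(X_j) = E[X_j] plus the variance of the underlying mean,
# so the sample covariance is inflated on the diagonal by roughly E[X_j].
# Subtracting the diagonal of feature means removes this noise bias.
S_debiased = S - np.diag(X.mean(axis=0))

# Eigendecompose the debiased covariance to estimate the principal components.
evals = np.linalg.eigvalsh(S_debiased)[::-1]
print(evals[:3])
```

The debiasing shrinks the diagonal, so the leading eigenvalues of the debiased matrix reflect the low-rank signal rather than the Poisson noise floor.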