411 Academic Research Building
265 South 37th Street
Philadelphia, PA 19104
Research Interests: statistical machine learning, high-dimensional inference, large-scale multiple testing, optimization, and privacy-preserving data analysis.
Links: Personal Website
Richard A. Berk, Andreas Buja, Lawrence D. Brown, Edward I. George, Arun Kumar Kuchibhotla, Weijie Su, Linda Zhao (2020), Assumption Lean Regression, American Statistician, (in press).
Matteo Sordello, Hangfeng He, Weijie Su (Working), Robust Learning Rate Selection for Stochastic Optimization via Splitting Diagnostic.
Abstract: This paper proposes SplitSGD, a new dynamic learning rate schedule for stochastic optimization. This method decreases the learning rate for better adaptation to the local geometry of the objective function whenever a stationary phase is detected, that is, whenever the iterates are likely to bounce around in a vicinity of a local minimum. The detection is performed by splitting the single thread into two and using the inner product of the gradients from the two threads as a measure of stationarity. Owing to this simple yet provably valid stationarity detection, SplitSGD is easy to implement and incurs essentially no additional computational cost compared with standard SGD. Through a series of extensive experiments, we show that this method is appropriate for both convex problems and training (non-convex) neural networks, with performance that compares favorably to other stochastic optimization methods. Importantly, this method is observed to be very robust with a set of default parameters for a wide range of problems and, moreover, yields better generalization performance than other adaptive gradient methods such as Adam.
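The stationarity test described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy objective f(x) = x²/2 with Gaussian gradient noise, the function names `noisy_grad`, `split_diagnostic` and `split_sgd`, the choice to average gradients over short windows before taking inner products, and the parameter values `windows`, `steps`, `q` and `gamma` are all hypothetical.

```python
import random


def noisy_grad(x, rng, noise=1.0):
    # Hypothetical toy objective f(x) = 0.5 * x**2; its true gradient is x,
    # observed here with additive Gaussian noise to mimic a stochastic gradient.
    return x + rng.gauss(0.0, noise)


def split_diagnostic(x0, lr, rng, windows=4, steps=20, q=0.5):
    """Split one SGD thread into two and test for stationarity.

    Returns (is_stationary, restart_point). All parameters are hypothetical.
    """
    threads = [x0, x0]          # both threads start from the same iterate
    negative = 0                # count of windows with a negative inner product
    for _ in range(windows):
        window_avg = []
        for t in range(2):
            x, g_sum = threads[t], 0.0
            for _ in range(steps):
                g = noisy_grad(x, rng)
                x -= lr * g
                g_sum += g
            threads[t] = x
            window_avg.append(g_sum / steps)
        # In 1-D the inner product is a plain product. A negative value means
        # the two threads are pushed in opposite directions, i.e. bouncing.
        if window_avg[0] * window_avg[1] < 0:
            negative += 1
    restart = 0.5 * (threads[0] + threads[1])   # merge the two threads
    return negative >= q * windows, restart


def split_sgd(x0, lr=0.5, gamma=0.5, rounds=10, sgd_steps=100, seed=0):
    """Alternate plain SGD with the split diagnostic; shrink lr when stationary."""
    rng = random.Random(seed)
    x, history = x0, []
    for _ in range(rounds):
        for _ in range(sgd_steps):          # ordinary single-thread SGD phase
            x -= lr * noisy_grad(x, rng)
        stationary, x = split_diagnostic(x, lr, rng)
        if stationary:
            lr *= gamma                     # decay only when a plateau is detected
        history.append(lr)
    return x, history
```

Far from a minimum both threads see gradients pointing the same way, so the inner products stay positive and the learning rate is kept; near a minimum the noise dominates, the signs disagree often, and the rate is cut. Calling `split_sgd(5.0)` returns the final iterate together with the (non-increasing) learning rate used after each round.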
Hangfeng He and Weijie Su (2020), The Local Elasticity of Neural Networks, International Conference on Learning Representations (ICLR), (to appear).
Zhiqi Bu, Jinshuo Dong, Qi Long, Weijie Su (Working), Deep Learning with Gaussian Differential Privacy.
Bin Shi, Simon S. Du, Weijie Su, Michael I. Jordan (2019), Acceleration via Symplectic Discretization of High-Resolution Differential Equations, Advances in Neural Information Processing Systems 32.
Zhiqi Bu, Jason Klusowski, Cynthia Rush, Weijie Su (2019), Algorithmic Analysis and Statistical Estimation of SLOPE via Approximate Message Passing, Advances in Neural Information Processing Systems 32.
Jinshuo Dong, Aaron Roth, Weijie Su (Working), Gaussian Differential Privacy.
Qingyuan Zhao, Dylan Small, Weijie Su (2019), Multiple Testing When Many p-Values are Uniformly Conservative, with Application to Testing Qualitative Interaction in Educational Interventions, Journal of the American Statistical Association, 114 (527), pp. 1291-1304.
Damian Brzyski, Alexej Gossmann, Weijie Su, Malgorzata Bogdan (2019), Group SLOPE – Adaptive Selection of Groups of Predictors, Journal of the American Statistical Association, 114 (525), pp. 419-433.
Tengyuan Liang and Weijie Su (2019), Statistical Inference for the Population Landscape via Moment‐adjusted Stochastic Gradients, Journal of the Royal Statistical Society, Series B, 81 (2), pp. 431-456.
This seminar will be taken by doctoral candidates after the completion of most of their coursework. Topics vary from year to year and are chosen from advanced probability, statistical inference, robust methods, and decision theory with principal emphasis on applications.
STAT9917301
Independent Study allows students to pursue academic interests not available in regularly offered courses. Students must consult with their academic advisor to formulate a project directly related to the student’s research interests. All independent study courses are subject to the approval of the AMCS Graduate Group Chair.
Study under the direction of a faculty member.
The goal of this course is to introduce students to the R programming language and its related ecosystem. This course will provide a skill set that is in demand in both research and business environments. In addition, R is used and required in other advanced classes taught at Wharton, so this class will prepare students for these higher-level classes and electives.
Graphical displays; one- and two-sample confidence intervals; one- and two-sample hypothesis tests; one- and two-way ANOVA; simple and multiple linear least-squares regression; nonlinear regression; variable selection; logistic regression; categorical data analysis; goodness-of-fit tests. A methodology course. This course does not have business applications but has significant overlap with STAT 1010 and 1020. This course may be taken concurrently with the prerequisite with instructor permission.
Dissertation
For fundamental contributions to the development of privacy-preserving data analysis methodologies; for groundbreaking theoretical advancements in understanding gradient-based optimization methods; for outstanding contributions to high-dimensional statistics, including false discovery rate control and limits in sparsity estimation; for wide-ranging contributions to the theoretical foundation of deep learning.