Nonconvex Optimization for Statistical Estimation and Learning: Beyond Smoothness and Convexity



How do we make sense of simple iterative algorithms, such as gradient descent, for large-scale optimization problems in the absence of classical assumptions like smoothness and convexity? Such problems routinely arise, for example, in fitting neural networks, solving inverse problems in computational imaging, and estimating statistical models under “adversarial perturbations.” In this talk, I will describe how one can answer several questions about these methods: whether they converge at all, how quickly they converge, whether they tend toward saddle points and spurious local minima, and whether we can provably accelerate them. Along the way, we will encounter mathematical tools from nonsmooth analysis, semi-algebraic geometry, and high-dimensional probability and statistics.
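To make the setting concrete, here is a minimal sketch of the subgradient method on a toy objective that is neither smooth nor convex. The objective f(x) = |x² − 1|, the starting point, and the diminishing step-size schedule are all illustrative choices, not taken from the talk:

```python
# Subgradient descent on f(x) = |x^2 - 1|: nonconvex, with kinks at x = +/- 1
# where the gradient does not exist. All constants below are illustrative.

def f(x):
    return abs(x**2 - 1)

def subgrad(x):
    # One valid subgradient of |x^2 - 1|: sign(x^2 - 1) * 2x
    # (at the kinks x = +/- 1, any value in the subdifferential would do).
    s = 1.0 if x**2 - 1 > 0 else (-1.0 if x**2 - 1 < 0 else 0.0)
    return s * 2 * x

x = 3.0  # arbitrary starting point
for k in range(1, 2001):
    x -= 0.1 / k**0.5 * subgrad(x)  # diminishing step sizes, a standard choice

print(f"x = {x:.3f}, f(x) = {f(x):.5f}")
```

Despite the lack of smoothness and convexity, the iterates settle near a minimizer (x = ±1, where f = 0); quantifying when and how fast such convergence happens is exactly the kind of question the talk addresses.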

Related Papers: