Taming Nonconvexity in Statistical and Reinforcement Learning

Yuxin Chen – Princeton University


Recent years have witnessed a flurry of activity in solving statistical estimation and reinforcement learning problems via nonconvex optimization. While conventional wisdom often takes a dim view of nonconvex optimization algorithms due to their susceptibility to spurious local minima, simple first-order optimization methods have been remarkably successful in practice. The theoretical footings, however, had been largely lacking until recently.

This talk presents two recent stories about the (in)-effectiveness of nonconvex learning algorithms. The first story is concerned with noisy low-rank tensor completion. While randomly initialized gradient descent suffers from a high-volatility issue in the sample-starved regime, a two-stage nonconvex algorithm is guaranteed to succeed, enabling optimal sample complexity, linear convergence, minimax statistical accuracy, and entrywise uncertainty quantification all at once. The second story covers policy optimization in reinforcement learning. On the one hand, we demonstrate that the popular softmax policy gradient method can take exponential time to converge; on the other hand, employing natural policy gradients and enforcing entropy regularization allow for fast global convergence. All of these corroborate the unreasonable effectiveness of nonconvex optimization (when properly designed).


Yuxin Chen is currently an assistant professor in the Department of Electrical Engineering at Princeton University, and is affiliated with Applied and Computational Mathematics, Computer Science, and Center for Statistics and Machine Learning. Prior to joining Princeton, he was a postdoctoral scholar in the Department of Statistics at Stanford University, and he completed his Ph.D. in Electrical Engineering at Stanford University. His research interests include high-dimensional statistics, mathematical optimization, and reinforcement learning. He has received the Princeton graduate mentoring award, the AFOSR Young Investigator Award, the ARO Young Investigator Award, the ICCM best paper award (gold medal), and was selected as a finalist for the Best Paper Prize for Young Researchers in Continuous Optimization.