Designing Better Nonconvex Models for Modern Statistical Applications
Feng Ruan – University of California, Berkeley
Nonconvex optimization is ubiquitous in modern data analysis. In many cases there are multiple ways to formulate the nonconvex objective. However, little is known about how the choice of objective function affects statistical guarantees, ease of numerical optimization and robustness to modeling assumptions. In this talk, I present how a geometric understanding of the optimization landscape helps us design practically useful nonconvex objectives for two modern statistical inference problems: interaction detection and phase retrieval.
In the first part of the talk, I discuss the problem of discovering interactions between variables. Convex modeling of interactions suffers computational challenges arising from the enumeration of O(p^k) possible interactions of order k. I show how to identify the variables involved in interactions with only linear computation cost using a nonconvex kernel screening objective. I will also show that the design of the kernel in the non-convex objective is crucial if methods that find local minima are to succeed on the interaction detection task. In particular, the Laplace kernel can provably eliminate unfavorable stationary points that may trap the gradient descent algorithm; the Gaussian kernel has no such ability. Finally, I present a surprising property of the nonconvex kernel screening objective all stationary points of the objective are automatically sparse in finite samples without use of known regularization techniques, e.g., l_1 penalization and early stopping. This automatic sparsity shows that careful design of nonconvex models can potentially allow favorable statistical guarantees that traditional convex objectives are unable to achieve. In the second part of the talk, I will briefly discuss the problem of phase retrieval, showing how a careful design of the nonsmooth nonconvex objective provably solves the phase retrieval problem (with possible faulty measurements) with extremely high probability under appropriate random measurement models.
Three relevant papers:
A Self-Penalizing Objective Function for Scalable Interaction Detection
Taming Nonconvexity in Kernel Feature Selection—Favorable Properties of the Laplace Kernel