Mathematics of Deep Learning: Global Optimality, Implicit Bias, and Learning Dynamics

Rene Vidal – Johns Hopkins University

Abstract

In the past decade, deep networks have led to significant improvements in the performance of AI systems. However, the mathematical reasons for this success remain elusive. For example, deep networks enjoy good generalization performance despite being highly overparameterized. Recent work suggests that overparameterization may bias the optimization algorithm towards solutions that generalize well, a phenomenon known as implicit bias or implicit regularization. However, the regularization properties of existing algorithms such as dropout remain poorly understood. Moreover, since the optimization problem is non-convex, optimization algorithms may fail to return a global minimum. The first part of this talk will present sufficient conditions to guarantee that local minima of positively homogeneous networks are globally optimal. The second part of this talk will present an analysis of the optimization and regularization properties of dropout for both shallow and deep networks. The third part of this talk will present an analysis of the dynamics of gradient flow in overparameterized two-layer linear models, showing that convergence to equilibrium depends on the imbalance between input and output weights (a quantity that is conserved during training and hence fixed at initialization) and on the margin of the initial solution.
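
For reference, "positively homogeneous" in the first part refers to networks whose output scales polynomially when the weights are scaled. A minimal illustration in LaTeX, using notation of my own choosing rather than the talk's:

\[
  \Phi(X; \alpha W_1, \dots, \alpha W_K) = \alpha^{p}\, \Phi(X; W_1, \dots, W_K)
  \quad \text{for all } \alpha > 0 .
\]

For example, a depth-$K$ ReLU network without biases satisfies this with degree $p = K$, since $\mathrm{ReLU}(\alpha z) = \alpha\,\mathrm{ReLU}(z)$ for $\alpha > 0$.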
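
To make the parenthetical remark in the third part concrete, here is a short LaTeX sketch of why the weight imbalance cannot change under gradient flow, assuming a two-layer linear model $f(x) = W_2 W_1 x$ trained on a loss $L(W_2 W_1)$ (the notation is illustrative, not taken from the talk):

\begin{align*}
  \dot{W}_1 &= -\nabla_{W_1} L = -W_2^\top G, \qquad
  \dot{W}_2 = -\nabla_{W_2} L = -G\, W_1^\top,
  \quad \text{where } G = \nabla L \text{ evaluated at } W_2 W_1, \\
  \frac{d}{dt}\bigl(W_1 W_1^\top - W_2^\top W_2\bigr)
    &= \dot{W}_1 W_1^\top + W_1 \dot{W}_1^\top - \dot{W}_2^\top W_2 - W_2^\top \dot{W}_2 \\
    &= -W_2^\top G\, W_1^\top - W_1 G^\top W_2 + W_1 G^\top W_2 + W_2^\top G\, W_1^\top = 0 .
\end{align*}

Hence the imbalance $W_1 W_1^\top - W_2^\top W_2$ is conserved along the gradient-flow trajectory and is therefore determined entirely by the initialization.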