Theoretical Insights into Wide Neural Networks: Optimization, Generalization and Robustness

ADEL JAVANMARD – USC MARSHALL
ABSTRACT
The success of neural networks is often reliant on training highly complex architectures. In this talk, I will discuss some intriguing insights into the role of network width from optimization, generalization and robustness perspectives. In the first part, we consider a function regression model where the goal is to learn a concave function using a linear combination of "bump-like" components (neurons). The parameters to be fitted are the centers of the bumps, and the resulting empirical risk minimization problem is highly non-convex. Formulating it as a two-layer neural network, we show that in the limit as the network width diverges, the evolution of gradient descent converges to a Wasserstein gradient flow. Remarkably, the cost function optimized by the gradient flow exhibits a special property known as "displacement convexity", which implies global convergence at an exponential rate. In the second part, we consider the adversarial robustness of neural networks to small perturbations of the input features at test time. For random features models (two-layer networks with the first-layer weights fixed to random values), we provide a precise characterization of the role of network width in robustness. Our theory reveals several intriguing phenomena and indicates that larger width can hurt robust generalization!
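For readers unfamiliar with the random features model mentioned above, here is a minimal sketch in Python. It is an illustration only and not the setup analyzed in the talk: the Gaussian first-layer weights, the ReLU activation, the ridge penalty `reg`, and the function names `random_features_fit` and `random_features_predict` are all assumptions made for this example.

```python
import numpy as np

def random_features_fit(X, y, width, reg=1e-3, seed=0):
    """Fit only the second layer of a two-layer network with a frozen first layer.

    Assumed for illustration: Gaussian first-layer weights, ReLU activation,
    and a ridge penalty `reg` on the second-layer weights.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # First-layer weights are drawn once at random and never trained.
    W = rng.standard_normal((d, width)) / np.sqrt(d)
    # Random feature map: ReLU applied to the random projections.
    Phi = np.maximum(X @ W, 0.0)
    # Second-layer weights solve a ridge regression problem in closed form.
    a = np.linalg.solve(Phi.T @ Phi + reg * np.eye(width), Phi.T @ y)
    return W, a

def random_features_predict(X, W, a):
    """Evaluate the fitted random features model on inputs X."""
    return np.maximum(X @ W, 0.0) @ a
```

Here `width` is the number of random features, that is, the network width whose effect on robustness the second part of the talk studies. Only the second-layer weights `a` are learned.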
Related Papers:
- https://projecteuclid.org/journals/annals-of-statistics/volume-48/issue-6/Analysis-of-a-two-layer-neural-network-via-displacement-convexity/10.1214/20-AOS1945.full
- https://arxiv.org/abs/1707.04926
- https://arxiv.org/abs/2201.05149
