SOME NEW IDEAS FOR UNBIASED GRADIENT ESTIMATION IN OPTIMIZATION

RYAN ADAMS – PRINCETON UNIVERSITY

ABSTRACT

Optimization is at the heart of machine learning, and gradient computation is central to many optimization techniques. Stochastic optimization, in particular, has taken center stage as the principal method of fitting many models, from deep neural networks to variational Bayesian posterior approximations. Generally, one uses data subsampling to efficiently construct unbiased gradient estimators for stochastic optimization, but this is only one possibility. In this talk, I will discuss two alternative approaches to constructing unbiased gradient estimates. The first approach uses randomized truncation of objective functions defined as loops or limits. Such objectives arise in settings ranging from hyperparameter selection, to fitting parameters of differential equations, to variational inference using lower bounds on the log-marginal likelihood. The second approach revisits the Jacobian accumulation problem at the heart of automatic differentiation, observing that it is possible to collapse the linearized computational graph of, e.g., deep neural networks, in a randomized way such that less memory is used but little performance is lost.
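
To make the first idea concrete, here is a minimal sketch of randomized truncation on a toy objective. It is an illustration under stated assumptions, not the estimator presented in the talk. The objective is assumed to be the limit L(theta) = sum_{k>=0} theta^k / k! = exp(theta), the truncation level is assumed to be drawn from a geometric distribution, and all function names are hypothetical. The sampled increment's gradient is reweighted by the inverse of its selection probability, so the estimate is unbiased for the gradient of the full limit even though only one term is ever evaluated.

```python
import math
import random


def grad_term(theta, k):
    """Gradient of the k-th increment theta**k / k! for k >= 1.

    d/dtheta [theta**k / k!] = theta**(k-1) / (k-1)!; lgamma(k) = log((k-1)!).
    """
    return theta ** (k - 1) * math.exp(-math.lgamma(k))


def sample_truncation(rng, p_continue):
    """Draw a term index k >= 1 with P(k) = (1 - p_continue) * p_continue**(k-1)."""
    k = 1
    while rng.random() < p_continue:
        k += 1
    return k


def truncation_prob(k, p_continue):
    """Probability that sample_truncation returns exactly k."""
    return (1.0 - p_continue) * p_continue ** (k - 1)


def randomized_grad(theta, rng, p_continue=0.5):
    """Single-sample unbiased estimate of dL/dtheta = sum_k grad_term(theta, k).

    Only one randomly chosen term is evaluated; dividing by its selection
    probability makes the expectation equal to the full infinite sum.
    """
    k = sample_truncation(rng, p_continue)
    return grad_term(theta, k) / truncation_prob(k, p_continue)


def mean_estimate(theta, n_samples, seed=0):
    """Average many independent estimates (toy check of unbiasedness)."""
    rng = random.Random(seed)
    total = sum(randomized_grad(theta, rng) for _ in range(n_samples))
    return total / n_samples
```

Averaging many draws, e.g. `mean_estimate(1.0, 200000)`, should land close to `math.exp(1.0)`, the exact gradient of the toy objective. The choice of truncation distribution trades compute against variance: heavier tails evaluate deeper terms more often but keep the inverse-probability weights small.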

These projects are joint work with students Alex Beatson, Deniz Oktay, Joshua Aduol, Nick McGreivy, and collaborators at Toronto and Tsinghua.