Generalizing Beyond the Training Data: New Theory and Algorithms for Optimal Transfer Learning
REESE PATHAK – UC BERKELEY
ABSTRACT
Traditional machine learning often assumes that the training (source) data closely resembles the testing (target) data. In many contemporary applications, however, this assumption is unrealistic: in e-commerce, consumer behavior is time-varying; in medicine, patient populations differ across clinical sites and over time; in autonomous driving, models are rolled out to new environments. Ignoring these “distribution shifts” can lead to costly, harmful, and even dangerous outcomes. My research tackles these challenges by developing an algorithmic and statistical toolkit for addressing distribution shifts.
This talk focuses on covariate shift, a form of distribution shift in which the source and target distributions have different covariate laws. In the first part of the talk, we demonstrate that for a large class of problems, transfer learning is possible even when the source and target data have non-overlapping supports. We introduce the “defect” of a covariate shift, a quantity that measures the severity of the shift, and we show how the defect can be leveraged algorithmically, leading to methods with optimal learning guarantees.
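As a point of reference, the following Python sketch sets up the covariate-shift problem itself: a kernel ridge regressor is trained on source covariates and evaluated under a shifted target law, so its error is charged precisely where training data are scarce. The regression function, kernel bandwidth, and Gaussian covariate laws are hypothetical choices for illustration; this is not the method from the talk.

```python
# Minimal illustration of covariate shift (not the paper's method): fit
# kernel ridge regression on source covariates drawn from N(0, 1), then
# evaluate under a target law N(2, 1), whose mass lies largely outside
# the bulk of the training data.
import numpy as np

rng = np.random.default_rng(0)

def f_star(x):
    """Ground-truth regression function (hypothetical choice)."""
    return np.sin(3 * x)

# Source (training) and target (test) covariate laws differ: covariate shift.
x_src = rng.normal(loc=0.0, scale=1.0, size=200)
y_src = f_star(x_src) + 0.1 * rng.normal(size=200)
x_tgt = rng.normal(loc=2.0, scale=1.0, size=200)

def gauss_kernel(a, b, bandwidth=0.5):
    """Gaussian (RBF) kernel matrix between two sets of 1-D points."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * bandwidth**2))

# Kernel ridge regression fit on the source sample only.
lam = 1e-2
K = gauss_kernel(x_src, x_src)
alpha = np.linalg.solve(K + lam * np.eye(len(x_src)), y_src)

# Risk is measured under the *target* covariate law.
pred_tgt = gauss_kernel(x_tgt, x_src) @ alpha
target_risk = np.mean((pred_tgt - f_star(x_tgt)) ** 2)
print(f"excess risk under target law: {target_risk:.4f}")
```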
In the second part of the talk, we refine the notion of the defect to obtain even stronger learning guarantees. We introduce a new method: penalized risk minimization with a non-traditional regularizer chosen via semidefinite programming. We show that our method is optimal with respect to the particular covariate shift instance; to our knowledge, these are the first instance-optimal guarantees for transfer learning. Moreover, our results are assumption-light: we impose essentially no restrictions on the underlying covariate laws, thereby broadening the applicability of our theory.
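To make the shape of such a procedure concrete, here is a hedged sketch in Python (using numpy and cvxpy): a quadratic penalty matrix is selected by a small semidefinite program and then plugged into penalized least squares. The particular SDP below, a PSD projection of the gap between target and source second-moment matrices, is a hypothetical stand-in chosen for illustration, not the construction from the paper.

```python
# Illustrative sketch only: penalized risk minimization with a matrix-valued
# quadratic penalty selected by a semidefinite program. The SDP below is a
# hypothetical stand-in, not the talk's actual construction.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
n, d = 100, 5

# Source design and responses; the target design has a rescaled covariance.
X_src = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y_src = X_src @ w_star + 0.1 * rng.normal(size=n)
X_tgt = rng.normal(size=(n, d)) @ np.diag(np.linspace(0.5, 2.0, d))

S = X_src.T @ X_src / n  # source second-moment matrix
T = X_tgt.T @ X_tgt / n  # target second-moment matrix

# SDP: choose a PSD penalty matrix M close to the excess target moment T - S,
# so directions the target emphasizes but the source barely covers are the
# ones that get penalized.
M_var = cp.Variable((d, d), PSD=True)
cp.Problem(cp.Minimize(cp.norm(M_var - (T - S), "fro"))).solve()
M = M_var.value

# Penalized risk minimization: least squares plus the SDP-chosen penalty,
# solved in closed form.
w_hat = np.linalg.solve(X_src.T @ X_src + n * M, X_src.T @ y_src)
print("target-domain error:", np.mean((X_tgt @ w_hat - X_tgt @ w_star) ** 2))
```

The design intuition behind this toy choice: directions that the target law weights heavily but the source sample barely covers are exactly where the estimate should be shrunk, since the source data cannot pin them down.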
Related Papers
- Optimally tackling covariate shift in RKHS-based nonparametric regression. Annals of Statistics, 51(2). https://projecteuclid.org/journals/annals-of-statistics/volume-51/issue-2/Optimally-tackling-covariate-shift-in-RKHS-based-nonparametric-regression/10.1214/23-AOS2268.full
- Noisy recovery from random linear observations: Sharp minimax rates. Annals of Statistics, 52(6). https://projecteuclid.org/journals/annals-of-statistics/volume-52/issue-6/Noisy-recovery-from-random-linear-observations--Sharp-minimax-rates/10.1214/24-AOS2446.full