Model Stealing for Low-Rank Language Models

ANKUR MOITRA – MASSACHUSETTS INSTITUTE OF TECHNOLOGY

ABSTRACT

The problem of learning a sequence model has a rich history spanning many decades. While we of course want to understand how learning works for large language models, it turns out that even in highly simplified settings like hidden Markov models the story is already fraught with complications. In particular, there are algorithms that work under various structural assumptions, and lower bounds showing that, absent these assumptions, the learning problem can indeed be computationally hard.

But why are we trying to learn from just random samples? We often have oracle access to already-trained models, wherein we can specify a prefix and receive a sample from the conditional distribution over the rest of the sequence. Does this sort of oracle access provably make learning easier, in the sense that we can get algorithms that work in greater generality? I will show that the answer is yes, even for a more expressive class called low-rank language models.
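
To make the oracle model concrete, here is a minimal sketch in Python of the kind of conditional-sampling access described above. The names (ConditionalSampleOracle, toy_next_token_probs) are illustrative placeholders, not part of the work being presented; the toy model stands in for whatever already-trained sequence model the oracle wraps.

```python
import random

class ConditionalSampleOracle:
    """Hypothetical interface for the oracle access described above:
    given a prefix, return a sample from the model's conditional
    distribution over the rest of the sequence."""

    def __init__(self, next_token_probs, max_len):
        # next_token_probs(prefix) -> dict mapping each token to its probability
        self.next_token_probs = next_token_probs
        self.max_len = max_len

    def sample_completion(self, prefix):
        # Extend the prefix one token at a time by sampling from the
        # model's conditional next-token distribution.
        seq = list(prefix)
        while len(seq) < self.max_len:
            probs = self.next_token_probs(tuple(seq))
            tokens, weights = zip(*probs.items())
            seq.append(random.choices(tokens, weights=weights, k=1)[0])
        return seq

# Toy stand-in model: the next-symbol distribution depends only on the last symbol.
def toy_next_token_probs(prefix):
    if prefix and prefix[-1] == "H":
        return {"H": 0.7, "T": 0.3}
    return {"H": 0.4, "T": 0.6}

oracle = ConditionalSampleOracle(toy_next_token_probs, max_len=5)
print(oracle.sample_completion(["H", "T"]))  # e.g. ['H', 'T', 'T', 'H', 'H']
```

Contrast this with the classical setting, where the learner only sees i.i.d. sequences drawn from the model and cannot choose the prefixes it queries.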

Based on joint work with Allen Liu.