PREDICTION, LEARNING, AND MEMORY
SHAM KAKADE – UNIVERSITY OF WASHINGTON
Building accurate language models that capture meaningful long-term dependencies is a core challenge in language processing. We consider the problem of predicting the next observation given a sequence of past observations, focusing specifically on how to make accurate predictions that explicitly leverage long-range dependencies. Empirically, and perhaps surprisingly, we show that state-of-the-art language models, including LSTMs and Transformers, fail to capture even a basic statistical property of natural language: the entropy rates of their generations drift dramatically upward over time. We also provide provable methods to mitigate this phenomenon: specifically, we give a calibration-based approach that improves an estimated model based on any measurable long-term mismatch between the estimated model and the true underlying generative distribution.
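The entropy-rate drift described above can be measured directly from a model's own per-token log-probabilities: average the negative log-likelihood over consecutive windows of generated text and check whether that empirical entropy rate rises with position. A minimal sketch, using a synthetic log-probability sequence in place of a real model's outputs (the function name and toy data are illustrative assumptions, not from the talk):

```python
import math

def entropy_rate_by_window(token_logprobs, window):
    """Average negative log-likelihood (empirical entropy rate, in nats)
    over consecutive non-overlapping windows of a generated sequence."""
    rates = []
    for start in range(0, len(token_logprobs) - window + 1, window):
        chunk = token_logprobs[start:start + window]
        rates.append(-sum(chunk) / window)
    return rates

# Toy stand-in for a model whose generations become steadily less
# predictable: per-token log-probability decays linearly with position,
# mimicking the upward entropy drift observed in real generations.
logprobs = [-(1.0 + 0.001 * t) for t in range(1000)]

rates = entropy_rate_by_window(logprobs, window=200)
drift = rates[-1] - rates[0]  # positive drift means entropy is rising
```

In practice one would feed in the log-probabilities the model assigns to its own sampled tokens; a well-calibrated model of stationary text should show near-zero drift across windows.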
More generally, we will also present fundamental information-theoretic and computational limits of sequential prediction with memory.