Algorithmic Questions in High-Dimensional Robust Statistics

Fitting a model to a collection of observations is one of the quintessential problems in statistics. The standard assumption is that the data were generated by a model of a given type (e.g., a mixture model). This simplifying assumption is at best only approximately valid, as real datasets are typically exposed to some source of contamination. Hence, any estimator designed for a particular model must also be robust in the presence of corrupted data. This is the prototypical goal of robust statistics, a field that took shape in the 1960s with the pioneering works of Tukey and Huber. Until recently, however, even for the basic problem of robustly estimating the mean of a high-dimensional dataset, all known robust estimators were hard to compute. Moreover, the quality of common heuristics (e.g., RANSAC) degrades badly as the dimension increases.
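As a quick illustration of why this is nontrivial (a minimal sketch, not drawn from the talk; the dimension, sample size, and outlier magnitude are arbitrary choices), the following shows that a single adversarially placed sample can move the empirical mean arbitrarily far from the true mean, while a classical coordinate-wise robust heuristic fares much better on this toy instance:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1000  # dimension (arbitrary choice for illustration)
n = 500   # number of samples

# Clean samples from a standard Gaussian: the true mean is the zero vector.
clean = rng.standard_normal((n, d))

# Corrupt a single sample; the empirical mean shifts in proportion to the
# outlier's magnitude, which the adversary controls.
corrupted = clean.copy()
corrupted[0] = 1e6 * np.ones(d)

clean_err = np.linalg.norm(clean.mean(axis=0))
corrupted_err = np.linalg.norm(corrupted.mean(axis=0))

# The coordinate-wise median is barely affected by one outlier, but under a
# constant fraction of contamination its error is known to grow with sqrt(d).
median_err = np.linalg.norm(np.median(corrupted, axis=0))

print(f"empirical mean error (clean):     {clean_err:.2f}")
print(f"empirical mean error (corrupted): {corrupted_err:.2f}")
print(f"coordinate-wise median error:     {median_err:.2f}")
```

With a constant fraction of corrupted samples rather than a single one, even the coordinate-wise median incurs error growing with the dimension; closing that gap with efficient algorithms is the subject of the talk.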

In this talk, we will survey recent progress in algorithmic high-dimensional robust statistics. We will describe the first computationally efficient algorithms for robust mean and covariance estimation and the main insights behind them. We will also present practical applications of these estimators to exploratory data analysis and adversarial machine learning. Finally, we will discuss new directions and opportunities for future work.

The talk will be based on a number of joint works with (various subsets of) G. Kamath, D. Kane, J. Li, A. Moitra, and A. Stewart.