Risk Estimation Under High-Dimensional Asymptotics

Arian Maleki – Columbia University

Abstract

In this talk, we study the problem of parameter tuning, or equivalently the problem of out-of-sample risk estimation, in high-dimensional settings where standard techniques such as K-fold cross-validation suffer from large biases. Motivated by the low bias of the leave-one-out cross-validation (LO) method, we propose a computationally efficient, closed-form approximate leave-one-out formula (ALO) for a large class of regularized estimators. Given the regularized estimate, calculating ALO requires only minor computational overhead. Under mild assumptions on the data-generating process, we obtain a finite-sample upper bound on |LO - ALO|. Our theoretical analysis shows that |LO - ALO| converges to zero with overwhelming probability as both n and p tend to infinity, where the dimension p of the feature vectors may be comparable with, or even greater than, the number of observations n. Our extensive numerical experiments confirm that |LO - ALO| decreases as n and p grow, revealing the excellent finite-sample performance of ALO despite the high dimensionality of the problem.
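To convey the flavor of the approach, the Python sketch below illustrates the one classical case where a closed-form leave-one-out shortcut is exact: ridge regression, whose leave-one-out residuals satisfy (y_i - yhat_i) / (1 - H_ii) with H the hat matrix. This is only a minimal illustration, not the general ALO formula from the talk; the problem sizes n, p, the penalty lam, and the simulated data are assumptions made for the example.

    # Minimal sketch: closed-form leave-one-out for ridge regression,
    # compared against brute-force refitting. ALO extends shortcuts of
    # this kind (approximately) to a large class of regularized estimators.
    import numpy as np

    rng = np.random.default_rng(0)
    n, p, lam = 300, 200, 1.0            # assumed sizes and ridge penalty
    X = rng.standard_normal((n, p))
    beta = rng.standard_normal(p) / np.sqrt(p)
    y = X @ beta + 0.5 * rng.standard_normal(n)

    # Full-data ridge fit and its hat matrix H = X (X'X + lam I)^{-1} X'.
    G = np.linalg.inv(X.T @ X + lam * np.eye(p))
    H = X @ G @ X.T
    y_hat = H @ y

    # Closed-form leave-one-out residuals: (y_i - yhat_i) / (1 - H_ii).
    loo_closed = (y - y_hat) / (1.0 - np.diag(H))
    risk_closed = np.mean(loo_closed ** 2)

    # Brute-force leave-one-out: refit the ridge estimator n times.
    loo_brute = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        Gi = np.linalg.inv(X[mask].T @ X[mask] + lam * np.eye(p))
        loo_brute[i] = y[i] - X[i] @ (Gi @ (X[mask].T @ y[mask]))
    risk_brute = np.mean(loo_brute ** 2)

    print(risk_closed, risk_brute)       # agree up to numerical error

For ridge regression the shortcut recovers every leave-one-out residual exactly from a single full-data fit; for general regularized estimators no such exact identity is available, which is the gap the ALO formula of the talk addresses.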

The talk is based on joint work with Wenda Zhou, Ji Xu, and Kamiar Rahnama-Rad.

Related preprints:
1. http://proceedings.mlr.press/v108/rad20a.html
2. https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/rssb.12374
3. https://ieeexplore.ieee.org/abstract/document/9476004