Risk Estimation Under High-Dimensional Asymptotics
Arian Maleki – Columbia University
In this talk, we study the problem of parameter tuning, or equivalently the problem of out-of-sample risk estimation, in high-dimensional settings where standard techniques such as K-fold cross-validation suffer from large biases. Motivated by the low bias of the leave-one-out cross-validation (LO) method, we propose a computationally efficient, closed-form approximate leave-one-out formula (ALO) for a large class of regularized estimators. Given the regularized estimate, calculating ALO requires only minor computational overhead. Under mild assumptions about the data-generating process, we obtain a finite-sample upper bound for |LO - ALO|. Our theoretical analysis shows that |LO - ALO| converges to zero with overwhelming probability as both n and p tend to infinity, even when the dimension p of the feature vectors is comparable with or greater than the number of observations n. Despite the high dimensionality of the problem, our extensive numerical experiments show that |LO - ALO| decreases as n and p increase, revealing the excellent finite-sample performance of ALO.
The talk is based on joint work with Wenda Zhou, Ji Xu, and Kamiar Rahnama-Rad.
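
To make the idea of a closed-form leave-one-out formula concrete, the sketch below covers only the simplest special case, ridge regression with squared loss, where the classical leverage shortcut reproduces LO exactly from a single fit. It is not the general ALO formula from the talk. The function names, the simulated data, and the choices n = 50, p = 80, and lam = 1.0 are all assumptions made for illustration.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge estimate: argmin_b ||y - X b||^2 + lam * ||b||^2 (no intercept)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def loo_brute_force(X, y, lam):
    """Leave-one-out risk estimate obtained by refitting n times."""
    n = X.shape[0]
    errs = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        beta_i = ridge_fit(X[mask], y[mask], lam)
        errs[i] = (y[i] - X[i] @ beta_i) ** 2
    return errs.mean()


def loo_shortcut(X, y, lam):
    """Leave-one-out risk estimate from a single fit via leverage scores."""
    p = X.shape[1]
    A = X.T @ X + lam * np.eye(p)
    beta = np.linalg.solve(A, X.T @ y)
    # h[i] = x_i^T A^{-1} x_i, the diagonal of the hat matrix X A^{-1} X^T
    h = np.einsum("ij,ij->i", X, np.linalg.solve(A, X.T).T)
    # Leave-one-out residual = full-fit residual / (1 - leverage)
    resid = (y - X @ beta) / (1.0 - h)
    return np.mean(resid ** 2)


# Simulated data with p > n (an illustrative assumption, not from the talk).
rng = np.random.default_rng(0)
n, p = 50, 80
X = rng.standard_normal((n, p)) / np.sqrt(n)
beta_true = rng.standard_normal(p)
y = X @ beta_true + 0.5 * rng.standard_normal(n)

lo = loo_brute_force(X, y, lam=1.0)
alo = loo_shortcut(X, y, lam=1.0)
print(abs(lo - alo))
```

In this special case the two estimates agree up to floating-point error, while the shortcut needs one fit instead of n. For other losses and regularizers no such exact identity is available in general, which is the setting the approximation and the bound on |LO - ALO| address.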