Optimal Convex Loss Selection Via Score Matching

In the context of linear regression, we construct a data-driven convex loss function with respect to which empirical risk minimisation yields optimal asymptotic variance in the downstream estimation of the regression coefficients. Our semiparametric approach targets the best decreasing approximation of the derivative of the log-density of the noise distribution. At the population level, this fitting process is a nonparametric extension of score matching, corresponding to a log-concave projection of the noise distribution with respect to the Fisher divergence. The procedure is computationally efficient, and we prove guarantees on its asymptotic relative efficiency compared with an oracle procedure that has knowledge of the error distribution. As an example of a highly non-log-concave setting, for Cauchy errors, the optimal convex loss function is Huber-like, and yields an asymptotic relative efficiency greater than 0.87; in this sense, we obtain robustness without sacrificing (much) efficiency.
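As an illustration of the population-level idea, the following sketch computes the best decreasing approximation of the Cauchy score function \((\log f)'\) in \(L^2(f)\) on a grid, using a hand-rolled weighted pool-adjacent-violators routine. This is our own numerical illustration, not the paper's procedure: the grid, truncation range, and helper `pava_increasing` are assumptions made here for demonstration. Negating the fitted decreasing score gives the derivative of a convex loss that is quadratic-like near the origin and linear in the tails, i.e. Huber-like.

```python
import numpy as np

def pava_increasing(y, w):
    """Weighted isotonic (non-decreasing) least-squares fit via the
    pool-adjacent-violators algorithm (helper written for this sketch)."""
    means, weights, counts = [], [], []
    for yi, wi in zip(y, w):
        means.append(float(yi)); weights.append(float(wi)); counts.append(1)
        # Merge adjacent blocks while they violate monotonicity.
        while len(means) > 1 and means[-2] > means[-1]:
            w_new = weights[-2] + weights[-1]
            m_new = (means[-2] * weights[-2] + means[-1] * weights[-1]) / w_new
            c_new = counts[-2] + counts[-1]
            means.pop(); weights.pop(); counts.pop()
            means[-1], weights[-1], counts[-1] = m_new, w_new, c_new
    return np.repeat(means, counts)

# Grid approximation of the (standard) Cauchy noise distribution.
x = np.linspace(-10.0, 10.0, 2001)
f = 1.0 / (np.pi * (1.0 + x**2))      # Cauchy density, used as L^2(f) weights
score = -2.0 * x / (1.0 + x**2)       # (log f)'(x): non-monotone, so f is not log-concave

# Best decreasing approximation of the score in L^2(f); a decreasing
# projection is minus the increasing projection of minus the data.
proj = -pava_increasing(-score, f)

# The associated convex loss has derivative -proj: it agrees with the Cauchy
# score near 0 and is clipped to constants in the tails (Huber-like loss).
```

The clipping of the score in the tails is exactly what trades a little efficiency for robustness: the induced loss grows linearly for large residuals rather than following the redescending Cauchy score.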