Validation in Statistics and Machine Learning - Abstract

Goeman, Jelle

Fast approximate leave-one-out cross-validation for large sample sizes

(joint work with Rosa Meijer)

Finding the optimal tuning parameter in lasso and ridge regression by leave-one-out cross-validation can be a time-consuming process, especially when the sample size is large. We present an approximation to the cross-validated regression coefficients based on the Sherman-Morrison-Woodbury theorem, which can be used in generalized linear models and in the Cox proportional hazards model with ridge and lasso penalties. Our approximation is exact for linear ridge regression and can be seen as a first-order asymptotic approximation in more complex models. The accuracy of the approximation improves as the sample size increases, which is precisely the setting in which leave-one-out cross-validation is most time-consuming. We illustrate our method using genomic data in the context of survival analysis.
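The exact case mentioned above admits a compact illustration. For linear ridge regression with a fixed penalty, the Sherman-Morrison-Woodbury identity yields the well-known leverage shortcut, so a single fit on the full data replaces all n leave-one-out refits. The sketch below is not the authors' code; it uses NumPy, and the function name, penalty grid, and simulated data are purely illustrative.

```python
import numpy as np

def loo_ridge_residuals(X, y, lam):
    """Exact leave-one-out residuals for linear ridge regression.

    Uses the leverage shortcut implied by the Sherman-Morrison-Woodbury
    identity: one fit on the full data replaces the n leave-one-out
    refits (the penalty lam is held fixed across folds).
    """
    n, p = X.shape
    A = X.T @ X + lam * np.eye(p)        # p x p system, cheap when p << n
    B = np.linalg.solve(A, X.T)          # (X'X + lam I)^{-1} X'
    beta_hat = B @ y                     # ridge coefficients on the full data
    y_hat = X @ beta_hat                 # fitted values
    h = np.einsum("ij,ji->i", X, B)      # leverages h_ii of the ridge hat matrix
    return (y - y_hat) / (1.0 - h)       # LOO residual: e_i / (1 - h_ii)

# Illustrative use: pick the penalty by minimising leave-one-out MSE.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))
y = X[:, 0] - 0.5 * X[:, 1] + rng.standard_normal(500)
grid = np.logspace(-2, 3, 30)
loo_mse = [np.mean(loo_ridge_residuals(X, y, lam) ** 2) for lam in grid]
best_lam = grid[int(np.argmin(loo_mse))]
print(f"selected penalty: {best_lam:.3g}")
```

In generalized linear models and the Cox model no such exact identity is available; the talk's contribution is the corresponding first-order approximation in those settings, which the snippet above does not attempt to reproduce.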