Validation in Statistics and Machine Learning - Abstract

Zuber, Verena

High-dimensional feature selection by decorrelation

We present a novel approach to feature selection and variable importance based on "CAR" scores, which are defined as the marginal correlations between the response and the predictors, adjusted for the correlation among all predictors. Decorrelation leads to an elegant reformulation of the linear model and of the decomposition of variance. In particular, the squared CAR scores sum to the proportion of variance explained. The CAR score is a population quantity and is therefore independent of any inference framework. Finally, we compare the performance of the CAR score with competing procedures such as the lasso and the elastic net, using simulations and high-dimensional gene-expression data.
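
As an illustration of the variance decomposition mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes the CAR score vector takes the form omega = P^(-1/2) P_xy, where P is the correlation matrix of the predictors and P_xy is the vector of marginal correlations between predictors and response; the abstract does not state this formula. The function name `car_scores` and the simulated data are hypothetical. The sketch uses plain empirical correlations, so it only applies when there are more observations than predictors; a high-dimensional setting would need a regularized correlation estimate instead.

```python
import numpy as np


def car_scores(X, y):
    """Empirical CAR scores, assuming omega = P^(-1/2) @ P_xy.

    Hypothetical helper for illustration; requires n > p so that the
    empirical correlation matrix P is invertible.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]

    # Standardize predictors and response to zero mean and unit variance.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    ys = (y - y.mean()) / y.std(ddof=1)

    # Correlation among predictors and marginal correlations with y.
    P = Xs.T @ Xs / (n - 1)
    p_xy = Xs.T @ ys / (n - 1)

    # Inverse matrix square root of P via its eigendecomposition.
    evals, evecs = np.linalg.eigh(P)
    P_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T

    return P_inv_sqrt @ p_xy


# Simulated data (made up for this sketch): three predictors, two correlated.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
X[:, 1] += 0.8 * X[:, 0]
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(size=n)

omega = car_scores(X, y)

# R^2 of an ordinary least-squares fit with intercept, for comparison.
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta
centered = y - y.mean()
r2 = 1.0 - (resid @ resid) / (centered @ centered)

# Under the assumed definition, the squared scores add up to R^2.
print(np.sum(omega ** 2), r2)
```

Ranking predictors by their squared score then gives one possible variable-importance ordering under this assumed definition.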