Validation in Statistics and Machine Learning - Abstract

Gerds, Thomas A.

Confidence scores for prediction models

(joint work with Mark van de Wiel)

Machine learning provides many alternative strategies for building a prediction model based on training data. Prediction models are routinely compared by means of their prediction performance in independent validation data. If only one data set is available for both training and validation, rival strategies can still be compared based on repeated splits of the same data (see, e.g., [1], [2], [3]). Often, however, the overall performance of rival strategies is similar, and it is thus difficult to choose one model. Here we investigate the variability of the prediction models that results when the same modelling strategy is applied to different training sets. For each modelling strategy we estimate a confidence score based on the same splits of the data. Population average confidence scores can then be used to distinguish rival prediction models with similar prediction performance. Furthermore, at the subject level a confidence score may provide useful supplementary information for new patients who want to base a medical decision on a predicted risk. The ideas are illustrated using examples from medical statistics, including high-dimensional data.
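
The abstract does not state how the confidence score is defined. As a rough illustration of the underlying idea only, the following Python sketch refits one modelling strategy on repeated random splits of a single data set and records, for each subject, how much the predicted risk varies across the splits in which that subject was held out. Everything in it is an assumption made for illustration and not the authors' method: the simulated data, the toy strategy `fit_threshold_model`, the helper `prediction_spread`, the training fraction, and the use of the standard deviation as a measure of spread. A small spread could be read as high confidence in a subject's predicted risk.

```python
import math
import random
import statistics


def fit_threshold_model(train):
    """Toy modelling strategy (hypothetical): event frequency below and
    above the training median of a single covariate x."""
    cut = statistics.median(x for x, _ in train)
    low = [y for x, y in train if x <= cut]
    high = [y for x, y in train if x > cut]
    p_low = sum(low) / len(low) if low else 0.5
    p_high = sum(high) / len(high) if high else 0.5
    return lambda x: p_low if x <= cut else p_high


def prediction_spread(data, fit, n_splits=100, train_frac=0.632, seed=1):
    """Per-subject standard deviation of the predicted risk over the
    random splits in which the subject was in the validation part."""
    rng = random.Random(seed)
    n = len(data)
    n_train = int(round(train_frac * n))
    preds = [[] for _ in range(n)]
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        train_idx, test_idx = idx[:n_train], idx[n_train:]
        model = fit([data[i] for i in train_idx])  # same strategy, new training set
        for i in test_idx:
            preds[i].append(model(data[i][0]))
    # Subjects held out fewer than twice get NaN (no spread can be computed).
    return [statistics.pstdev(p) if len(p) > 1 else float("nan") for p in preds]


# Simulated data: one covariate x and a binary outcome y with a logistic risk.
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.gauss(0.0, 1.0)
    risk = 1.0 / (1.0 + math.exp(-x))
    data.append((x, 1 if rng.random() < risk else 0))

spread = prediction_spread(data, fit_threshold_model)
valid = [s for s in spread if not math.isnan(s)]
avg_spread = statistics.mean(valid)  # population-average summary of the spread
print(f"average per-subject spread of predicted risk: {avg_spread:.3f}")
```

In this sketch, two strategies with similar prediction error could be compared by their `avg_spread` computed on the same splits, while the individual entries of `spread` give a subject-level view of how stable each predicted risk is.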

[1] AM Molinaro, R Simon, and RM Pfeiffer. Prediction error estimation: a comparison of resampling methods. Bioinformatics, 21:3301-3307, 2005.
[2] TA Gerds, T Cai, and M Schumacher. The performance of risk prediction models. Biometrical Journal, 50(4):457-479, 2008.
[3] M van de Wiel, J Berkhof, and NW van Wieringen. Testing the prediction error difference between 2 predictors. Biostatistics, Advance access, 2009. doi:10.1093/biostatistics/kxp011.