Validation in Statistics and Machine Learning - Abstract

Strobl, Carolin

What we can learn from trees and forests

The talk reviews two issues relevant to the interpretation of random forest variable importance measures: variable selection bias, which led to an artificial preference for variables of certain types in early CART and random forest algorithms, and the pros and cons of conditional and marginal variable importance measures, which are also discussed in other areas of statistics. The talk closes with an outlook on how recursive partitioning algorithms can serve not only as a tool for analyzing primary data but also, on a meta-level, as a tool for comparing the performance of different algorithms, including recursive partitioning algorithms themselves.
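
As an illustration of variable selection bias, the following minimal sketch (not taken from the talk) simulates a binary response together with two pure-noise predictors, one binary and one continuous. It assumes an exhaustive, CART-style search for the cutpoint with the largest Gini gain; the function names `gini`, `best_gini_gain`, and `simulate`, the sample sizes, and the seed are all hypothetical choices made for this example. Because the continuous predictor offers many more candidate cutpoints, it should be expected to achieve the larger gain far more often, although neither predictor carries any information about the response.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_gini_gain(x, y):
    """Largest Gini impurity reduction over all binary splits of the form x <= c."""
    n = len(y)
    parent = gini(y)
    pairs = sorted(zip(x, y))
    best = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal neighbouring values: no cutpoint between them
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def simulate(n_runs=200, n_obs=50, seed=1):
    """Count which noise predictor would be chosen for the first split."""
    rng = random.Random(seed)
    wins = {"binary": 0, "continuous": 0, "tie": 0}
    for _ in range(n_runs):
        # Response and both predictors are generated independently,
        # so any apparent association is due to chance alone.
        y = [rng.randint(0, 1) for _ in range(n_obs)]
        x_bin = [rng.randint(0, 1) for _ in range(n_obs)]
        x_cont = [rng.random() for _ in range(n_obs)]
        g_bin = best_gini_gain(x_bin, y)
        g_cont = best_gini_gain(x_cont, y)
        if g_cont > g_bin:
            wins["continuous"] += 1
        elif g_bin > g_cont:
            wins["binary"] += 1
        else:
            wins["tie"] += 1
    return wins


wins = simulate()
print(wins)
```

An unbiased selection procedure would pick each of the two uninformative predictors about equally often; the imbalance in the printed counts is the kind of artificial preference the abstract refers to.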