Validation in Statistics and Machine Learning - Abstract
Exact inference in probabilistic graphical models quickly becomes intractable as the dimension of the problem increases. A weighted average (or mixture) of simple graphical models can be used in place of a single, more complicated model to learn a distribution, making probabilistic inference much more efficient. I hope to discuss issues related to the validation of algorithms for learning such mixtures of models, and to high-dimensional learning of probabilistic graphical models in general, and to gather valuable feedback and comments on my approach. The two main problems are the difficulty of assessing the accuracy of the algorithms and the difficulty of choosing a representative set of target distributions. The accuracy of algorithms for learning probabilistic graphical models is often evaluated by comparing the structure of the resulting model to that of the target (e.g. number of shared and differing edges, BDe score, etc.). This approach, however, falls short when studying methods that use a mixture of simple models: individually, these lack the representational power to model the true distribution, and only their combination allows them to compete with more sophisticated models. The Kullback-Leibler divergence is a measure of the difference between two probability densities, and can be used to compare any model learned from a dataset to the data-generating distribution. For computational reasons, however, I had to resort to a Monte Carlo estimate of this quantity for large problems (starting at around 200 variables).
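To make the Monte Carlo estimate of the Kullback-Leibler divergence concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the procedure used in this work: the target is a hypothetical binary Markov chain (`make_chain`), the learned model is a hypothetical mixture of fully factorised Bernoulli models, and all function names and parameter values are invented for the example. The estimate averages log P(x) - log Q(x) over samples x drawn from the target P; for a small number of variables it can be checked against the exact value obtained by enumeration.

```python
import itertools
import math
import random


def log_prob_product(x, thetas):
    """Log-probability of binary vector x under independent Bernoulli(thetas)."""
    return sum(math.log(t if xi else 1.0 - t) for xi, t in zip(x, thetas))


def log_prob_mixture(x, weights, components):
    """Log-probability of x under a weighted average of factorised models."""
    logs = [math.log(w) + log_prob_product(x, th)
            for w, th in zip(weights, components)]
    m = max(logs)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(v - m) for v in logs))


def make_chain(n, p_first=0.5, p_stay=0.8):
    """Hypothetical target: each variable copies its predecessor w.p. p_stay."""
    def log_p(x):
        lp = math.log(p_first if x[0] else 1.0 - p_first)
        for prev, cur in zip(x, x[1:]):
            lp += math.log(p_stay if cur == prev else 1.0 - p_stay)
        return lp

    def sample(rng):
        x = [int(rng.random() < p_first)]
        for _ in range(n - 1):
            x.append(x[-1] if rng.random() < p_stay else 1 - x[-1])
        return tuple(x)

    return log_p, sample


def mc_kl(sample_p, log_p, log_q, n_samples, rng):
    """Monte Carlo estimate of KL(P || Q) = E_P[log P(x) - log Q(x)]."""
    total = 0.0
    for _ in range(n_samples):
        x = sample_p(rng)
        total += log_p(x) - log_q(x)
    return total / n_samples


def exact_kl(n, log_p, log_q):
    """Exact KL(P || Q) by enumerating all 2**n states (small n only)."""
    total = 0.0
    for x in itertools.product((0, 1), repeat=n):
        lp = log_p(x)
        total += math.exp(lp) * (lp - log_q(x))
    return total


# Illustrative setup; every value below is an assumption for the example.
rng = random.Random(0)
n = 6
log_p, sample_p = make_chain(n)
weights = [0.5, 0.5]
components = [[0.85] * n, [0.15] * n]


def log_q(x):
    return log_prob_mixture(x, weights, components)


estimate = mc_kl(sample_p, log_p, log_q, 20000, rng)
exact = exact_kl(n, log_p, log_q)
print(f"Monte Carlo estimate: {estimate:.4f}, exact value: {exact:.4f}")
```

The enumeration in `exact_kl` costs 2^n evaluations, which is why a sampling-based estimate becomes the only practical option as the number of variables grows. The estimator only requires that the target can be sampled and that both log-probabilities can be evaluated pointwise.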
Since probabilistic inference, rather than probability modelling, is the ultimate motivation for building these models, a more meaningful measure of accuracy could be obtained by comparing mixtures against a combination of state-of-the-art model learning and approximate inference algorithms. However, the exact inference result cannot easily be obtained for interesting target distributions, since mixtures are considered precisely because exact inference is not possible on those targets, and approximate inference would introduce a bias.
Selecting the target distributions used to generate the data sets on which the algorithms are evaluated also proved a challenge. The easiest solution was to generate them at random (although different approaches can be designed). Such models are, however, likely to be rather different from real problems, and thus constitute a poor choice for assessing the practical interest of mixtures of models. Methods (e.g. linking multiple copies of a given network) have been developed to increase the size of models known to the community (e.g. the alarm network), and the resulting graphical models have been made available. These could, however, still be far from the kind of interactions present in a real setting. A better way to proceed could be to generate samples from the equations describing a physical problem, to learn a probabilistic model as well as possible from this high-dimensional dataset, and to use it as the target distribution.
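As an illustration of what generating a target distribution at random can look like, the sketch below draws a random Bayesian network over binary variables and samples a data set from it. The abstract does not specify the generation scheme, so everything here is an assumption made for the example: the fixed variable ordering, the bound on the number of parents, the uniformly drawn conditional probabilities, and the function names.

```python
import itertools
import random


def random_bayesian_network(n_vars, max_parents, rng):
    """Draw a random DAG over binary variables with random probability tables.

    Variable i may only have parents among variables 0..i-1, which
    guarantees that the resulting graph is acyclic.
    """
    parents, cpts = [], []
    for i in range(n_vars):
        k = rng.randint(0, min(max_parents, i))
        pa = sorted(rng.sample(range(i), k))
        parents.append(pa)
        # One entry per parent configuration: P(X_i = 1 | parents = cfg).
        cpts.append({cfg: rng.random()
                     for cfg in itertools.product((0, 1), repeat=k)})
    return parents, cpts


def sample_network(parents, cpts, rng):
    """Ancestral sampling: draw each variable given its sampled parents."""
    x = []
    for pa, cpt in zip(parents, cpts):
        cfg = tuple(x[j] for j in pa)
        x.append(int(rng.random() < cpt[cfg]))
    return tuple(x)


# Illustrative sizes; both are assumptions for the example.
rng = random.Random(0)
parents, cpts = random_bayesian_network(n_vars=10, max_parents=3, rng=rng)
dataset = [sample_network(parents, cpts, rng) for _ in range(500)]
```

Targets drawn this way are convenient because they can be sampled and evaluated cheaply, but, as noted above, nothing ties their structure or parameters to the interactions found in real problems.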