Validation in Statistics and Machine Learning - Abstract

Mucha, Hans-Joachim

Validation in cluster analysis

Clustering is a method of unsupervised learning in which the learner is given only unlabeled observations. The statistical software ClusCorr98 provides built-in validation techniques that can be applied to both hierarchical and partitional cluster analysis methods. The ultimate aim here is to find the appropriate number of clusters, which is the main task of model selection. Additionally, the stability of each individual cluster is evaluated. Several measures of similarity are considered, both between two clusterings (Hubert and Arabie 1985) and between sets (Hennig 2007). Finally, the degree of membership of each observation to its cluster can be assessed. Applications to archaeometry and to dialectometry are presented.
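To make the two kinds of similarity measure concrete, here is a minimal Python sketch. It assumes the adjusted Rand index as the measure between two clusterings (Hubert and Arabie 1985) and the Jaccard coefficient as the measure between sets, the coefficient Hennig (2007) uses for cluster-wise stability. It is not ClusCorr98's implementation; all function names, including the hypothetical helper `best_match_jaccard`, are illustrative assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index (Hubert and Arabie 1985) between two partitions
    of the same observations, given as equal-length label sequences."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same observations")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to compare
    # Pair counts from the contingency table n_ij and from the margins a_i, b_j.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Only happens when both partitions are all-singletons or both are a
        # single cluster, i.e. the partitions are identical.
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def jaccard(set_a, set_b):
    """Jaccard coefficient |A & B| / |A | B| between two sets of observation ids."""
    union = set_a | set_b
    if not union:
        return 1.0
    return len(set_a & set_b) / len(union)


def best_match_jaccard(cluster, other_labels, sample):
    """Hypothetical helper in the spirit of Hennig (2007): how well is `cluster`
    (a set of observation ids) recovered by `other_labels` (dict id -> label),
    a clustering computed only on the ids in `sample`? The original cluster is
    restricted to `sample`, then matched to its most similar new cluster."""
    restricted = cluster & sample
    new_clusters = {}
    for i in sample:
        new_clusters.setdefault(other_labels[i], set()).add(i)
    return max((jaccard(restricted, c) for c in new_clusters.values()), default=0.0)


if __name__ == "__main__":
    print(adjusted_rand_index([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))
    print(jaccard({1, 2, 3}, {2, 3, 4}))
    labels = {0: "x", 1: "x", 2: "y", 3: "y", 4: "y"}
    print(best_match_jaccard({0, 1, 2}, labels, {0, 1, 2, 3, 4}))
```

In a resampling scheme one might recluster many subsamples, average `adjusted_rand_index` against the original partition to compare candidate numbers of clusters, and average `best_match_jaccard` per original cluster to judge the stability of each single cluster.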

References:

Hennig, C. (2007): Cluster-wise assessment of cluster stability. Computational Statistics and Data Analysis, 52, 258-271.
Hubert, L. J. and Arabie, P. (1985): Comparing partitions. Journal of Classification, 2, 193-218.