Model-based cluster analysis applied to flow cytometry data of phytoplankton
- Mucha, Hans-Joachim
- Simon, Ute
- Brüggemann, Rainer
2010 Mathematics Subject Classification
- 62-07 62H30 62H25
Keywords
- cluster analysis, K-means, data mining, principal components analysis, freshwater ecology, phytoplankton, flow cytometry
Starting from well-known model-based clustering models, equivalent formulations based on pairwise distances are presented for some special models. Moreover, these models can be generalized to take into account both weights of observations and weights of variables. Well-known cluster analysis techniques such as the iterative partitional K-means method and the agglomerative hierarchical Ward method are useful for discovering partitions or hierarchies in the underlying data. Here these methods are generalized in two ways: first by using weighted observations, and second by allowing clusters of different volumes. A more general K-means approach based on pairwise distances is then recommended. Simulation studies are carried out to compare the new clustering techniques with the well-known ones. Afterwards, a successful application in the field of freshwater ecology is presented. As an example, the cluster analysis of a snapshot from the monitoring of phytoplankton (algae) is considered in more detail. Monitoring by microscope is very time-consuming and labor-intensive, whereas flow cytometry provides the opportunity to investigate algal communities in a semiautomatic way. Statistical data analysis and cluster analysis can at least support these investigations. Here a combination of agglomerative hierarchical clustering and iterative clustering is recommended. To give some insight into the data under investigation, several univariate, bivariate and multivariate visualizations are proposed.
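The equivalence between a centroid-based criterion and a pairwise-distance criterion can be illustrated with a small numerical check. The following is only a minimal sketch, assuming squared Euclidean distances and positive observation weights; it is not the authors' implementation, and the function names `centroid_ss` and `pairwise_ss` as well as the random test data are hypothetical. It relies on the standard identity that the weighted sum of squared distances to the weighted centroid of a cluster equals the weighted sum of all pairwise squared distances divided by twice the total weight.

```python
import random


def centroid_ss(points, weights):
    """Weighted sum of squared Euclidean distances to the weighted centroid."""
    total = sum(weights)
    dim = len(points[0])
    center = [
        sum(w * p[d] for p, w in zip(points, weights)) / total
        for d in range(dim)
    ]
    return sum(
        w * sum((p[d] - center[d]) ** 2 for d in range(dim))
        for p, w in zip(points, weights)
    )


def pairwise_ss(points, weights):
    """Same quantity computed from pairwise squared distances only."""
    total = sum(weights)
    acc = 0.0
    for p, wp in zip(points, weights):
        for q, wq in zip(points, weights):
            acc += wp * wq * sum((a - b) ** 2 for a, b in zip(p, q))
    # Every unordered pair is counted twice, hence the factor 2.
    return acc / (2.0 * total)


# Hypothetical data: one cluster of 20 observations in 3 dimensions,
# with arbitrary positive observation weights.
random.seed(0)
cluster = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(20)]
masses = [random.uniform(0.5, 2.0) for _ in range(20)]

via_centroid = centroid_ss(cluster, masses)
via_pairs = pairwise_ss(cluster, masses)
print(via_centroid, via_pairs)
```

Because the second function never computes a centroid, a criterion of this form can be evaluated from a distance matrix alone, which is what makes a K-means-like procedure on pairwise distances possible in principle.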