awc - Adaptive Weights Clustering

Developed by:
Prof. Vladimir Spokoiny
Kirill Efimov
Larisa Adamyan

Overview

AWC is an open source package containing implementation of the novel non-parametric clustering algorithm called Adaptive Weights Clustering.
AWC offers an insight into the definition of a cluster as a homogeneous region without gaps.

The clustering structure of the data is described in terms of binary weights wij, where wij = 1 indicates being points Xi and Xj in the same cluster, whereas wij = 0 means that these points belong to different clusters. For each point Xi, the associated cluster Ci is given by the collection of positive weights wij over all j. The resulting symmetric matrix of weights W consists of blocks of ones, where each block of ones describes one cluster. The proposed procedure attempts iteratively recover the weights wij from the data. It starts with very local clustering structure Ci(0), that is, the starting positive weights Wij(0) are limited to the closest neighbors Xj of the point Xi in terms of the distance d(Xi, Xj). At each step k ≥ 1, the weights Wij(k) are recomputed by means of statistical tests of ''no gap'' between Ci(k-1) and Cj(k-1), the local clusters on step k - 1 for points Xi and Xj correspondingly. The number of neighbors Xj for each fixed point Xi grows in each step. The resulting matrix of weights W is used for the final clustering.

Each animation contains two plots. The left side is matrix of weights (zero weights are black and weights equal to one are white). On the right side one point for each cluster is picked. Animation shows how the connections of these points are spreading during the work of the algorithm. Also the matrix of weights is gradually filling. From left: 3 Normal clusters, [7], [3]


Key Features

  • The method is fully adaptive and does not require to specify the number of clusters or their structure.
  • The clustering results are not sensitive to noise and outliers.
  • Is able to detect non-convex clusters with different density.
  • Is able to recover different clusters with sharp edges or manifold structure.
  • Shows very nice performance on high, even infinity dimensional datasets.


AWC performance on artificial datasets. From Left: [1][2][3][4][5][6]

Implementation

  • AWC is written under python 2.7 and thus platform independent.
  • Requires the following basic packages to be installed: numpy, scipy, matplotlib.

Download and Usage

The package is freely available at GitHub under MIT license.

You can find all the instructions about Installation, Usage and also an example code for running AWC on the GitHub page.

You can ask us, if you need assistance: Mail to Kirill Efimov or Mail to Larisa Adamyan.

References

  1. Rodriguez, Alex, and Alessandro Laio. "Clustering by fast search and find of density peaks." Science 344.6191 (2014): 1492-1496.
  2. G. Karypis, E.H. Han, V. Kumar, CHAMELEON: A hierarchical 765 clustering algorithm using dynamic modeling, IEEE Trans. on Computers, 32 (8), 68-75, 1999.
  3. Chang and D.Y. Yeung, Robust path-based spectral clustering. Pattern Recognition, 2008. 41(1): p. 191-203.
  4. A. Gionis, H. Mannila, and P. Tsaparas, Clustering aggregation. ACM Transactions on Knowledge Discovery from Data (TKDD), 2007. 1(1): p. 1-30.
  5. Ultsch, A.: Clustering with SOM: U*C, In Proc. Workshop on Self-Organizing Maps, Paris, France, (2005) , pp. 75-82
  6. Zelnik-Manor, Lihi, and Pietro Perona. "Self-tuning spectral clustering." NIPS. Vol. 17. No. 1601-1608. 2004.
  7. C.T. Zahn, Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Transactions on Computers, 1971. 100(1): p. 68-86.