## awc - Adaptive Weights Clustering

*Developed by:*- Prof. Vladimir Spokoiny
- Kirill Efimov
- Larisa Adamyan

## Overview

**AWC** is an open source package containing implementation of the novel non-parametric
clustering algorithm called Adaptive Weights Clustering.

AWC offers an insight into the definition of a cluster as a homogeneous region without gaps.

The clustering structure of the data is described in terms of binary weights *w _{ij}*,
where

*w*indicates being points

_{ij}= 1*X*and

_{i}*X*in the same cluster, whereas

_{j}*w*means that these points belong to different clusters. For each point

_{ij}= 0*X*, the associated cluster

_{i}*C*is given by the collection of positive weights

_{i}*w*over all

_{ij}*j*. The resulting symmetric matrix of weights

*W*consists of blocks of ones, where each block of ones describes one cluster. The proposed procedure attempts iteratively recover the weights

*w*from the data. It starts with very local clustering structure

_{ij}*C*, that is, the starting positive weights

_{i}^{(0)}*W*are limited to the closest neighbors

_{ij}^{(0)}*X*of the point

_{j}*X*in terms of the distance

_{i}*d(X*. At each step

_{i}, X_{j})*k ≥ 1*, the weights

*W*are recomputed by means of statistical tests of

_{ij}^{(k)}*''no gap'*' between

*C*and

_{i}^{(k-1)}*C*, the local clusters on step

_{j}^{(k-1)}*k - 1*for points

*X*and

_{i}*X*correspondingly. The number of neighbors

_{j}*X*for each fixed point

_{j}*X*grows in each step. The resulting matrix of weights

_{i}*W*is used for the final clustering.

* Each animation contains two plots. The left side is matrix of weights (zero weights are black and weights equal to one are white).
On the right side one point for each cluster is picked. Animation shows how the connections of these points are spreading during the work of the algorithm. Also the matrix of weights is gradually filling. From left: 3 Normal clusters, [7], [3]*

Key Features

- The method is fully adaptive and does not require to specify the number of clusters or their structure.
- The clustering results are not sensitive to noise and outliers.
- Is able to detect non-convex clusters with different density.
- Is able to recover different clusters with sharp edges or manifold structure.
- Shows very nice performance on high, even infinity dimensional datasets.

*AWC performance on artificial datasets. From Left: [1][2][3][4][5][6]*

## Implementation

- AWC is written under python 2.7 and thus platform independent.
- Requires the following basic packages to be installed:
*numpy, scipy, matplotlib*.

## Download and Usage

The package is freely available at GitHub under MIT license.

You can find all the instructions about Installation, Usage and also an example code for running AWC on the GitHub page.

You can ask us, if you need assistance: Mail to Kirill Efimov or Mail to Larisa Adamyan.

## References

- Rodriguez, Alex, and Alessandro Laio. "Clustering by fast search and find of density peaks." Science 344.6191 (2014): 1492-1496.
- G. Karypis, E.H. Han, V. Kumar, CHAMELEON: A hierarchical 765 clustering algorithm using dynamic modeling, IEEE Trans. on Computers, 32 (8), 68-75, 1999.
- Chang and D.Y. Yeung, Robust path-based spectral clustering. Pattern Recognition, 2008. 41(1): p. 191-203.
- A. Gionis, H. Mannila, and P. Tsaparas, Clustering aggregation. ACM Transactions on Knowledge Discovery from Data (TKDD), 2007. 1(1): p. 1-30.
- Ultsch, A.: Clustering with SOM: U*C, In Proc. Workshop on Self-Organizing Maps, Paris, France, (2005) , pp. 75-82
- Zelnik-Manor, Lihi, and Pietro Perona. "Self-tuning spectral clustering." NIPS. Vol. 17. No. 1601-1608. 2004.
- C.T. Zahn, Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Transactions on Computers, 1971. 100(1): p. 68-86.