Warning: This documentation is for scikits.learn version 0.8. — Latest stable version

This page

scikits.learn.cluster.SpectralClustering

class scikits.learn.cluster.SpectralClustering(k=8, mode=None, random_state=None)

Apply k-means to a projection to the normalized laplacian

In practice Spectral Clustering is very useful when the structure of the individual clusters is highly non-convex or more generally when a measure of the center and spread of the cluster is not a suitable description of the complete cluster. For instance when clusters are nested circles on the 2D plan.

If affinity is the adjacency matrix of a graph, this method can be used to find normalized graph cuts.

Parameters :

k: integer, optional :

The dimension of the projection subspace.

mode: {None, ‘arpack’ or ‘amg’} :

The eigenvalue decomposition strategy to use. AMG (Algebraic MultiGrid) is much faster, but requires pyamg to be installed.

random_state: int seed, RandomState instance, or None (default) :

A pseudo random number generator used for the initialization of the lobpcg eigen vectors decomposition when mode == ‘amg’ and by the K-Means initialization.

References

Attributes

labels_: Labels of each point

Methods

fit(X): Compute spectral clustering
__init__(k=8, mode=None, random_state=None)
fit(X, **params)

Compute the spectral clustering from the affinity matrix

Parameters :

X: array-like or sparse matrix, shape: (n_samples, n_samples) :

An affinity matrix describing the pairwise similarity of the data. If can also be an adjacency matrix of the graph to embed. X must be symmetric and its entries must be positive or zero. Zero means that elements have nothing in common, whereas high values mean that elements are strongly similar.

Notes

If you have an affinity matrix, such as a distance matrix, for which 0 means identical elements, and high values means very dissimilar elements, it can be transformed in a similarity matrix that is well suited for the algorithm by applying the gaussian (heat) kernel:

np.exp(- X ** 2 / (2. * delta ** 2))

Another alternative is to take a symmetric version of the k nearest neighbors connectivity matrix of the points.

If the pyamg package is installed, it is used: this greatly speeds up computation.