8.1.2.5. sklearn.cluster.dbscan

sklearn.cluster.dbscan(X, eps=0.5, min_samples=5, metric='euclidean', random_state=None)

Perform DBSCAN clustering from vector array or distance matrix.

Parameters:

X: array [n_samples, n_samples] or [n_samples, n_features] :

Array of distances between samples, or a feature array. The array is treated as a feature array unless the metric is given as ‘precomputed’.

eps: float, optional :

The maximum distance between two samples for them to be considered part of the same neighborhood.

min_samples: int, optional :

The minimum number of samples in a neighborhood for a point to be considered a core point.

metric: string, or callable :

The metric to use when calculating distance between instances in a feature array. If metric is a string or callable, it must be one of the options allowed by metrics.pairwise.calculate_distance for its metric parameter. If metric is “precomputed”, X is assumed to be a distance matrix and must be square.
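To make the "precomputed" case concrete, here is a minimal pure-Python sketch of the kind of square [n_samples, n_samples] distance matrix that X is expected to be. It assumes Euclidean distance, and the helper name distance_matrix_sketch is hypothetical (it is not part of scikit-learn).

```python
import math


def distance_matrix_sketch(points):
    """Build a symmetric matrix of pairwise Euclidean distances."""
    n = len(points)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(points[i], points[j])))
            # Distances are symmetric, so fill both halves at once.
            D[i][j] = D[j][i] = d
    return D


points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
D = distance_matrix_sketch(points)
# D has shape [n_samples, n_samples] with zeros on the diagonal.
```

A matrix like D, converted to an array, is what would be passed as X together with metric='precomputed'.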

random_state: numpy.random.RandomState, optional :

The generator used to shuffle the order in which samples are visited. Defaults to numpy.random.

Returns:

core_samples: array [n_core_samples] :

Indices of core samples.

labels: array [n_samples] :

Cluster labels for each point. Noisy samples are given the label -1.
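The relationship between eps, min_samples and the two return values can be illustrated with a minimal pure-Python sketch. This is not the scikit-learn implementation: the name dbscan_sketch is hypothetical, Euclidean distance is assumed, and a sample is assumed to count as a member of its own neighborhood.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dbscan_sketch(X, eps=0.5, min_samples=5):
    """Return (core_samples, labels) for a list of feature vectors X."""
    n = len(X)
    # Neighborhood of i: every sample within eps, including i itself
    # (assumed convention).
    neighbors = [
        [j for j in range(n) if euclidean(X[i], X[j]) <= eps]
        for i in range(n)
    ]
    core_samples = [i for i in range(n) if len(neighbors[i]) >= min_samples]
    core_set = set(core_samples)

    labels = [-1] * n  # -1 marks noise until a cluster claims the sample
    cluster_id = 0
    for seed in core_samples:
        if labels[seed] != -1:
            continue  # already absorbed into an earlier cluster
        labels[seed] = cluster_id
        frontier = [seed]
        while frontier:
            current = frontier.pop()
            for j in neighbors[current]:
                if labels[j] == -1:
                    labels[j] = cluster_id
                    # Only core samples keep expanding the cluster.
                    if j in core_set:
                        frontier.append(j)
        cluster_id += 1
    return core_samples, labels


X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
     (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
     (9.0, 0.0)]
core_samples, labels = dbscan_sketch(X, eps=0.3, min_samples=3)
```

In this toy data set the two tight groups of three points each form a cluster, and the isolated last point keeps the noise label -1.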

Notes

See examples/plot_dbscan.py for an example.

References

Ester, M., H. P. Kriegel, J. Sander, and X. Xu, “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise”. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, OR, AAAI Press, pp. 226–231. 1996
