6.5.1. scikits.learn.neighbors.Neighbors
- class scikits.learn.neighbors.Neighbors(n_neighbors=5, window_size=1)
Classifier implementing the k-nearest neighbors algorithm.
Parameters :
    data : array-like, shape (n, k)
        The data points to be indexed. This array is not copied, so modifying it after fitting will produce bogus results.
    labels : array
        An array of labels for the data (only arrays of integers are supported).
    n_neighbors : int
        Default number of neighbors.
    window_size : int
        Window size passed to BallTree.
Notes
http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
Examples
>>> samples = [[0., 0., 1.], [1., 0., 0.], [2., 2., 2.], [2., 5., 4.]]
>>> labels = [0, 0, 1, 1]
>>> from scikits.learn.neighbors import Neighbors
>>> neigh = Neighbors(n_neighbors=3)
>>> neigh.fit(samples, labels)
Neighbors(n_neighbors=3, window_size=1)
>>> print neigh.predict([[0, 0, 0]])
[0]
Methods

fit(X[, Y])
kneighbors(data[, n_neighbors])
    Finds the K-neighbors of a point.
predict(T[, n_neighbors])
    Predict the class labels for the provided data.
score(X, y)
    Returns the mean error rate on the given test data and labels.

- __init__(n_neighbors=5, window_size=1)
Internally uses the ball tree data structure and algorithm for fast neighbor lookups on high-dimensional datasets.
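For intuition about what these lookups compute, here is a brute-force NumPy sketch of the same k-NN queries. The helper names brute_force_kneighbors and brute_force_predict are illustrative, not part of the library; the real implementation uses BallTree instead of scanning every sample:

import numpy as np

def brute_force_kneighbors(data, point, n_neighbors):
    # Euclidean distance from the query point to every stored sample.
    data = np.asarray(data, dtype=float)
    dist = np.sqrt(((data - np.asarray(point, dtype=float)) ** 2).sum(axis=1))
    # Indices of the n_neighbors closest samples, nearest first.
    ind = np.argsort(dist)[:n_neighbors]
    return dist[ind], ind

def brute_force_predict(data, labels, point, n_neighbors):
    # Majority vote among the labels of the nearest neighbors.
    _, ind = brute_force_kneighbors(data, point, n_neighbors)
    votes = np.bincount(np.asarray(labels)[ind])
    return votes.argmax()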
- kneighbors(data, n_neighbors=None)
Finds the K-neighbors of a point.
Parameters :
    data : array-like
        The query point (or points).
    n_neighbors : int
        Number of neighbors to get (default is the value passed to the constructor).

Returns :
    dist : array
        Array representing the distances to the query point(s).
    ind : array
        Array representing the indices of the nearest points in the population matrix.
Examples
In the following example, we construct a Neighbors instance from an array representing our data set and ask which point is closest to [1, 1, 1]:
>>> samples = [[0., 0., 0.], [0., .5, 0.], [1., 1., .5]]
>>> labels = [0, 0, 1]
>>> from scikits.learn.neighbors import Neighbors
>>> neigh = Neighbors(n_neighbors=1)
>>> neigh.fit(samples, labels)
Neighbors(n_neighbors=1, window_size=1)
>>> print neigh.kneighbors([1., 1., 1.])
(array(0.5), array(2))
As you can see, it returns (array(0.5), array(2)), which means that the nearest element is at distance 0.5 and is the third element of samples (indices start at 0). You can also query for multiple points:
>>> print neigh.kneighbors([[0., 1., 0.], [1., 0., 1.]])
(array([ 0.5       ,  1.11803399]), array([1, 2]))
- predict(T, n_neighbors=None)
Predict the class labels for the provided data.
Parameters :
    T : array
        A 2-D array representing the test points.
    n_neighbors : int
        Number of neighbors to get (default is the value passed to the constructor).

Returns :
    labels : array
        List of class labels (one for each data sample).
Examples
>>> samples = [[0., 0., 0.], [0., .5, 0.], [1., 1., .5]]
>>> labels = [0, 0, 1]
>>> from scikits.learn.neighbors import Neighbors
>>> neigh = Neighbors(n_neighbors=1)
>>> neigh.fit(samples, labels)
Neighbors(n_neighbors=1, window_size=1)
>>> print neigh.predict([.2, .1, .2])
[0]
>>> print neigh.predict([[0., -1., 0.], [3., 2., 0.]])
[0 1]
- score(X, y)
Returns the mean error rate on the given test data and labels.
Parameters :
    X : array-like, shape = [n_samples, n_features]
        Test samples.
    y : array-like, shape = [n_samples]
        True labels for X.

Returns :
    z : float
        Mean error rate of self.predict(X) with respect to y.
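Examples

No example for score appears above; under the stated semantics (mean error rate, so lower is better) a session would look like the following. The printed output is illustrative, assuming score returns the error fraction as a plain float:

>>> samples = [[0., 0., 0.], [0., .5, 0.], [1., 1., .5]]
>>> labels = [0, 0, 1]
>>> from scikits.learn.neighbors import Neighbors
>>> neigh = Neighbors(n_neighbors=1)
>>> neigh.fit(samples, labels)
Neighbors(n_neighbors=1, window_size=1)
>>> print neigh.score([[0., 0., 0.], [1., 1., 1.]], [0, 0])
0.5

Here the first test point is classified correctly (label 0) and the second is misclassified (predicted 1, true label 0), so the mean error rate is 0.5.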