scikits.learn.feature_extraction.text.TermCountVectorizer
- class scikits.learn.feature_extraction.text.TermCountVectorizer(vocabulary={})
Convert a document collection to a document-term matrix.
Parameters : vocabulary: dict, optional :
A dictionary whose keys are tokens and whose values are the corresponding column indices in the matrix. Supplying it fixes the vocabulary in advance.
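As an illustration of the mapping described above, a fixed vocabulary could look like the following. This is a minimal sketch in plain Python that does not call the library; the tokens and the helper `column_of` are hypothetical and exist only for this example.

```python
# Hypothetical fixed vocabulary: token -> column index in the matrix.
vocabulary = {"apple": 0, "banana": 1, "cherry": 2}


def column_of(token, vocabulary):
    """Return the column index of a token, or None if it is not in the vocabulary."""
    return vocabulary.get(token)


# With a fixed vocabulary, the number of matrix columns is known up front.
n_features = len(vocabulary)
```

Each token owns exactly one column, so the values should be distinct integers in the range 0 to n_features - 1.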
Methods
fit, transform
- __init__(vocabulary={})
- fit(tokenized_documents, y=None)
Learning is postponed until the first call to transform(), so this method does nothing.
- transform(tokenized_documents)
Learn the vocabulary dictionary if necessary and return the vectors.
Parameters : tokenized_documents: list :
A list of tokenized documents.
Returns : vectors: array, [n_samples, n_features] :
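The behaviour described for transform() can be sketched in plain Python as follows. This is not the library's implementation: the function `count_terms` is hypothetical, it returns a list of rows rather than an array, and it assumes that tokens absent from a supplied vocabulary are simply ignored, which the documentation above does not state.

```python
def count_terms(tokenized_documents, vocabulary=None):
    """Sketch: build a dense document-term count matrix as a list of rows."""
    # Learn the vocabulary only when none was supplied (assumption).
    learn = not vocabulary
    vocabulary = dict(vocabulary or {})
    if learn:
        for tokens in tokenized_documents:
            for token in tokens:
                # Assign the next free column index to each unseen token.
                vocabulary.setdefault(token, len(vocabulary))

    vectors = []
    for tokens in tokenized_documents:
        row = [0] * len(vocabulary)
        for token in tokens:
            index = vocabulary.get(token)
            if index is not None:  # assumption: unknown tokens are skipped
                row[index] += 1
        vectors.append(row)
    return vectors, vocabulary


docs = [["the", "cat", "sat"], ["the", "cat", "and", "the", "dog"]]

# Vocabulary learned from the documents themselves.
vectors, vocab = count_terms(docs)

# Vocabulary fixed in advance: only "cat" and "dog" are counted.
fixed_vectors, _ = count_terms(docs, {"cat": 0, "dog": 1})
```

Each row corresponds to one document (n_samples rows) and each column to one vocabulary entry (n_features columns).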