scikits.learn.feature_extraction.text.SparseHashingVectorizer
- class scikits.learn.feature_extraction.text.SparseHashingVectorizer(dim=100000, probes=1, use_idf=True, analyzer=<scikits.learn.feature_extraction.text.WordNGramAnalyzer object>)
Compute term frequency (tf) vectors in a hashed term space, stored as a sparse matrix.
The logic is the same as HashingVectorizer, but much larger vector dimensions can be used without memory issues because the tf vectors are stored in a scipy.sparse data structure.
This class requires scipy 0.7 or higher.
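The idea of hashing terms into a fixed-size, sparsely stored vector can be sketched in plain Python. This is only an illustration under stated assumptions, not the class's actual implementation: the helper name `hashed_tf`, the whitespace tokenizer, the use of `zlib.crc32` as the hash function, and the dict-per-document storage (standing in for a scipy.sparse matrix) are all hypothetical choices made here. The `probes` parameter and `hash_sign` are not modelled.

```python
import zlib
from collections import Counter


def hashed_tf(documents, dim=100000):
    """Return one {column_index: term_frequency} dict per document.

    Hypothetical helper: each dict plays the role of one sparse row,
    so only the columns that are actually hit take up memory.
    """
    rows = []
    for text in documents:
        tokens = text.lower().split()  # assumed tokenizer, for illustration
        counts = Counter()
        for token in tokens:
            # crc32 is deterministic across runs, unlike the built-in hash()
            index = zlib.crc32(token.encode("utf-8")) % dim
            counts[index] += 1
        total = len(tokens) or 1  # avoid division by zero on empty documents
        rows.append({i: c / total for i, c in counts.items()})
    return rows


docs = ["the cat sat on the mat", "the dog barked"]
tf_rows = hashed_tf(docs, dim=100000)
```

Because each row stores only its non-zero columns, raising `dim` reduces hash collisions without increasing memory use in proportion, which is the benefit the description above attributes to sparse storage.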
Methods
get_idf, get_tfidf, get_vectors, hash_sign, vectorize, vectorize_files
- __init__(dim=100000, probes=1, use_idf=True, analyzer=<scikits.learn.feature_extraction.text.WordNGramAnalyzer object>)
- get_tfidf()
Compute the TF-log(IDF) vectors of the sampled documents.
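As a rough illustration of TF-log(IDF) weighting, the sketch below multiplies each term frequency by the logarithm of the inverse document frequency of its column. It assumes the textbook form log(n_docs / df); the exact formula and smoothing used by `get_tfidf` are not stated here, and the helper name `tf_log_idf` and the dict-based rows are hypothetical.

```python
import math


def tf_log_idf(tf_rows):
    """Weight sparse tf rows ({column: tf} dicts) by log(n_docs / df).

    Hypothetical helper using the textbook IDF, which may differ from
    the weighting the library applies.
    """
    n_docs = len(tf_rows)
    # Document frequency: number of documents in which each column is non-zero
    df = {}
    for row in tf_rows:
        for index in row:
            df[index] = df.get(index, 0) + 1
    idf = {index: math.log(n_docs / count) for index, count in df.items()}
    return [
        {index: tf * idf[index] for index, tf in row.items()}
        for row in tf_rows
    ]


# Column 0 occurs in both documents, column 1 only in the first
tf_rows = [{0: 0.5, 1: 0.5}, {0: 1.0}]
weighted = tf_log_idf(tf_rows)
```

With this formulation, a column that appears in every document receives weight zero, while rarer columns keep a positive weight.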
- vectorize(text_documents)
Vectorize a batch of documents given as Python UTF-8 byte strings or unicode strings.
- vectorize_files(document_filepaths)
Vectorize a batch of UTF-8 text files.