Interface | Description |
---|---|
Vectorizer | |
Weight | |

Class | Description |
---|---|
DictionaryVectorizer | This class converts a set of input documents in the sequence file format to vectors. |
DocumentProcessor | This class converts a set of input documents into sequence files of StringTuples. The SequenceFile input should have a Text key containing the unique document identifier and a Text value containing the whole document. |
EncodedVectorsFromSequenceFiles | Converts a given set of sequence files into SparseVectors. |
EncodingMapper | The Mapper that does the work of encoding text. |
HighDFWordsPruner | |
SimpleTextEncodingVectorizer | Runs a Map/Reduce job that encodes the input with a FeatureVectorEncoder and writes it to the output as a sequence file. |
SparseVectorsFromSequenceFiles | Converts a given set of sequence files into SparseVectors. |
TF | Weight based on term frequency only. |
TFIDF | |
VectorizerConfig | The config for a Vectorizer. |
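
The difference between a weight based on term frequency only and a TF-IDF weight can be sketched in plain Java. This is an illustrative sketch, not the code of the TF or TFIDF classes listed above: the `TermWeight` interface, its `calculate(tf, df, numDocs)` signature, and the helper methods are hypothetical names chosen for this example, and the TF-IDF formula is the textbook `tf * ln(numDocs / df)`, which may differ from the formula the library uses.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

class TermWeightSketch {

    /** Hypothetical weighting strategy, named for this example only. */
    interface TermWeight {
        double calculate(int tf, int df, int numDocs);
    }

    /** Term frequency only: the weight is the raw count of the term in the document. */
    static final TermWeight TF_ONLY = (tf, df, numDocs) -> tf;

    /** Textbook TF-IDF (an assumption, see above): tf * ln(numDocs / df). Requires df >= 1. */
    static final TermWeight TF_IDF = (tf, df, numDocs) -> tf * Math.log((double) numDocs / df);

    /** Counts how often each token occurs in one tokenized document. */
    static Map<String, Integer> termFrequencies(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    /** Counts in how many documents each term occurs at least once. */
    static Map<String, Integer> documentFrequencies(List<List<String>> corpus) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> doc : corpus) {
            // De-duplicate so that a term counts once per document.
            for (String term : new HashSet<>(doc)) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
            Arrays.asList("apache", "mahout", "vectorizer"),
            Arrays.asList("apache", "hadoop"),
            Arrays.asList("apache", "mahout"));

        Map<String, Integer> df = documentFrequencies(corpus);
        Map<String, Integer> tf = termFrequencies(corpus.get(0));

        // Print both weights for every term of the first document.
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            String term = e.getKey();
            System.out.printf("%s tf=%.3f tfidf=%.3f%n", term,
                TF_ONLY.calculate(e.getValue(), df.get(term), corpus.size()),
                TF_IDF.calculate(e.getValue(), df.get(term), corpus.size()));
        }
    }
}
```

In this toy corpus, "apache" occurs in every document, so its TF-IDF weight under the formula above is zero, while the term-frequency-only weight keeps its raw count.
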
Copyright © 2008–2017 The Apache Software Foundation. All rights reserved.