Class | Description |
---|---|
MailArchivesClusteringAnalyzer | Custom Lucene Analyzer designed for aggressive feature reduction when clustering the ASF Mail Archives, using an extended set of stop words, exclusion of non-alphanumeric tokens, and Porter stemming. |
MultipleTextFileInputFormat | Used with the WholeFileRecordReader class to combine a large number of text files into one text input reader. |
PrefixAdditionFilter | Default parser for parsing text into sequence files. |
SequenceFilesFromDirectory | Converts a directory of text documents into SequenceFiles of the specified chunkSize. |
SequenceFilesFromDirectoryFilter | Implement this interface if you wish to extend SequenceFilesFromDirectory with your own parsing logic. |
SequenceFilesFromDirectoryMapper | Map class for the SequenceFilesFromDirectory MapReduce job. |
SequenceFilesFromMailArchives | Converts a directory of gzipped mail archives into SequenceFiles of the specified chunkSize. |
SequenceFilesFromMailArchivesMapper | Map class for the SequenceFilesFromMailArchives job. |
TextParagraphSplittingJob | |
TextParagraphSplittingJob.SplitMap | |
WholeFileRecordReader | RecordReader used with the MultipleTextFileInputFormat class to read full files as key/value pairs and groups of files as single input splits. |
WikipediaToSequenceFile | Creates and runs the Wikipedia Dataset Creator. |
Copyright © 2008–2017 The Apache Software Foundation. All rights reserved.