public class LuceneIterator extends AbstractLuceneIterator
Modifier and Type | Field and Description |
---|---|
protected String | idField |
protected Set<String> | idFieldSelector |

Fields inherited from class AbstractLuceneIterator: bump, field, indexReader, maxErrorDocs, nextDocId, nextLogRecord, normPower, numErrorDocs, skippedErrorMessages, terminfo, weight
Constructor and Description |
---|
LuceneIterator(org.apache.lucene.index.IndexReader indexReader, String idField, String field, TermInfo termInfo, Weight weight, double normPower) Produce a LuceneIterator that can create the Vector and normalize it. |
LuceneIterator(org.apache.lucene.index.IndexReader indexReader, String idField, String field, TermInfo termInfo, Weight weight, double normPower, double maxPercentErrorDocs) |
Modifier and Type | Method and Description |
---|---|
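protected String | getVectorName(int documentIndex) Given the document index, derive a name for the vector. |

Methods inherited from class AbstractLuceneIterator: computeNext
Methods inherited from class com.google.common.collect.AbstractIterator: endOfData, hasNext, next, peek
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface java.util.Iterator: forEachRemaining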
protected final String idField
public LuceneIterator(org.apache.lucene.index.IndexReader indexReader, String idField, String field, TermInfo termInfo, Weight weight, double normPower)
Parameters:
indexReader - IndexReader to read the documents from.
idField - field containing the id. May be null.
field - field to use for the Vector
termInfo - termInfo
weight - weight
normPower - the normalization value. Must be non-negative, or LuceneIterable.NO_NORMALIZING
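To make the role of normPower concrete, here is a minimal, dependency-free sketch of p-norm normalization over a plain double array. It is not Mahout's implementation: the class name NormPowerSketch, the method normalize, and the sentinel value -1.0 used for NO_NORMALIZING are assumptions made for illustration, and the sketch covers only strictly positive powers (plus infinity), leaving the case of power 0 out.

```java
import java.util.Arrays;

/** Illustrative sketch only; not the Mahout Vector API. */
final class NormPowerSketch {
    // Hypothetical stand-in for LuceneIterable.NO_NORMALIZING; the real value is not stated on this page.
    static final double NO_NORMALIZING = -1.0;

    /** Returns a copy of v divided by its p-norm, or an unchanged copy when normalization is disabled. */
    static double[] normalize(double[] v, double power) {
        if (power == NO_NORMALIZING) {
            return v.clone();
        }
        if (Double.isNaN(power) || power <= 0.0) {
            throw new IllegalArgumentException("power must be positive or NO_NORMALIZING: " + power);
        }
        double norm = 0.0;
        if (Double.isInfinite(power)) {
            // Infinity norm: the largest absolute component.
            for (double x : v) {
                norm = Math.max(norm, Math.abs(x));
            }
        } else {
            // General p-norm: (sum |x|^p)^(1/p).
            double sum = 0.0;
            for (double x : v) {
                sum += Math.pow(Math.abs(x), power);
            }
            norm = Math.pow(sum, 1.0 / power);
        }
        double[] out = new double[v.length];
        if (norm == 0.0) {
            return out; // an all-zero vector stays all-zero
        }
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / norm;
        }
        return out;
    }

    public static void main(String[] args) {
        // Euclidean (power 2) normalization of a small vector.
        System.out.println(Arrays.toString(normalize(new double[] {3.0, 4.0}, 2.0)));
    }
}
```

With power 2 the result has unit Euclidean length; passing the sentinel returns the weights untouched, which mirrors the documented choice between a non-negative power and LuceneIterable.NO_NORMALIZING.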
public LuceneIterator(org.apache.lucene.index.IndexReader indexReader, String idField, String field, TermInfo termInfo, Weight weight, double normPower, double maxPercentErrorDocs)
Parameters:
indexReader - IndexReader to read the documents from.
idField - field containing the id. May be null.
field - field to use for the Vector
termInfo - termInfo
weight - weight
normPower - the normalization value. Must be non-negative, or LuceneIterable.NO_NORMALIZING
maxPercentErrorDocs - the largest fraction of documents without a term frequency vector that will be tolerated. In [0,1].
See Also:
LuceneIterator(org.apache.lucene.index.IndexReader, String, String, org.apache.mahout.utils.vectors.TermInfo, org.apache.mahout.vectorizer.Weight, double)
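The tolerance expressed by maxPercentErrorDocs can be pictured as an error budget. The sketch below is an assumption about how such a budget might work, not the library's code: the class ErrorDocBudgetSketch and its methods are hypothetical, and only the names maxErrorDocs and numErrorDocs are borrowed from the inherited fields listed above. It converts the fraction into an absolute document count and fails once more documents than that lack a term frequency vector.

```java
/** Illustrative sketch only; the exact threshold rule in Mahout may differ. */
final class ErrorDocBudgetSketch {
    private final int maxErrorDocs; // absolute number of tolerated error documents
    private int numErrorDocs;       // error documents seen so far

    ErrorDocBudgetSketch(int totalDocs, double maxPercentErrorDocs) {
        // Despite the name, the value is a fraction in [0,1], as the parameter description states.
        if (!(maxPercentErrorDocs >= 0.0 && maxPercentErrorDocs <= 1.0)) {
            throw new IllegalArgumentException("maxPercentErrorDocs must be in [0,1]: " + maxPercentErrorDocs);
        }
        this.maxErrorDocs = (int) (maxPercentErrorDocs * totalDocs);
    }

    /** Records one document without a term frequency vector; fails when the budget is exceeded. */
    void recordMissingTermVector(int docId) {
        numErrorDocs++;
        if (numErrorDocs > maxErrorDocs) {
            throw new IllegalStateException("Document " + docId + " has no term vector and exceeds the budget of "
                + maxErrorDocs + " error documents");
        }
    }

    int errorCount() {
        return numErrorDocs;
    }

    public static void main(String[] args) {
        // Four documents with a tolerated fraction of 0.5 allow two error documents.
        ErrorDocBudgetSketch budget = new ErrorDocBudgetSketch(4, 0.5);
        budget.recordMissingTermVector(0);
        budget.recordMissingTermVector(3);
        System.out.println("tolerated so far: " + budget.errorCount());
    }
}
```

A value of 0 in this sketch means the first document without a term vector aborts the iteration, while 1 means every document may be skipped.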
protected String getVectorName(int documentIndex) throws IOException
Description copied from class: AbstractLuceneIterator
Specified by:
getVectorName in class AbstractLuceneIterator
Parameters:
documentIndex - the lucene document index.
Throws:
IOException
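Because idField may be null, a name for the vector has to come from somewhere else in that case. The following sketch shows one plausible rule, using an in-memory map as a stand-in for the stored fields of a Lucene index. Everything here is hypothetical: the class VectorNameSketch, the map-based index, and the fallback to the document index as the name are assumptions, not the documented behavior of getVectorName.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; a Map stands in for the stored fields of a Lucene index. */
final class VectorNameSketch {
    /**
     * Derives a vector name for a document.
     * Assumption: with no id field configured, the document index itself serves as the name.
     */
    static String vectorName(Map<Integer, Map<String, String>> storedFields, String idField, int documentIndex) {
        if (idField == null) {
            return String.valueOf(documentIndex);
        }
        Map<String, String> doc = storedFields.get(documentIndex);
        // A missing document or missing field yields null in this sketch.
        return doc == null ? null : doc.get(idField);
    }

    public static void main(String[] args) {
        Map<Integer, Map<String, String>> index = new HashMap<>();
        Map<String, String> doc = new HashMap<>();
        doc.put("id", "article-17");
        index.put(0, doc);

        System.out.println(vectorName(index, "id", 0)); // name taken from the stored id field
        System.out.println(vectorName(index, null, 0)); // name derived from the document index
    }
}
```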
Copyright © 2008–2017 The Apache Software Foundation. All rights reserved.