public class InMemoryCollapsedVariationalBayes0 extends AbstractJob
Runs the same algorithm as CVB0Driver, but sequentially, in memory. Current memory requirements:
the entire corpus is read into RAM, two copies of the model are kept (each of size
numTerms * numTopics), and another matrix of size numDocs * numTopics is held in memory
(to store p(topic|doc) for all docs).
If all of this fits in memory, this can be significantly faster than an iterative MapReduce job.
Fields inherited from class AbstractJob: argMap, inputFile, inputPath, outputFile, outputPath, tempPath
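The memory requirements in the class description can be turned into a rough back-of-the-envelope estimate. The sketch below is not part of Mahout: the class name `Cvb0MemoryEstimate` and the method `estimateDenseBytes` are hypothetical, and it assumes every matrix entry is stored densely as an 8-byte double. It covers only the two model copies and the p(topic|doc) matrix; the corpus itself and any JVM object overhead come on top.

```java
// Hypothetical helper, not part of Mahout: estimates the bytes needed for the
// two model copies and the doc/topic matrix, assuming dense 8-byte doubles.
final class Cvb0MemoryEstimate {

    static long estimateDenseBytes(long numTerms, long numTopics, long numDocs) {
        // Each model copy has numTerms * numTopics entries; two copies are held.
        long modelEntries = Math.multiplyExact(numTerms, numTopics);
        // One extra matrix of numDocs * numTopics entries stores p(topic|doc).
        long docTopicEntries = Math.multiplyExact(numDocs, numTopics);
        long totalEntries = Math.addExact(Math.multiplyExact(2L, modelEntries), docTopicEntries);
        // multiplyExact/addExact throw ArithmeticException on overflow instead of wrapping.
        return Math.multiplyExact(totalEntries, (long) Double.BYTES);
    }

    public static void main(String[] args) {
        // Example sizes chosen for illustration only.
        long bytes = estimateDenseBytes(50_000L, 100L, 1_000_000L);
        System.out.println("Estimated bytes (excluding the corpus): " + bytes);
    }
}
```

With these illustrative sizes the doc/topic matrix dominates, so for large corpora numDocs * numTopics is usually the term to watch.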
Constructor and Description |
---|
InMemoryCollapsedVariationalBayes0(Matrix corpus, String[] terms, int numTopics, double alpha, double eta, int numTrainingThreads, int numUpdatingThreads, double modelCorpusFraction) |
Modifier and Type | Method and Description |
---|---|
org.apache.hadoop.conf.Configuration | getConf() |
double | iterateUntilConvergence(double minFractionalErrorChange, int maxIterations, int minIter) |
double | iterateUntilConvergence(double minFractionalErrorChange, int maxIterations, int minIter, double testFraction) |
static void | main(String[] args) |
static int | main2(String[] args, org.apache.hadoop.conf.Configuration conf) |
int | run(String[] strings) |
void | setVerbose(boolean verbose) |
void | trainDocuments() |
void | trainDocuments(double testFraction) |
void | writeModel(org.apache.hadoop.fs.Path outputPath) |
Methods inherited from class AbstractJob: addFlag, addInputOption, addOption, addOption, addOption, addOption, addOutputOption, buildOption, buildOption, getAnalyzerClassFromOption, getCLIOption, getDimensions, getFloat, getFloat, getGroup, getInputFile, getInputPath, getInt, getInt, getOption, getOption, getOption, getOptions, getOutputFile, getOutputPath, getOutputPath, getTempPath, getTempPath, hasOption, keyFor, maybePut, parseArguments, parseArguments, parseDirectories, prepareJob, prepareJob, prepareJob, prepareJob, setConf, setS3SafeCombinedInputPath, shouldRunNextPhase
public void setVerbose(boolean verbose)
public void trainDocuments()
public void trainDocuments(double testFraction)
public double iterateUntilConvergence(double minFractionalErrorChange, int maxIterations, int minIter)
public double iterateUntilConvergence(double minFractionalErrorChange, int maxIterations, int minIter, double testFraction)
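The parameter names of iterateUntilConvergence suggest a train-then-measure loop with a lower and an upper bound on the iteration count. The sketch below shows one way such a loop could work; it is an assumption for illustration, not the documented Mahout behavior. `ConvergenceLoopSketch` and the `trainAndMeasure` callback are hypothetical names, and the callback is assumed to run one training pass and return a positive error value (for example, a perplexity).

```java
import java.util.function.DoubleSupplier;

// Hypothetical sketch, not Mahout's code: iterate until the fractional change
// in an error measure drops to or below a threshold.
final class ConvergenceLoopSketch {

    static double iterateUntilConvergence(DoubleSupplier trainAndMeasure,
                                          double minFractionalErrorChange,
                                          int maxIterations,
                                          int minIter) {
        int iter = 0;
        double previous = 0.0;
        // Always run at least minIter passes before testing for convergence.
        while (iter < minIter) {
            previous = trainAndMeasure.getAsDouble();
            iter++;
        }
        double current = previous;
        double fractionalChange = Double.MAX_VALUE;
        // Stop at maxIterations, or once the relative change is small enough.
        while (iter < maxIterations && fractionalChange > minFractionalErrorChange) {
            current = trainAndMeasure.getAsDouble();
            iter++;
            // Assumes the previous error is positive, so the ratio is well defined.
            fractionalChange = Math.abs(current - previous) / previous;
            previous = current;
        }
        return current;
    }

    public static void main(String[] args) {
        // Made-up error values standing in for real training passes.
        double[] errors = {100.0, 50.0, 40.0, 39.9};
        int[] calls = {0};
        double last = iterateUntilConvergence(
                () -> errors[Math.min(calls[0]++, errors.length - 1)], 0.01, 10, 1);
        System.out.println("Final error " + last + " after " + calls[0] + " passes");
    }
}
```

Under this reading, minIter guards against stopping on an early plateau, while maxIterations bounds the run time if the error keeps changing.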
public void writeModel(org.apache.hadoop.fs.Path outputPath) throws IOException
Throws: IOException
public static int main2(String[] args, org.apache.hadoop.conf.Configuration conf) throws Exception
Throws: Exception
public org.apache.hadoop.conf.Configuration getConf()
Specified by: getConf in interface org.apache.hadoop.conf.Configurable
Overrides: getConf in class AbstractJob
Copyright © 2008–2017 The Apache Software Foundation. All rights reserved.