ModelTrainer (Mahout Map-Reduce 0.13.0 API)

java.lang.Object
- org.apache.mahout.clustering.lda.cvb.ModelTrainer

```
public class ModelTrainer
extends Object
```
Multithreaded LDA model trainer class, which primarily operates by running a "map/reduce" operation entirely in local memory (i.e., not as a Hadoop job). The "map" operation takes the "read-only" TopicModel and uses it to iteratively learn the p(topic|term, doc) distribution for documents; this can be done in parallel across many documents because the "read-only" model is never modified. The outputs are then "reduced" onto the "write" model. These updates are not parallelizable in the same way: individual documents can't be added to the same entries in different threads at the same time, but updates across many topics to the same term from the same document can be done in parallel, so they are. Because computation is done asynchronously, it is important to call the stop() method when iteration is done; it blocks until work is complete. Setting the read model and the write model to be the same object may not work correctly yet because of concurrency problems.
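
The read-model/write-model pattern described above can be illustrated with a minimal sketch that uses only the JDK. It is not Mahout's implementation: the class `ReadWriteSketch`, the method `trainAll`, and the array-based "models" are hypothetical names chosen for illustration. Worker threads read a shared array that is never modified (the "map" step), accumulation into the write array is serialized (the "reduce" step), and `shutdown` followed by `awaitTermination` stands in for the blocking behaviour attributed to `stop()`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names exist in Mahout.
final class ReadWriteSketch {

    // Each document is an array of term counts with the same length as readModel.
    static double[] trainAll(double[] readModel, List<int[]> docs, int numThreads) {
        double[] writeModel = new double[readModel.length];
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        for (int[] doc : docs) {
            pool.execute(() -> {
                // "Map": read-only access to readModel, so no locking is needed.
                double[] update = new double[doc.length];
                for (int t = 0; t < doc.length; t++) {
                    update[t] = readModel[t] * doc[t];
                }
                // "Reduce": only one thread at a time may add into writeModel.
                synchronized (writeModel) {
                    for (int t = 0; t < update.length; t++) {
                        writeModel[t] += update[t];
                    }
                }
            });
        }
        // Analogue of stop(): accept no new work, then block until queued work is done.
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("training tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for training tasks", e);
        }
        return writeModel;
    }

    public static void main(String[] args) {
        double[] readModel = {1.0, 2.0};
        List<int[]> docs = Arrays.asList(new int[] {1, 1}, new int[] {2, 0}, new int[] {0, 3});
        System.out.println(Arrays.toString(trainAll(readModel, docs, 2)));
    }
}
```

Reading the result before the pool has terminated would risk seeing a partially accumulated write array, which is the reason the sketch blocks before returning.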

Constructor Summary

Constructors
Constructor and Description
`ModelTrainer(TopicModel model, int numTrainThreads, int numTopics, int numTerms)` WARNING: this constructor may not lead to good behavior.
`ModelTrainer(TopicModel initialReadModel, TopicModel initialWriteModel, int numTrainThreads, int numTopics, int numTerms)`

Method Summary

Modifier and Type	Method and Description
`void`	`batchTrain(Map<Vector,Vector> batch, boolean update, int numDocTopicsIters)`
`double`	`calculatePerplexity(VectorIterable matrix, VectorIterable docTopicCounts)`
`double`	`calculatePerplexity(VectorIterable matrix, VectorIterable docTopicCounts, double testFraction)`
`double`	`calculatePerplexity(Vector document, Vector docTopicCounts, int numDocTopicIters)`
`TopicModel`	`getReadModel()`
`void`	`persist(org.apache.hadoop.fs.Path outputPath)`
`void`	`start()`
`void`	`stop()`
`void`	`train(VectorIterable matrix, VectorIterable docTopicCounts)`
`void`	`train(VectorIterable matrix, VectorIterable docTopicCounts, int numDocTopicIters)`
`void`	`train(Vector document, Vector docTopicCounts, boolean update, int numDocTopicIters)`
`void`	`trainSync(Vector document, Vector docTopicCounts, boolean update, int numDocTopicIters)`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail

ModelTrainer

```
public ModelTrainer(TopicModel initialReadModel,
                    TopicModel initialWriteModel,
                    int numTrainThreads,
                    int numTopics,
                    int numTerms)
```

ModelTrainer
```
public ModelTrainer(TopicModel model,
                    int numTrainThreads,
                    int numTopics,
                    int numTerms)
```
WARNING: this constructor may not lead to good behavior. What should be verified is that the model updating process does not conflict with model reading. It might work, but then again, it might not!

Parameters:

model - to be used for both reading (inference) and accumulating (learning)

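numTrainThreads - number of threads used for training

numTopics - number of topics in the model

numTerms - number of terms in the vocabulary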
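
One common way to keep updates from conflicting with reads is to hold two separate buffers and exchange them between iterations, which is what the two-model constructor makes possible. The sketch below is a single-threaded illustration of that general pattern using only the JDK; `DoubleBufferSketch` and its methods are hypothetical names, and this page does not confirm that ModelTrainer swaps its models this way.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Mahout.
final class DoubleBufferSketch {
    private double[] read;   // consulted during inference, never written mid-iteration
    private double[] write;  // receives accumulated updates

    DoubleBufferSketch(int size) {
        this.read = new double[size];
        this.write = new double[size];
    }

    double readAt(int index) {
        return read[index];
    }

    void accumulate(int index, double delta) {
        write[index] += delta;
    }

    // Called between iterations: updates become readable, and the old read
    // buffer is cleared for reuse as the next accumulator.
    void swap() {
        double[] tmp = read;
        read = write;
        write = tmp;
        Arrays.fill(write, 0.0);
    }

    public static void main(String[] args) {
        DoubleBufferSketch model = new DoubleBufferSketch(1);
        model.accumulate(0, 2.5);
        System.out.println(model.readAt(0)); // still the old value until swap()
        model.swap();
        System.out.println(model.readAt(0)); // now reflects the accumulated update
    }
}
```

With a single shared buffer, `readAt` could observe values in the middle of an accumulation, which is the kind of conflict the warning above refers to.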

Method Detail

getReadModel
```
public TopicModel getReadModel()
```

start
```
public void start()
```

train

```
public void train(VectorIterable matrix,
                  VectorIterable docTopicCounts)
```

calculatePerplexity

```
public double calculatePerplexity(VectorIterable matrix,
                                  VectorIterable docTopicCounts)
```

calculatePerplexity

```
public double calculatePerplexity(VectorIterable matrix,
                                  VectorIterable docTopicCounts,
                                  double testFraction)
```
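
This page does not state how the returned value is computed or normalized. For orientation only, the textbook definition of perplexity over N held-out tokens is exp(-(1/N) * sum of log p(token)). The sketch below computes that quantity from hypothetical per-token probabilities; `PerplexitySketch` is an illustrative name, and whether `calculatePerplexity` exponentiates or normalizes in the same way is not confirmed here.

```java
// Hypothetical illustration of the textbook definition; not Mahout's code.
final class PerplexitySketch {

    // tokenProbabilities[i] is the model's probability for the i-th held-out token.
    static double perplexity(double[] tokenProbabilities) {
        if (tokenProbabilities.length == 0) {
            throw new IllegalArgumentException("at least one token is required");
        }
        double logLikelihood = 0.0;
        for (double p : tokenProbabilities) {
            logLikelihood += Math.log(p);
        }
        // Exponentiated negative mean log-likelihood.
        return Math.exp(-logLikelihood / tokenProbabilities.length);
    }

    public static void main(String[] args) {
        // A model that spreads probability uniformly over four terms.
        System.out.println(perplexity(new double[] {0.25, 0.25, 0.25, 0.25}));
    }
}
```

Under this definition a model that assigns uniform probability over V terms has perplexity V, so lower values indicate a better fit to the held-out data.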

train

```
public void train(VectorIterable matrix,
                  VectorIterable docTopicCounts,
                  int numDocTopicIters)
```
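
The class description says the trainer iteratively learns the p(topic|term, doc) distribution, and `numDocTopicIters` appears to control how many such passes are made per document. A minimal sketch of that kind of iteration, assuming a fixed topic-term matrix and no smoothing priors, might look as follows. `DocTopicInferenceSketch` and `inferDocTopics` are hypothetical names, and the update shown is a generic fixed-point iteration, not necessarily the exact rule Mahout applies.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Mahout.
final class DocTopicInferenceSketch {

    // termCounts[t]   : count of term t in one document
    // topicTerm[k][t] : p(term t | topic k), treated as read-only
    // returns         : normalized p(topic | doc) after the given number of passes
    static double[] inferDocTopics(double[] termCounts, double[][] topicTerm, int iterations) {
        int numTopics = topicTerm.length;
        double[] docTopics = new double[numTopics];
        Arrays.fill(docTopics, 1.0 / numTopics); // start from a uniform guess
        for (int iter = 0; iter < iterations; iter++) {
            double[] next = new double[numTopics];
            double total = 0.0;
            for (int t = 0; t < termCounts.length; t++) {
                if (termCounts[t] == 0.0) {
                    continue;
                }
                // p(topic | term, doc) is proportional to p(term | topic) * p(topic | doc).
                double norm = 0.0;
                for (int k = 0; k < numTopics; k++) {
                    norm += topicTerm[k][t] * docTopics[k];
                }
                if (norm == 0.0) {
                    continue; // no topic can explain this term
                }
                for (int k = 0; k < numTopics; k++) {
                    double share = termCounts[t] * topicTerm[k][t] * docTopics[k] / norm;
                    next[k] += share;
                    total += share;
                }
            }
            if (total == 0.0) {
                break; // nothing to re-estimate from
            }
            for (int k = 0; k < numTopics; k++) {
                docTopics[k] = next[k] / total;
            }
        }
        return docTopics;
    }

    public static void main(String[] args) {
        double[][] topicTerm = {{1.0, 0.0}, {0.0, 1.0}};
        double[] termCounts = {3.0, 1.0};
        System.out.println(Arrays.toString(inferDocTopics(termCounts, topicTerm, 5)));
    }
}
```

Because the loop only reads `topicTerm`, many documents could run it concurrently against the same matrix, which matches the parallel "map" step in the class description.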

batchTrain

```
public void batchTrain(Map<Vector,Vector> batch,
                       boolean update,
                       int numDocTopicsIters)
```

train

```
public void train(Vector document,
                  Vector docTopicCounts,
                  boolean update,
                  int numDocTopicIters)
```

trainSync

```
public void trainSync(Vector document,
                      Vector docTopicCounts,
                      boolean update,
                      int numDocTopicIters)
```

calculatePerplexity

```
public double calculatePerplexity(Vector document,
                                  Vector docTopicCounts,
                                  int numDocTopicIters)
```

stop
```
public void stop()
```

persist

```
public void persist(org.apache.hadoop.fs.Path outputPath)
             throws IOException
```

Throws:

IOException
