org.apache.mahout.classifier.naivebayes
Math-Scala Naive Bayes optimized for Spark.
Extracts label keys from the raw TF or TF-IDF matrix generated by seqdirectory/seq2sparse and aggregates the TF or TF-IDF values by label
DrmLike matrix; output from seq2sparse in the form K = key such as '/Category/document_title', V = TF or TF-IDF values per term
a String => String function used to extract categories from Keys of the stringKeyedObservations DRM. The default CategoryParser will extract "Category" from: '/Category/document_id'
(labelIndexMap, aggregatedByLabelObservationDrm), where labelIndexMap is a HashMap with K = label row index and V = label, and aggregatedByLabelObservationDrm is a DrmLike[Int] of aggregated TF or TF-IDF counts per label
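The label extraction and aggregation described above can be illustrated locally with plain Scala collections. This is a minimal sketch, not the distributed implementation: `LabelAggregationSketch`, `parseCategory`, and `aggregateByLabel` are hypothetical names, the DRM is stood in for by a `Seq` of (key, vector) pairs, and all vectors are assumed to have the same length.

```scala
import scala.collection.mutable

object LabelAggregationSketch {
  // Hypothetical parser: returns "Category" for a key like "/Category/document_id".
  val parseCategory: String => String = key => key.split("/")(1)

  // Sums the term vectors of all rows that share a label.
  // Returns (label row index -> label, one aggregated vector per label row).
  // Assumption: every input vector has the same length.
  def aggregateByLabel(
      rows: Seq[(String, Array[Double])],
      cParser: String => String = parseCategory
  ): (Map[Int, String], Array[Array[Double]]) = {
    // LinkedHashMap keeps labels in order of first appearance,
    // so row indices are deterministic.
    val sums = mutable.LinkedHashMap.empty[String, Array[Double]]
    for ((key, vec) <- rows) {
      val label = cParser(key)
      val acc = sums.getOrElseUpdate(label, new Array[Double](vec.length))
      for (i <- vec.indices) acc(i) += vec(i)
    }
    val labels = sums.keys.toVector
    val labelIndexMap = labels.zipWithIndex.map { case (l, i) => i -> l }.toMap
    (labelIndexMap, labels.map(sums).toArray)
  }
}
```

Passing a different `String => String` function as `cParser` covers keys that do not follow the '/Category/document_id' layout.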
Test a trained model with a labeled dataset
implicitly determined Key type of test set DRM: String
a trained NBModel
a labeled testing set
whether to test using a complementary or a standard NB classifier
a String => String function used to extract categories from Keys of the testing set DRM. The default CategoryParser will extract "Category" from: '/Category/document_id'
a result analyzer with confusion matrix and accuracy statistics
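To make the returned statistics concrete, the following sketch computes a confusion matrix and accuracy from (actual, predicted) label pairs. It is an illustration under stated assumptions and does not use the library's result analyzer: `ConfusionSketch`, `confusionMatrix`, and `accuracy` are hypothetical names, and every label in `results` is assumed to appear in `labels`.

```scala
object ConfusionSketch {
  // Builds a confusion matrix: rows are actual labels, columns are predicted labels.
  // Assumption: every label in `results` is present in `labels`.
  def confusionMatrix(
      labels: Seq[String],
      results: Seq[(String, String)]
  ): Array[Array[Int]] = {
    val index = labels.zipWithIndex.toMap
    val m = Array.ofDim[Int](labels.size, labels.size)
    for ((actual, predicted) <- results) m(index(actual))(index(predicted)) += 1
    m
  }

  // Accuracy is the share of observations on the diagonal of the matrix.
  def accuracy(m: Array[Array[Int]]): Double = {
    val total = m.map(_.sum).sum
    if (total == 0) 0.0 else m.indices.map(i => m(i)(i)).sum.toDouble / total
  }
}
```

Off-diagonal cells show which categories are confused with each other, which the single accuracy figure cannot reveal.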
Distributed training of a Naive Bayes model. Follows the approach presented in Rennie et al., "Tackling the Poor Assumptions of Naive Bayes Text Classifiers", ICML 2003, http://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf