argmax with values: returns a tuple of the index of the maximum score and the score itself.
Vector of scores
(bestIndex, bestScore)
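The argmax-with-value contract described above can be sketched as follows. This is a minimal illustration, not the library's implementation: it assumes a plain `Seq[Double]` in place of a Mahout `Vector`, and the names `ArgmaxSketch` and `argmaxWithValue` are hypothetical.

```scala
object ArgmaxSketch {
  // Returns (bestIndex, bestScore) for a non-empty sequence of scores.
  // Assumption: ties resolve to the lowest index.
  def argmaxWithValue(scores: Seq[Double]): (Int, Double) = {
    require(scores.nonEmpty, "scores must not be empty")
    var bestIndex = 0
    var bestScore = scores.head
    for ((score, i) <- scores.zipWithIndex) {
      // Strict comparison keeps the first maximum encountered.
      if (score > bestScore) {
        bestIndex = i
        bestScore = score
      }
    }
    (bestIndex, bestScore)
  }
}
```

Returning both values in one pass avoids a second lookup into the score vector after the index is found.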
default value for the Laplacian smoothing parameter
Extract label keys from the raw TF or TF-IDF matrix generated by seqdirectory/seq2sparse and aggregate the TF or TF-IDF values by label. Override this method in engine-specific modules to optimize it.
DrmLike matrix; output from seq2sparse in the form K = e.g. /Category/document_title, V = TF or TF-IDF values per term
a String => String function used to extract categories from Keys of the stringKeyedObservations DRM. The default CategoryParser will extract "Category" from: '/Category/document_id'
(labelIndexMap, aggregatedByLabelObservationDrm). labelIndexMap is a HashMap[String, Integer] with K = label and V = label row index. aggregatedByLabelObservationDrm is a DrmLike[Int] of aggregated TF or TF-IDF counts per label.
Default: with seqdirectory/seq2sparse, categories are stored in the DRM keys as: /Category/document_id
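To illustrate the label extraction and aggregation described above, here is a small in-memory sketch. It assumes plain Scala collections instead of a DRM, assumes every row has the same number of terms, and uses hypothetical names (`LabelAggregationSketch`, `categoryParser`, `aggregateByLabel`); the real method operates on distributed matrices.

```scala
object LabelAggregationSketch {
  // Hypothetical parser: "/Category/document_id" -> "Category".
  // split("/") on a leading slash yields an empty first element, so the category is at index 1.
  val categoryParser: String => String = key => key.split("/")(1)

  // rows: (key, per-term TF or TF-IDF values).
  // Returns (label -> row index, one summed row per label).
  def aggregateByLabel(
      rows: Seq[(String, Array[Double])],
      parser: String => String = categoryParser
  ): (Map[String, Int], Array[Array[Double]]) = {
    // Labels are indexed in order of first appearance.
    val labels = rows.map { case (key, _) => parser(key) }.distinct
    val labelIndexMap = labels.zipWithIndex.toMap
    val numTerms = rows.headOption.map(_._2.length).getOrElse(0)
    val aggregated = Array.fill(labels.size, numTerms)(0.0)
    for ((key, values) <- rows) {
      val row = aggregated(labelIndexMap(parser(key)))
      for (j <- values.indices) row(j) += values(j)
    }
    (labelIndexMap, aggregated)
  }
}
```

The aggregated rows play the role of the DrmLike[Int] output: one row per label, keyed by the integer index stored in the map.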
Test a trained model with a labeled dataset sequentially
implicitly determined Key type of test set DRM: String
a trained NBModel
a labeled testing set
test using a complementary or a standard NB classifier
a String => String function used to extract categories from Keys of the testing set DRM. The default CategoryParser will extract "Category" from: '/Category/document_id'
*Note*: this method brings the entire test set into memory up front. An optimized, parallelized version of this method is provided in SparkNaiveBayes.
a result analyzer with confusion matrix and accuracy statistics
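Sequential testing as described above can be sketched in memory as follows. This is an illustration under stated assumptions, not the library's code: it covers only the standard (non-complementary) case, scores each document by the dot product of its term values with per-label weights, and takes the highest score as the prediction. The names `SequentialTestSketch`, `classify`, and `evaluate` are hypothetical, and the returned pair stands in for the result analyzer.

```scala
object SequentialTestSketch {
  // Scores one document against every label and returns the index of the best label.
  def classify(doc: Array[Double], weights: Array[Array[Double]]): Int = {
    val scores = weights.map(w => w.zip(doc).map { case (a, b) => a * b }.sum)
    scores.indexOf(scores.max)
  }

  // testSet: (actual label index, per-term values).
  // Returns (confusion matrix indexed as confusion(actual)(predicted), accuracy).
  def evaluate(
      testSet: Seq[(Int, Array[Double])],
      weights: Array[Array[Double]]
  ): (Array[Array[Int]], Double) = {
    val n = weights.length
    val confusion = Array.fill(n, n)(0)
    for ((actual, doc) <- testSet) {
      confusion(actual)(classify(doc, weights)) += 1
    }
    // Correct predictions lie on the diagonal.
    val correct = (0 until n).map(i => confusion(i)(i)).sum
    val accuracy = if (testSet.isEmpty) 0.0 else correct.toDouble / testSet.size
    (confusion, accuracy)
  }
}
```

Because the loop visits every test row in a single process, the whole test set must fit in memory, which is the limitation the note above refers to.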
Distributed training of a Naive Bayes model.
Follows the approach presented in Rennie et al., "Tackling the Poor Assumptions of Naive Bayes Text Classifiers", ICML 2003, http://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf
a DrmLike[Int] matrix containing term frequency counts for each label.
whether or not to train a complementary Naive Bayes model
Laplace smoothing parameter
a trained Naive Bayes model
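The role of the Laplace smoothing parameter and the complementary flag can be illustrated with the textbook smoothed multinomial estimate, log((count + alpha) / (total + alpha * numTerms)), where the complementary variant takes its counts from all other labels. This is a minimal in-memory sketch under those assumptions; it is not the distributed implementation and omits any further weight normalization. The names `SmoothedWeightsSketch` and `logWeights` are hypothetical.

```scala
object SmoothedWeightsSketch {
  // counts: one row of term counts per label (all rows the same length).
  // alpha: Laplace smoothing parameter, passed explicitly.
  // complementary: if true, estimate each label's weights from the counts of all other labels.
  def logWeights(
      counts: Array[Array[Double]],
      alpha: Double,
      complementary: Boolean
  ): Array[Array[Double]] = {
    require(counts.nonEmpty, "counts must contain at least one label row")
    val numTerms = counts.head.length
    // Per-term totals across all labels, and the overall total.
    val termTotals = Array.tabulate(numTerms)(t => counts.map(_(t)).sum)
    val grandTotal = termTotals.sum
    counts.map { row =>
      val labelTotal = row.sum
      Array.tabulate(numTerms) { t =>
        val (num, den) =
          if (complementary) (termTotals(t) - row(t), grandTotal - labelTotal)
          else (row(t), labelTotal)
        // Smoothing keeps the estimate positive for terms unseen under a label.
        math.log((num + alpha) / (den + alpha * numTerms))
      }
    }
  }
}
```

In this formulation the exponentiated weights of each row sum to 1, and a term with zero count still receives a finite log weight.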