argmax with values as well: returns a tuple of the index of the max score and the score itself.
Vector of scores
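The argmax-with-value contract can be sketched as follows. This is a minimal illustration over a plain Array[Double] standing in for the score Vector. The object and method names (ArgmaxSketch, argmaxWithValue) are hypothetical, and resolving ties to the lowest index is an assumption of this sketch.

```scala
object ArgmaxSketch {
  /** Returns (index of the max score, max score).
    * Hypothetical helper: assumes a non-empty array; ties resolve to the lowest index. */
  def argmaxWithValue(scores: Array[Double]): (Int, Double) = {
    require(scores.nonEmpty, "scores must be non-empty")
    var bestIdx = 0
    var bestScore = scores(0)
    var i = 1
    while (i < scores.length) {
      // Strict comparison keeps the first index on ties.
      if (scores(i) > bestScore) {
        bestIdx = i
        bestScore = scores(i)
      }
      i += 1
    }
    (bestIdx, bestScore)
  }
}
```

For example, with per-label log scores Array(-2.3, -0.7, -1.1), the result is (1, -0.7).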
default value for the Laplace smoothing parameter
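For reference, the textbook additive (Laplace) smoothed estimate for multinomial Naive Bayes is shown below. This is the standard form and is given as an assumption; the exact expression used by this implementation may differ. Here $N_{c,i}$ is the aggregated TF or TF-IDF weight of term $i$ for label $c$, $n$ is the number of terms, and $\alpha$ is the smoothing parameter ($\alpha = 1$ is classic add-one smoothing).

```latex
\hat{\theta}_{c,i} = \frac{N_{c,i} + \alpha}{\sum_{j=1}^{n} N_{c,j} + \alpha n}
```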
Extract label keys from a raw TF or TF-IDF matrix generated by seqdirectory/seq2sparse and aggregate TF or TF-IDF values by their label. Override this method in engine-specific modules to optimize.
DrmLike matrix; output from seq2sparse in the form K = e.g. /Category/document_title, V = TF or TF-IDF values per term
a String => String function used to extract categories from Keys of the stringKeyedObservations DRM. The default CategoryParser will extract "Category" from: '/Category/document_id'
(labelIndexMap, aggregatedByLabelObservationDrm): labelIndexMap is a HashMap[String, Integer] with K = label, V = label row index; aggregatedByLabelObservationDrm is a DrmLike[Int] of aggregated TF or TF-IDF counts per label
Default: seqdirectory/seq2sparse categories are stored in DRM keys as /Category/document_id
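The label extraction and aggregation step can be sketched in memory as below. This is a minimal illustration using plain Scala collections in place of a DrmLike, so it does not reflect how the distributed version runs. All names (AggregateByLabelSketch, extractLabelsAndAggregate, defaultCategoryParser) are hypothetical, and the split-on-"/" parser assumes keys of the form /Category/document_id with a leading slash.

```scala
object AggregateByLabelSketch {
  /** Hypothetical default parser: "/Category/document_id" -> "Category".
    * "/Category/document_id".split("/") yields Array("", "Category", "document_id"). */
  val defaultCategoryParser: String => String = key => key.split("/")(1)

  /** Sums per-term TF or TF-IDF rows by label (in-memory stand-in for a DRM).
    * Returns (labelIndexMap: label -> row index, aggregated rows indexed by that row index).
    * Assumes every row has the same number of terms. */
  def extractLabelsAndAggregate(
      observations: Seq[(String, Array[Double])],
      parser: String => String = defaultCategoryParser
  ): (Map[String, Int], Array[Array[Double]]) = {
    require(observations.nonEmpty, "need at least one observation")
    val numTerms = observations.head._2.length
    // Assign row indices to labels in order of first appearance.
    val labels = observations.map { case (key, _) => parser(key) }.distinct
    val labelIndexMap = labels.zipWithIndex.toMap
    val aggregated = Array.fill(labels.size)(new Array[Double](numTerms))
    for ((key, row) <- observations) {
      // Add this document's term weights into its label's row.
      val target = aggregated(labelIndexMap(parser(key)))
      var j = 0
      while (j < numTerms) { target(j) += row(j); j += 1 }
    }
    (labelIndexMap, aggregated)
  }
}
```

With keys "/sports/d1", "/tech/d2" and "/sports/d3", the two sports rows are summed into row 0 and the tech row becomes row 1.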
Test a trained model with a labeled dataset sequentially
implicitly determined key type of the test set DRM: String
a trained NBModel
a labeled testing set
test using a complementary or a standard NB classifier
a String => String function used to extract categories from Keys of the testing set DRM. The default CategoryParser will extract "Category" from: '/Category/document_id'
*Note*: this method brings the entire test set into memory up front. This method is optimized and parallelized in SparkNaiveBayes.
a result analyzer with confusion matrix and accuracy statistics
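Sequential testing can be sketched as a single pass that classifies each labeled row and tallies a confusion matrix and accuracy. This is a minimal illustration: the Scorer function is a hypothetical stand-in for a trained NBModel, and Result is a hypothetical stand-in for the result analyzer. As in the note above, the whole test set is held in memory as a Seq.

```scala
object TestSketch {
  /** Hypothetical stand-in for a trained model: returns one score per label index. */
  type Scorer = Array[Double] => Array[Double]

  /** confusion(actual)(predicted) counts; accuracy = correct / total. */
  final case class Result(confusion: Array[Array[Int]], accuracy: Double)

  def testSequentially(
      score: Scorer,
      labelIndexMap: Map[String, Int],
      testSet: Seq[(String, Array[Double])],
      parser: String => String
  ): Result = {
    val k = labelIndexMap.size
    val confusion = Array.fill(k, k)(0)
    var correct = 0
    for ((key, row) <- testSet) {
      val actual = labelIndexMap(parser(key))
      val scores = score(row)
      // argmax over label scores; maxBy keeps the first index on ties
      val predicted = scores.indices.maxBy(i => scores(i))
      confusion(actual)(predicted) += 1
      if (predicted == actual) correct += 1
    }
    Result(confusion, if (testSet.isEmpty) 0.0 else correct.toDouble / testSet.size)
  }
}
```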
Distributed training of a Naive Bayes model.
a DrmLike[Int] matrix containing term frequency counts for each label.
whether or not to train a complementary Naive Bayes model
Laplace smoothing parameter
trained Naive Bayes model
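Training from the aggregated per-label matrix can be sketched as computing one log weight per (label, term) pair. This is a minimal in-memory illustration, not this module's implementation. The standard branch uses the textbook additive-smoothed estimate. The complementary branch follows the usual complement NB idea of estimating each label's parameters from all other labels, negated so that argmax still selects the best label. The names (TrainSketch, train, scores), the default alpha of 1.0, and the omission of label priors are all assumptions of this sketch.

```scala
object TrainSketch {
  /** aggregated(c)(i): summed TF or TF-IDF weight of term i for label c.
    * Returns weights(c)(i); a higher dot product with a document means a more likely label. */
  def train(aggregated: Array[Array[Double]], complementary: Boolean, alpha: Double = 1.0): Array[Array[Double]] = {
    val numLabels = aggregated.length
    val numTerms = aggregated.head.length
    // Total weight of each term across all labels; used to derive complement counts.
    val termTotals = Array.tabulate(numTerms)(i => aggregated.map(_(i)).sum)
    val grandTotal = termTotals.sum
    Array.tabulate(numLabels) { c =>
      val labelTotal = aggregated(c).sum
      Array.tabulate(numTerms) { i =>
        if (!complementary) {
          // Standard multinomial NB: smoothed log P(term i | label c).
          math.log((aggregated(c)(i) + alpha) / (labelTotal + alpha * numTerms))
        } else {
          // Complement NB: estimate from every label except c, negated for argmax.
          val compCount = termTotals(i) - aggregated(c)(i)
          val compTotal = grandTotal - labelTotal
          -math.log((compCount + alpha) / (compTotal + alpha * numTerms))
        }
      }
    }
  }

  /** Scores a term-frequency vector against each label's weights (dot product). */
  def scores(weights: Array[Array[Double]], tf: Array[Double]): Array[Double] =
    weights.map(w => w.indices.map(i => w(i) * tf(i)).sum)
}
```

In both branches, larger scores indicate a better label, so the same argmax step can be used at test time regardless of which model was trained.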