NaiveBayes

Type Members

type CategoryParser = (String) ⇒ String

Value Members

final def !=(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def !=(arg0: Any): Boolean

Definition Classes
Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def ==(arg0: Any): Boolean

Definition Classes
Any
def argmax(v: Vector): (Int, Double)

Argmax that also returns the value: returns a tuple of the index of the maximum score and the score itself.
v
Vector of scores
returns
(bestIndex, bestScore)
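
The contract described above can be illustrated with a minimal sketch. It assumes a plain Scala `IndexedSeq[Double]` in place of the Mahout `Vector`, and the object name `ArgmaxSketch` is hypothetical; it is not the library's implementation.

```scala
object ArgmaxSketch {
  /** Returns (bestIndex, bestScore) for a non-empty sequence of scores.
    * On ties, the first index holding the maximum wins. */
  def argmax(scores: IndexedSeq[Double]): (Int, Double) = {
    require(scores.nonEmpty, "scores must be non-empty")
    var bestIndex = 0
    var bestScore = scores(0)
    var i = 1
    while (i < scores.length) {
      // Strict comparison keeps the earliest index when scores are equal.
      if (scores(i) > bestScore) {
        bestIndex = i
        bestScore = scores(i)
      }
      i += 1
    }
    (bestIndex, bestScore)
  }
}
```

Returning the score together with the index lets a caller report classification confidence without a second lookup.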
final def asInstanceOf[T0]: T0

Definition Classes
Any
def clone(): AnyRef

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( ... )
def defaultAlphaI: Float

Default value for the Laplace smoothing parameter.
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def extractLabelsAndAggregateObservations[K](stringKeyedObservations: DrmLike[K], cParser: (String) ⇒ String = seq2SparseCategoryParser)(implicit ctx: DistributedContext): (HashMap[String, Integer], DrmLike[Int])

Extracts label keys from a raw TF or TF-IDF matrix generated by seqdirectory/seq2sparse and aggregates the TF or TF-IDF values by label. Override this method in engine-specific modules to optimize it.
stringKeyedObservations
DrmLike matrix; output from seq2sparse in the form K = e.g. /Category/document_title, V = TF or TF-IDF values per term
cParser
a String => String function used to extract categories from Keys of the stringKeyedObservations DRM. The default CategoryParser will extract "Category" from: '/Category/document_id'
returns
(labelIndexMap, aggregatedByLabelObservationDrm). labelIndexMap is a HashMap[String, Integer] with K = label, V = label row index. aggregatedByLabelObservationDrm is a DrmLike[Int] of aggregated TF or TF-IDF counts per label.
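
The label extraction and aggregation step can be sketched in memory as follows. This is an illustration only: it assumes a `Seq` of (key, dense values) pairs as a stand-in for the string-keyed DRM, and the names `AggregateSketch`, `Row`, and `aggregateByLabel` are hypothetical. A real engine-specific override would run this as a distributed operation.

```scala
object AggregateSketch {
  // Hypothetical in-memory stand-in for one DRM row: (row key, term values).
  type Row = (String, Array[Double])

  /** Returns (label -> row index, one aggregated row of term values per label). */
  def aggregateByLabel(
      rows: Seq[Row],
      cParser: String => String = key => key.split("/")(1)
  ): (Map[String, Int], Array[Array[Double]]) = {
    // Assign each distinct label a row index in order of first appearance.
    val labels = rows.map { case (key, _) => cParser(key) }.distinct
    val labelIndex = labels.zipWithIndex.toMap
    val numTerms = rows.headOption.map(_._2.length).getOrElse(0)
    val aggregated = Array.fill(labels.size, numTerms)(0.0)
    // Sum every document's term values into the row of its label.
    for ((key, values) <- rows) {
      val r = labelIndex(cParser(key))
      var j = 0
      while (j < numTerms) {
        aggregated(r)(j) += values(j)
        j += 1
      }
    }
    (labelIndex, aggregated)
  }
}
```

For example, two documents keyed `/sports/doc1` and `/sports/doc3` collapse into a single `sports` row whose entries are the per-term sums.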
def finalize(): Unit

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
def hashCode(): Int

Definition Classes
AnyRef → Any
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
final def notifyAll(): Unit

Definition Classes
AnyRef
def seq2SparseCategoryParser: (String) ⇒ String

Default category parser: seqdirectory/seq2sparse categories are stored in DRM keys as /Category/document_id
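
A parser for this key layout can be sketched as below. It assumes keys always start with a leading slash followed by the category; the names `CategoryParserSketch` and `seq2SparseStyleParser` are hypothetical and this is not necessarily how the library implements its default.

```scala
object CategoryParserSketch {
  type CategoryParser = String => String

  // "/Category/document_id".split("/") yields Array("", "Category", "document_id"),
  // so the category is the element at index 1.
  val seq2SparseStyleParser: CategoryParser = key => key.split("/")(1)
}
```

A key without the leading slash or without a second segment would need a different parser, which is why the parser is passed as a parameter.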
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def test[K](model: NBModel, testSet: DrmLike[K], testComplementary: Boolean = false, cParser: (String) ⇒ String = seq2SparseCategoryParser)(implicit arg0: ClassTag[K], ctx: DistributedContext): ResultAnalyzer

Tests a trained model against a labeled dataset sequentially.
K
implicitly determined Key type of test set DRM: String
model
a trained NBModel
testSet
a labeled testing set
testComplementary
test using a complementary or a standard NB classifier
cParser
a String => String function used to extract categories from Keys of the testing set DRM. The default CategoryParser will extract "Category" from: '/Category/document_id'
*Note*: this method loads the entire test set into memory up front. It is optimized and parallelized in SparkNaiveBayes.
returns
a result analyzer with confusion matrix and accuracy statistics
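
The sequential evaluation loop can be sketched in plain Scala. Everything here is an assumption made for illustration: the model is reduced to a per-label weight matrix scored by dot product, the test set is an in-memory `Seq`, and the names `SequentialTestSketch`, `classify`, and `evaluate` are hypothetical. The result is a bare confusion matrix and accuracy rather than a `ResultAnalyzer`.

```scala
object SequentialTestSketch {
  /** Scores one document against every label (dot product) and returns the best label index. */
  def classify(weights: Array[Array[Double]], doc: Array[Double]): Int = {
    val scores = weights.map(w => w.zip(doc).map { case (a, b) => a * b }.sum)
    scores.indexOf(scores.max)
  }

  /** Returns (confusion, accuracy), where confusion(actual)(predicted) counts documents.
    * Assumes weights has one row per entry of labelIndex. */
  def evaluate(
      weights: Array[Array[Double]],
      labelIndex: Map[String, Int],
      testSet: Seq[(String, Array[Double])],
      cParser: String => String = key => key.split("/")(1)
  ): (Array[Array[Int]], Double) = {
    val n = labelIndex.size
    val confusion = Array.fill(n, n)(0)
    // One pass over the test set, one document at a time.
    for ((key, doc) <- testSet) {
      val actual = labelIndex(cParser(key))
      val predicted = classify(weights, doc)
      confusion(actual)(predicted) += 1
    }
    val correct = (0 until n).map(i => confusion(i)(i)).sum
    val accuracy = if (testSet.isEmpty) 0.0 else correct.toDouble / testSet.size
    (confusion, accuracy)
  }
}
```

Because the loop visits documents one by one, its cost grows linearly with the test set, which is the motivation for a parallel variant.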
def toString(): String

Definition Classes
AnyRef → Any
def train(observationsPerLabel: DrmLike[Int], labelIndex: Map[String, Integer], trainComplementary: Boolean = true, alphaI: Float = defaultAlphaI): NBModel

Distributed training of a Naive Bayes model. Follows the approach presented in Rennie et al., "Tackling the Poor Assumptions of Naive Bayes Text Classifiers", ICML 2003, http://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf
observationsPerLabel
a DrmLike[Int] matrix containing term frequency counts for each label.
trainComplementary
whether or not to train a complementary Naive Bayes model
alphaI
Laplace smoothing parameter
returns
a trained Naive Bayes model
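
To show how the smoothing parameter enters training, here is a minimal in-memory sketch of smoothed log weights in the spirit of Rennie et al. It is not the library's trainer: it assumes dense per-label count arrays, omits the weight normalization step from the paper, and negates the complement weights so that the highest score still wins (an assumption of this sketch). The names `SmoothedWeightsSketch`, `standardWeights`, and `complementWeights` are hypothetical.

```scala
object SmoothedWeightsSketch {
  /** Standard weights: log((count + alphaI) / (labelTotal + alphaI * numTerms)). */
  def standardWeights(counts: Array[Array[Double]], alphaI: Double): Array[Array[Double]] = {
    val numTerms = if (counts.isEmpty) 0 else counts(0).length
    counts.map { row =>
      val labelTotal = row.sum
      row.map(c => math.log((c + alphaI) / (labelTotal + alphaI * numTerms)))
    }
  }

  /** Complement weights: each label's estimate uses the counts of all other labels. */
  def complementWeights(counts: Array[Array[Double]], alphaI: Double): Array[Array[Double]] = {
    val numTerms = if (counts.isEmpty) 0 else counts(0).length
    // Total count of each term across all labels, and the grand total.
    val termTotals = Array.tabulate(numTerms)(j => counts.map(_(j)).sum)
    val grandTotal = termTotals.sum
    counts.map { row =>
      val labelTotal = row.sum
      Array.tabulate(numTerms) { j =>
        // Negated so that argmax over scores selects the best label (sketch assumption).
        -math.log((termTotals(j) - row(j) + alphaI) /
          (grandTotal - labelTotal + alphaI * numTerms))
      }
    }
  }
}
```

With `alphaI` greater than zero, a term that never occurs under a label still gets a finite weight instead of the logarithm of zero.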
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )

trait NaiveBayes extends Serializable