public abstract class AbstractVectorClassifier extends Object
A classifier takes an input vector and calculates the scores (usually probabilities) that the input vector belongs to one of n categories. In AbstractVectorClassifier each category is denoted by an integer c between 0 and n-1 (inclusive).
New users should start by looking at classifyFull(org.apache.mahout.math.Vector) (not classify(org.apache.mahout.math.Vector)).
Modifier and Type | Field and Description |
---|---|
static double | MIN_LOG_LIKELIHOOD: Minimum allowable log likelihood value. |
Constructor and Description |
---|
AbstractVectorClassifier() |
Modifier and Type | Method and Description |
---|---|
Matrix | classify(Matrix data): Returns n-1 probabilities, one for each of categories 1 through n-1, for each row of a matrix, where n is equal to numCategories(). |
abstract Vector | classify(Vector instance): Computes and returns a vector containing n-1 scores, where n is equal to numCategories(), given an input vector instance. |
Matrix | classifyFull(Matrix data): Returns a matrix where each row contains n probabilities, one for each category. |
Vector | classifyFull(Vector instance): Computes and returns a vector containing n scores, where n is numCategories(), given an input vector instance. |
Vector | classifyFull(Vector r, Vector instance): Computes and returns a vector containing n scores, where n is numCategories(), given an input vector instance, storing the scores in the supplied vector r. |
Vector | classifyNoLink(Vector features): Computes and returns a vector of scores before applying the inverse link function. |
Vector | classifyScalar(Matrix data): Returns a vector of probabilities of category 1, one for each row of a matrix. |
abstract double | classifyScalar(Vector instance): Classifies a vector in the special case of a binary classifier where classify(Vector) would return a vector with only one element. |
double | logLikelihood(int actual, Vector data): Returns a measure of how good the classification for a particular example actually is. |
abstract int | numCategories(): Returns the number of categories that a target variable can be assigned to. |
public static final double MIN_LOG_LIKELIHOOD
Minimum allowable log likelihood value.
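One common use of a minimum log likelihood is to keep the logarithm of a zero probability from becoming negative infinity. The sketch below illustrates that idea only. It uses plain `double[]` arrays instead of Mahout's `Vector`, and the class name `ClampedLogLikelihood`, the constant `FLOOR`, and its value of -100.0 are assumptions made for illustration, not taken from this page.

```java
// Sketch only: plain Java, no Mahout types. FLOOR is a hypothetical stand-in
// for a minimum allowable log likelihood; its value here is an assumption.
final class ClampedLogLikelihood {
    static final double FLOOR = -100.0;

    /**
     * Log of the probability assigned to the correct category, never below FLOOR.
     * fullScores holds one probability per category (indices 0..n-1).
     */
    static double logLikelihood(int actual, double[] fullScores) {
        return Math.max(FLOOR, Math.log(fullScores[actual]));
    }

    public static void main(String[] args) {
        double[] scores = {0.25, 0.75, 0.0};
        System.out.println(logLikelihood(1, scores)); // log(0.75)
        System.out.println(logLikelihood(2, scores)); // log(0) clamped to FLOOR
    }
}
```

With a floor in place, summing per-example values over a data set stays finite even when one example is assigned zero probability.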
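public abstract int numCategories()
Returns the number of categories that a target variable can be assigned to. Categories are denoted by the integers 0 to numCategories()-1 (inclusive).
public abstract Vector classify(Vector instance)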
Computes and returns a vector containing n-1 scores, where n is equal to numCategories(), given an input vector instance. Higher scores indicate that the input vector is more likely to belong to the corresponding category. The categories are denoted by the integers 0 through n-1 (inclusive), and the scores in the returned vector correspond to categories 1 through n-1 (leaving out category 0). It is assumed that the score for category 0 is one minus the sum of the scores in the returned vector.
Parameters:
instance - A feature vector to be classified.
Returns:
A vector of scores in n-1 encoding.
public Vector classifyNoLink(Vector features)
Computes and returns a vector of scores before applying the inverse link function. The implementation of this method provided by AbstractVectorClassifier throws an UnsupportedOperationException. Your subclass must explicitly override this method to support this operation.
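The override requirement can be illustrated with a small stand-alone sketch. It does not use Mahout: `SketchClassifier` and `LinearSketch` are hypothetical stand-in classes, `double[]` replaces `Vector`, and the logistic function is an assumed choice of inverse link.

```java
// Sketch only: stand-in types, not Mahout classes. double[] replaces Vector.
abstract class SketchClassifier {
    /** Default mirrors the documented behavior: unsupported unless overridden. */
    double[] classifyNoLink(double[] features) {
        throw new UnsupportedOperationException("classifyNoLink not implemented");
    }
}

/** Hypothetical binary linear model; the weights are made up for illustration. */
final class LinearSketch extends SketchClassifier {
    private final double[] weights;

    LinearSketch(double[] weights) {
        this.weights = weights.clone();
    }

    /** Raw linear score w . x, before any inverse link is applied. */
    @Override
    double[] classifyNoLink(double[] features) {
        double dot = 0.0;
        for (int i = 0; i < weights.length; i++) {
            dot += weights[i] * features[i];
        }
        return new double[] {dot};
    }

    /** Applies a logistic inverse link (an assumed choice) to the raw score. */
    double classifyScalar(double[] features) {
        return 1.0 / (1.0 + Math.exp(-classifyNoLink(features)[0]));
    }
}
```

A caller that needs the raw scores should be prepared for the exception when the concrete classifier has not overridden the method.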
Parameters:
features - A feature vector to be classified.
public abstract double classifyScalar(Vector instance)
Classifies a vector in the special case of a binary classifier where classify(Vector) would return a vector with only one element. As such, using this method can avoid the allocation of a vector.
Parameters:
instance - The feature vector to be classified.
See Also:
classify(Vector)
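For the binary case (n = 2), a single scalar carries the same information as either vector encoding. The sketch below shows the relationship using plain arrays in place of Mahout's `Vector`. The class `BinaryScores`, its method names, and the 0.5 decision threshold are illustrative assumptions.

```java
// Sketch only: plain arrays stand in for Mahout's Vector.
final class BinaryScores {
    /** n-1 encoding for n = 2: a single score, for category 1. */
    static double[] asNMinusOne(double p) {
        return new double[] {p};
    }

    /** Full encoding for n = 2: category 0 gets 1 - p. */
    static double[] asFull(double p) {
        return new double[] {1.0 - p, p};
    }

    /** Picks category 1 when its score exceeds 0.5 (threshold is an assumption). */
    static int decide(double p) {
        return p > 0.5 ? 1 : 0;
    }
}
```

Working with the scalar directly, as `decide` does, needs no array at all, which is the allocation saving the description refers to.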
public Vector classifyFull(Vector instance)
Computes and returns a vector containing n scores, where n is numCategories(), given an input vector instance. Higher scores indicate that the input vector is more likely to belong to the corresponding category. The categories are denoted by the integers 0 through n-1 (inclusive).
Using this method it is possible to classify an input vector, for example, by selecting the category with the largest score. If classifier is an instance of AbstractVectorClassifier and input is a Vector of features describing an element to be classified, then the following code could be used to classify input.
Vector scores = classifier.classifyFull(input);
int assignedCategory = scores.maxValueIndex();
Here assignedCategory is the index of the category with the maximum score.
If an n-1 encoding is acceptable, and allocation performance is an issue, then the classify(Vector) method is probably better to use.
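To make the relationship between the two encodings concrete, the following sketch expands n-1 scores into n scores by assigning category 0 one minus the sum of the others, then selects the category with the largest score. It uses plain `double[]` arrays rather than Mahout's `Vector`. `EncodingSketch`, `toFull`, and `argMax` are hypothetical names, and the tie-breaking rule in `argMax` is an assumption of this sketch.

```java
// Sketch only: plain arrays stand in for Mahout's Vector.
final class EncodingSketch {
    /** Expands n-1 scores (categories 1..n-1) to n scores (categories 0..n-1). */
    static double[] toFull(double[] nMinusOne) {
        double[] full = new double[nMinusOne.length + 1];
        double sum = 0.0;
        for (int i = 0; i < nMinusOne.length; i++) {
            full[i + 1] = nMinusOne[i];
            sum += nMinusOne[i];
        }
        full[0] = 1.0 - sum; // score of the omitted category 0
        return full;
    }

    /** Index of the largest score; ties go to the lowest index (sketch's choice). */
    static int argMax(double[] scores) {
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }
}
```

For example, `toFull(new double[] {0.25, 0.5})` yields three scores, with 0.25 assigned to category 0.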
Parameters:
instance - A vector of features to be classified.
See Also:
classify(Vector), classifyFull(Vector r, Vector instance)
public Vector classifyFull(Vector r, Vector instance)
Computes and returns a vector containing n scores, where n is numCategories(), given an input vector instance. Higher scores indicate that the input vector is more likely to belong to the corresponding category. The categories are denoted by the integers 0 through n-1 (inclusive). The main difference between this method and classifyFull(Vector) is that this method allows a user to provide a previously allocated Vector r to store the returned scores.
Using this method it is possible to classify an input vector, for example, by selecting the category with the largest score. If classifier is an instance of AbstractVectorClassifier, result is a non-null Vector, and input is a Vector of features describing an element to be classified, then the following code could be used to classify input.
Vector scores = classifier.classifyFull(result, input); // Notice that scores == result
int assignedCategory = scores.maxValueIndex();
Here assignedCategory is the index of the category with the maximum score.
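The benefit of supplying the result vector shows up when many inputs are classified in a loop. The sketch below illustrates the pattern with plain `double[]` arrays instead of Mahout's `Vector`. `ReuseSketch` and `assignAll` are hypothetical names, and the toy scorer that normalizes its inputs is made up purely so the example runs.

```java
// Sketch only: plain arrays stand in for Mahout's Vector.
final class ReuseSketch {
    /** Toy scorer (made up): normalizes non-negative inputs into r and returns r. */
    static double[] classifyFull(double[] r, double[] instance) {
        double sum = 0.0;
        for (double v : instance) {
            sum += v;
        }
        for (int i = 0; i < r.length; i++) {
            r[i] = instance[i] / sum;
        }
        return r; // the same array that was passed in
    }

    /** Classifies each input, reusing one result buffer across all calls. */
    static int[] assignAll(double[][] inputs, int numCategories) {
        double[] result = new double[numCategories]; // allocated once
        int[] assigned = new int[inputs.length];
        for (int k = 0; k < inputs.length; k++) {
            double[] scores = classifyFull(result, inputs[k]);
            int best = 0;
            for (int i = 1; i < scores.length; i++) {
                if (scores[i] > scores[best]) {
                    best = i;
                }
            }
            assigned[k] = best;
        }
        return assigned;
    }
}
```

Because the buffer is overwritten on every call, any scores needed after the next call must be copied out first.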
Parameters:
r - Where to put the results.
instance - A vector of features to be classified.
public Matrix classify(Matrix data)
Returns n-1 probabilities, one for each of categories 1 through n-1, for each row of a matrix, where n is equal to numCategories(). The probability of the missing 0-th category is 1 - rowSum(this result).
Parameters:
data - The matrix whose rows are the input vectors to classify.
public Matrix classifyFull(Matrix data)
Returns a matrix where each row contains n probabilities, one for each category.
Parameters:
data - The matrix whose rows are the input vectors to classify.
public Vector classifyScalar(Matrix data)
Returns a vector of probabilities of category 1, one for each row of a matrix.
Parameters:
data - The matrix whose rows are vectors to classify.
public double logLikelihood(int actual, Vector data)
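Returns a measure of how good the classification for a particular example actually is.
Parameters:
actual - The correct category for the example.
data - The vector to be classified.
Copyright © 2008–2017 The Apache Software Foundation. All rights reserved.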