AbstractVectorClassifier (Mahout Map-Reduce 0.13.0 API)

java.lang.Object
- org.apache.mahout.classifier.AbstractVectorClassifier

Direct Known Subclasses:

AbstractNaiveBayesClassifier, AbstractOnlineLogisticRegression, ClusterClassifier, CrossFoldLearner, GradientMachine, PassiveAggressive
```
public abstract class AbstractVectorClassifier
extends Object
```
Defines the interface for classifiers that take a vector as input. This is implemented as an abstract class so that it can implement a number of handy convenience methods related to classification of vectors.
A classifier takes an input vector and calculates scores (usually probabilities) indicating how likely the input vector is to belong to each of n categories. In AbstractVectorClassifier each category is denoted by an integer c between 0 and n-1 (inclusive).
New users should start by looking at classifyFull(org.apache.mahout.math.Vector) (not classify(org.apache.mahout.math.Vector)).

Field Summary

Fields
Modifier and Type	Field and Description
`static double`	`MIN_LOG_LIKELIHOOD` Minimum allowable log likelihood value.

Constructor Summary

Constructors
Constructor and Description
`AbstractVectorClassifier()`

Method Summary

Modifier and Type	Method and Description
`Matrix`	`classify(Matrix data)` Returns `n-1` probabilities, one for each of categories 1 through `n-1`, for each row of a matrix, where `n` is equal to `numCategories()`.
`abstract Vector`	`classify(Vector instance)` Compute and return a vector containing `n-1` scores, where `n` is equal to `numCategories()`, given an input vector `instance`.
`Matrix`	`classifyFull(Matrix data)` Returns a matrix where the rows of the matrix each contain `n` probabilities, one for each category.
`Vector`	`classifyFull(Vector instance)` Computes and returns a vector containing `n` scores, where `n` is `numCategories()`, given an input vector `instance`.
`Vector`	`classifyFull(Vector r, Vector instance)` Computes and returns a vector containing `n` scores, where `n` is `numCategories()`, given an input vector `instance`.
`Vector`	`classifyNoLink(Vector features)` Compute and return a vector of scores before applying the inverse link function.
`Vector`	`classifyScalar(Matrix data)` Returns a vector of probabilities of category 1, one for each row of a matrix.
`abstract double`	`classifyScalar(Vector instance)` Classifies a vector in the special case of a binary classifier where `classify(Vector)` would return a vector with only one element.
`double`	`logLikelihood(int actual, Vector data)` Returns a measure of how good the classification for a particular example actually is.
`abstract int`	`numCategories()` Returns the number of categories that a target variable can be assigned to.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - MIN_LOG_LIKELIHOOD
```
public static final double MIN_LOG_LIKELIHOOD
```
    Minimum allowable log likelihood value.
    
    See Also:
    
    Constant Field Values
- Constructor Detail
  - AbstractVectorClassifier
```
public AbstractVectorClassifier()
```
- Method Detail
  - numCategories
```
public abstract int numCategories()
```
    Returns the number of categories that a target variable can be assigned to. A vector classifier will encode its output as an integer from 0 to numCategories()-1 (inclusive).
    
    Returns:
    
    The number of categories.
  - classify
```
public abstract Vector classify(Vector instance)
```
    Compute and return a vector containing n-1 scores, where n is equal to numCategories(), given an input vector instance. Higher scores indicate that the input vector is more likely to belong to that category. The categories are denoted by the integers 0 through n-1 (inclusive), and the scores in the returned vector correspond to categories 1 through n-1 (leaving out category 0). It is assumed that the score for category 0 is one minus the sum of the scores in the returned vector.
    
    Parameters:
    
    instance - A feature vector to be classified.
    
    Returns:
    
    A vector of probabilities in 1 of n-1 encoding.
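    
    The rule above for recovering the score of category 0 can be illustrated with a minimal plain-Java sketch. It uses double[] instead of Mahout's Vector, and the names FullScores and expand are hypothetical, not part of the Mahout API.

```java
import java.util.Arrays;

public class FullScores {
    /** Expands n-1 scores (categories 1..n-1) into n scores (categories 0..n-1). */
    public static double[] expand(double[] partial) {
        double[] full = new double[partial.length + 1];
        double sum = 0.0;
        for (int i = 0; i < partial.length; i++) {
            full[i + 1] = partial[i]; // category i+1 keeps its score
            sum += partial[i];
        }
        full[0] = 1.0 - sum; // category 0 is one minus the sum of the others
        return full;
    }

    public static void main(String[] args) {
        // Scores for categories 1 and 2 of a three-category problem.
        double[] full = expand(new double[] {0.25, 0.5});
        System.out.println(Arrays.toString(full)); // prints [0.25, 0.25, 0.5]
    }
}
```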
  - classifyNoLink
```
public Vector classifyNoLink(Vector features)
```
    Compute and return a vector of scores before applying the inverse link function. For logistic regression and other generalized linear models, this is just the linear part of the classification.
    The implementation of this method provided by AbstractVectorClassifier throws an UnsupportedOperationException. Your subclass must explicitly override this method to support this operation.
    
    Parameters:
    
    features - A feature vector to be classified.
    
    Returns:
    
    A vector of scores. If transformed by the inverse link function, these will become probabilities.
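    
    As an illustration of the relationship between the linear score and the final probability, the sketch below applies the logistic function, which is the usual inverse link for binary logistic regression. This is an assumption chosen for illustration: the page does not state which function a given subclass applies, and the names InverseLinkSketch and logistic are hypothetical.

```java
public class InverseLinkSketch {
    /** Maps a linear score to a probability in (0, 1) using the logistic function. */
    public static double logistic(double linearScore) {
        return 1.0 / (1.0 + Math.exp(-linearScore));
    }

    public static void main(String[] args) {
        // A linear score of 0 corresponds to a probability of exactly 0.5.
        System.out.println(logistic(0.0)); // prints 0.5
        // Large positive scores approach 1, large negative scores approach 0.
        System.out.println(logistic(10.0) > 0.99);
        System.out.println(logistic(-10.0) < 0.01);
    }
}
```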
  - classifyScalar
```
public abstract double classifyScalar(Vector instance)
```
    Classifies a vector in the special case of a binary classifier where classify(Vector) would return a vector with only one element. As such, using this method can avoid the allocation of a vector.
    
    Parameters:
    
    instance - The feature vector to be classified.
    
    Returns:
    
    The score for category 1.
    
    See Also:
    
    classify(Vector)
  - classifyFull
```
public Vector classifyFull(Vector instance)
```
    Computes and returns a vector containing n scores, where n is numCategories(), given an input vector instance. Higher scores indicate that the input vector is more likely to belong to the corresponding category. The categories are denoted by the integers 0 through n-1 (inclusive).
    Using this method it is possible to classify an input vector, for example, by selecting the category with the largest score. If classifier is an instance of AbstractVectorClassifier and input is a Vector of features describing an element to be classified, then the following code could be used to classify input.
        Vector scores = classifier.classifyFull(input);
        int assignedCategory = scores.maxValueIndex();
    Here assignedCategory is the index of the category with the maximum score.
    If an n-1 encoding is acceptable, and allocation performance is an issue, then the classify(Vector) method is probably better to use.
    
    Parameters:
    
    instance - A vector of features to be classified.
    
    Returns:
    
    A vector of probabilities, one for each category.
    
    See Also:
    
    classify(Vector), classifyFull(Vector r, Vector instance)
  - classifyFull
```
public Vector classifyFull(Vector r,
                           Vector instance)
```
    Computes and returns a vector containing n scores, where n is numCategories(), given an input vector instance. Higher scores indicate that the input vector is more likely to belong to the corresponding category. The categories are denoted by the integers 0 through n-1 (inclusive). The main difference between this method and classifyFull(Vector) is that this method allows a user to provide a previously allocated Vector r to store the returned scores.
    Using this method it is possible to classify an input vector, for example, by selecting the category with the largest score. If classifier is an instance of AbstractVectorClassifier, result is a non-null Vector, and input is a Vector of features describing an element to be classified, then the following code could be used to classify input.
        Vector scores = classifier.classifyFull(result, input); // Notice that scores == result
        int assignedCategory = scores.maxValueIndex();
    Here assignedCategory is the index of the category with the maximum score.
    
    Parameters:
    
    r - Where to put the results.
    
    instance - A vector of features to be classified.
    
    Returns:
    
    A vector of scores/probabilities, one for each category.
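    
    The benefit of passing a previously allocated result holder shows up when many inputs are classified in a loop. The sketch below illustrates the pattern in plain Java with double[] buffers. The scorer fillScores is a hypothetical stand-in (it only normalizes non-negative weights) and does not reproduce any Mahout classifier; ReuseBufferSketch and maxIndex are likewise illustrative names.

```java
public class ReuseBufferSketch {
    /** Stand-in scorer: writes normalized weights into r and returns r itself. */
    public static double[] fillScores(double[] r, double[] rawWeights) {
        double sum = 0.0;
        for (double w : rawWeights) {
            sum += w;
        }
        for (int i = 0; i < rawWeights.length; i++) {
            r[i] = rawWeights[i] / sum;
        }
        return r; // same reference as the caller's buffer
    }

    /** Index of the largest score; ties resolve to the lowest index. */
    public static int maxIndex(double[] scores) {
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] inputs = {{1.0, 1.0, 2.0}, {3.0, 1.0, 0.0}};
        double[] result = new double[3]; // allocated once, reused for every input
        for (double[] input : inputs) {
            double[] scores = fillScores(result, input);
            System.out.println(maxIndex(scores)); // prints 2, then 0
        }
    }
}
```

    Because the buffer is overwritten on every call, any scores that must outlive the next call need to be copied first.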
  - classify
```
public Matrix classify(Matrix data)
```
    Returns n-1 probabilities, one for each of categories 1 through n-1, for each row of a matrix, where n is equal to numCategories(). The probability of the missing 0-th category is 1 - rowSum(this result).
    
    Parameters:
    
    data - The matrix whose rows are the input vectors to classify
    
    Returns:
    
    A matrix of scores, one row per row of the input matrix, one column for each category except category 0.
  - classifyFull
```
public Matrix classifyFull(Matrix data)
```
    Returns a matrix where the rows of the matrix each contain n probabilities, one for each category.
    
    Parameters:
    
    data - The matrix whose rows are the input vectors to classify
    
    Returns:
    
    A matrix of scores, one row per row of the input matrix, one column for each category.
  - classifyScalar
```
public Vector classifyScalar(Matrix data)
```
    Returns a vector of probabilities of category 1, one for each row of a matrix. This only makes sense if there are exactly two categories, but calling this method in that case can save a number of vector allocations.
    
    Parameters:
    
    data - The matrix whose rows are vectors to classify
    
    Returns:
    
    A vector of scores, with one value per row of the input matrix.
  - logLikelihood
```
public double logLikelihood(int actual,
                            Vector data)
```
    Returns a measure of how good the classification for a particular example actually is.
    
    Parameters:
    
    actual - The correct category for the example.
    
    data - The vector to be classified.
    
    Returns:
    
    The log likelihood of the correct answer as estimated by the current model. This will always be <= 0, and larger values (closer to 0) indicate better accuracy. To simplify code that maintains running averages, this value is bounded below at -100.
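    
    The lower bound matters because the logarithm of a zero probability is negative infinity, which would poison any running average. The following plain-Java sketch illustrates the bounded computation on a full score array. It is not the Mahout implementation: LogLikelihoodSketch and its method are hypothetical, and the constant simply mirrors the -100 bound stated above.

```java
public class LogLikelihoodSketch {
    // Mirrors the documented lower bound of -100.
    static final double MIN_LOG_LIKELIHOOD = -100.0;

    /** Bounded log of the probability assigned to the correct category. */
    public static double logLikelihood(int actual, double[] fullScores) {
        return Math.max(MIN_LOG_LIKELIHOOD, Math.log(fullScores[actual]));
    }

    public static void main(String[] args) {
        double[] scores = {1.0, 0.0};
        // Correct category received probability 1: log(1) = 0, the best case.
        System.out.println(logLikelihood(0, scores)); // prints 0.0
        // Correct category received probability 0: log(0) is -Infinity,
        // so the result is clamped to the lower bound.
        System.out.println(logLikelihood(1, scores)); // prints -100.0
    }
}
```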
