KMeansDriver (Mahout Map-Reduce 0.13.0 API)

java.lang.Object
- org.apache.hadoop.conf.Configured
  - org.apache.mahout.common.AbstractJob
    - org.apache.mahout.clustering.kmeans.KMeansDriver

All Implemented Interfaces:

org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool
```
public class KMeansDriver
extends AbstractJob
```

Field Summary
- Fields inherited from class org.apache.mahout.common.AbstractJob
  argMap, inputFile, inputPath, outputFile, outputPath, tempPath

Constructor Summary

Constructors
Constructor and Description

KMeansDriver()

Method Summary

Modifier and Type	Method and Description
`static org.apache.hadoop.fs.Path`	`buildClusters(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path clustersIn, org.apache.hadoop.fs.Path output, int maxIterations, String delta, boolean runSequential)` Iterate over the input vectors to produce cluster directories for each iteration
`static void`	`clusterData(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path clustersIn, org.apache.hadoop.fs.Path output, double clusterClassificationThreshold, boolean runSequential)` Run the job using supplied arguments
`static void`	`main(String[] args)`
`static void`	`run(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path clustersIn, org.apache.hadoop.fs.Path output, double convergenceDelta, int maxIterations, boolean runClustering, double clusterClassificationThreshold, boolean runSequential)` Iterate over the input vectors to produce clusters and, if requested, use the results of the final iteration to cluster the input vectors.
`static void`	`run(org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path clustersIn, org.apache.hadoop.fs.Path output, double convergenceDelta, int maxIterations, boolean runClustering, double clusterClassificationThreshold, boolean runSequential)` Iterate over the input vectors to produce clusters and, if requested, use the results of the final iteration to cluster the input vectors.
`int`	`run(String[] args)`

Methods inherited from class org.apache.mahout.common.AbstractJob
addFlag, addInputOption, addOption, addOption, addOption, addOption, addOutputOption, buildOption, buildOption, getAnalyzerClassFromOption, getCLIOption, getConf, getDimensions, getFloat, getFloat, getGroup, getInputFile, getInputPath, getInt, getInt, getOption, getOption, getOption, getOptions, getOutputFile, getOutputPath, getOutputPath, getTempPath, getTempPath, hasOption, keyFor, maybePut, parseArguments, parseArguments, parseDirectories, prepareJob, prepareJob, prepareJob, prepareJob, setConf, setS3SafeCombinedInputPath, shouldRunNextPhase

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail
- KMeansDriver
```
public KMeansDriver()
```

Method Detail

main
```
public static void main(String[] args)
                 throws Exception
```

Throws:

Exception

run
```
public int run(String[] args)
        throws Exception
```

Throws:

Exception

run
```
public static void run(org.apache.hadoop.conf.Configuration conf,
                       org.apache.hadoop.fs.Path input,
                       org.apache.hadoop.fs.Path clustersIn,
                       org.apache.hadoop.fs.Path output,
                       double convergenceDelta,
                       int maxIterations,
                       boolean runClustering,
                       double clusterClassificationThreshold,
                       boolean runSequential)
                throws IOException,
                       InterruptedException,
                       ClassNotFoundException
```
Iterate over the input vectors to produce clusters and, if requested, use the results of the final iteration to cluster the input vectors.

Parameters:

conf - the Configuration to use

input - the directory pathname for input points

clustersIn - the directory pathname for initial & computed clusters

output - the directory pathname for output points

convergenceDelta - the convergence delta value

maxIterations - the maximum number of iterations

runClustering - true if points are to be clustered after iterations are completed

clusterClassificationThreshold - a clustering strictness / outlier removal parameter. Its value should be between 0 and 1. Vectors whose pdf is below this value will not be clustered.

runSequential - if true execute sequential algorithm

Throws:

IOException

InterruptedException

ClassNotFoundException
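
The interaction between convergenceDelta and maxIterations can be illustrated with a small, self-contained sketch. This is not Mahout's implementation: it is a plain one-dimensional k-means loop over in-memory arrays, and the class name `KMeansLoopSketch`, its `iterate` method and the `Result` holder are hypothetical names introduced only for this illustration. It assumes that iteration stops once no center moves by more than convergenceDelta, or once maxIterations passes have run, whichever comes first.

```java
import java.util.Arrays;

/** Illustrative 1-D k-means loop; hypothetical code, not Mahout's implementation. */
final class KMeansLoopSketch {

    /** Holds the final centers and the number of iterations actually run. */
    static final class Result {
        final double[] centers;
        final int iterations;

        Result(double[] centers, int iterations) {
            this.centers = centers;
            this.iterations = iterations;
        }
    }

    static Result iterate(double[] points, double[] initialCenters,
                          double convergenceDelta, int maxIterations) {
        double[] centers = initialCenters.clone();
        int iteration = 0;
        boolean converged = false;
        while (!converged && iteration < maxIterations) {
            double[] sums = new double[centers.length];
            int[] counts = new int[centers.length];
            // Assignment step: each point goes to its nearest center.
            for (double p : points) {
                int best = 0;
                for (int c = 1; c < centers.length; c++) {
                    if (Math.abs(p - centers[c]) < Math.abs(p - centers[best])) {
                        best = c;
                    }
                }
                sums[best] += p;
                counts[best]++;
            }
            // Update step: move each center to the mean of its points.
            // Assumption: a pass counts as converged only if no center
            // moved by more than convergenceDelta.
            converged = true;
            for (int c = 0; c < centers.length; c++) {
                double updated = counts[c] == 0 ? centers[c] : sums[c] / counts[c];
                if (Math.abs(updated - centers[c]) > convergenceDelta) {
                    converged = false;
                }
                centers[c] = updated;
            }
            iteration++;
        }
        return new Result(centers, iteration);
    }

    public static void main(String[] args) {
        double[] points = {1.0, 2.0, 3.0, 10.0, 11.0, 12.0};
        Result r = iterate(points, new double[] {0.0, 5.0}, 0.001, 10);
        System.out.println(Arrays.toString(r.centers) + " after " + r.iterations + " iterations");
    }
}
```

In this sketch a smaller convergenceDelta demands a more stable solution before stopping, while maxIterations bounds the work when the centers keep moving.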

run
```
public static void run(org.apache.hadoop.fs.Path input,
                       org.apache.hadoop.fs.Path clustersIn,
                       org.apache.hadoop.fs.Path output,
                       double convergenceDelta,
                       int maxIterations,
                       boolean runClustering,
                       double clusterClassificationThreshold,
                       boolean runSequential)
                throws IOException,
                       InterruptedException,
                       ClassNotFoundException
```
Iterate over the input vectors to produce clusters and, if requested, use the results of the final iteration to cluster the input vectors.

Parameters:

input - the directory pathname for input points

clustersIn - the directory pathname for initial & computed clusters

output - the directory pathname for output points

convergenceDelta - the convergence delta value

maxIterations - the maximum number of iterations

runClustering - true if points are to be clustered after iterations are completed

clusterClassificationThreshold - a clustering strictness / outlier removal parameter. Its value should be between 0 and 1. Vectors whose pdf is below this value will not be clustered.

runSequential - if true execute sequential algorithm

Throws:

IOException

InterruptedException

ClassNotFoundException

buildClusters
```
public static org.apache.hadoop.fs.Path buildClusters(org.apache.hadoop.conf.Configuration conf,
                                                      org.apache.hadoop.fs.Path input,
                                                      org.apache.hadoop.fs.Path clustersIn,
                                                      org.apache.hadoop.fs.Path output,
                                                      int maxIterations,
                                                      String delta,
                                                      boolean runSequential)
                                               throws IOException,
                                                      InterruptedException,
                                                      ClassNotFoundException
```
Iterate over the input vectors to produce cluster directories for each iteration.

Parameters:

conf - the Configuration to use

input - the directory pathname for input points

clustersIn - the directory pathname for initial & computed clusters

output - the directory pathname for output points

maxIterations - the maximum number of iterations

delta - the convergence delta value

runSequential - if true execute sequential algorithm

Returns:

the Path of the final clusters directory

Throws:

IOException

InterruptedException

ClassNotFoundException

clusterData
```
public static void clusterData(org.apache.hadoop.conf.Configuration conf,
                               org.apache.hadoop.fs.Path input,
                               org.apache.hadoop.fs.Path clustersIn,
                               org.apache.hadoop.fs.Path output,
                               double clusterClassificationThreshold,
                               boolean runSequential)
                        throws IOException,
                               InterruptedException,
                               ClassNotFoundException
```
Run the clustering job on the input points using the supplied arguments.

Parameters:

conf - the Configuration to use

input - the directory pathname for input points

clustersIn - the directory pathname for input clusters

output - the directory pathname for output points

clusterClassificationThreshold - a clustering strictness / outlier removal parameter. Its value should be between 0 and 1. Vectors whose pdf is below this value will not be clustered.

runSequential - if true execute sequential algorithm

Throws:

IOException

InterruptedException

ClassNotFoundException
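
The effect of clusterClassificationThreshold can be sketched in isolation. The example below is an assumption-laden illustration, not Mahout code: the class `ThresholdSketch` and its `classify` method are hypothetical, and the per-cluster scores are simply passed in as an array, since this page does not say how the pdf values are computed. It assumes a vector is assigned to its highest-scoring cluster only when that score is at least the threshold, and is left unclustered otherwise.

```java
import java.util.OptionalInt;

/** Illustrative threshold filter; hypothetical code, not Mahout's implementation. */
final class ThresholdSketch {

    /**
     * Returns the index of the highest-scoring cluster, or an empty result
     * when the best score falls below the threshold (the vector is treated
     * as an outlier and not clustered).
     *
     * @param scores    hypothetical per-cluster pdf values for one vector
     * @param threshold value between 0 and 1
     */
    static OptionalInt classify(double[] scores, double threshold) {
        int best = -1;
        for (int i = 0; i < scores.length; i++) {
            if (best < 0 || scores[i] > scores[best]) {
                best = i;
            }
        }
        if (best < 0 || scores[best] < threshold) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(best);
    }

    public static void main(String[] args) {
        // Clearly dominated by cluster 1: assigned.
        System.out.println(classify(new double[] {0.1, 0.7, 0.2}, 0.5));
        // No cluster scores at least 0.5: left unclustered.
        System.out.println(classify(new double[] {0.4, 0.35, 0.25}, 0.5));
    }
}
```

Under these assumptions a threshold of 0 assigns every vector to some cluster, and raising it toward 1 discards progressively more ambiguous vectors.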
