public class KMeansDriver extends AbstractJob
Fields inherited from class AbstractJob: argMap, inputFile, inputPath, outputFile, outputPath, tempPath
Constructor and Description |
---|
KMeansDriver() |
Modifier and Type | Method and Description |
---|---|
static org.apache.hadoop.fs.Path | buildClusters(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path clustersIn, org.apache.hadoop.fs.Path output, int maxIterations, String delta, boolean runSequential): Iterate over the input vectors to produce cluster directories for each iteration. |
static void | clusterData(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path clustersIn, org.apache.hadoop.fs.Path output, double clusterClassificationThreshold, boolean runSequential): Run the job using supplied arguments. |
static void | main(String[] args) |
static void | run(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path clustersIn, org.apache.hadoop.fs.Path output, double convergenceDelta, int maxIterations, boolean runClustering, double clusterClassificationThreshold, boolean runSequential): Iterate over the input vectors to produce clusters and, if requested, use the results of the final iteration to cluster the input vectors. |
static void | run(org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path clustersIn, org.apache.hadoop.fs.Path output, double convergenceDelta, int maxIterations, boolean runClustering, double clusterClassificationThreshold, boolean runSequential): Iterate over the input vectors to produce clusters and, if requested, use the results of the final iteration to cluster the input vectors. |
int | run(String[] args) |
Methods inherited from class AbstractJob: addFlag, addInputOption, addOption, addOption, addOption, addOption, addOutputOption, buildOption, buildOption, getAnalyzerClassFromOption, getCLIOption, getConf, getDimensions, getFloat, getFloat, getGroup, getInputFile, getInputPath, getInt, getInt, getOption, getOption, getOption, getOptions, getOutputFile, getOutputPath, getOutputPath, getTempPath, getTempPath, hasOption, keyFor, maybePut, parseArguments, parseArguments, parseDirectories, prepareJob, prepareJob, prepareJob, prepareJob, setConf, setS3SafeCombinedInputPath, shouldRunNextPhase
public static void run(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path clustersIn, org.apache.hadoop.fs.Path output, double convergenceDelta, int maxIterations, boolean runClustering, double clusterClassificationThreshold, boolean runSequential) throws IOException, InterruptedException, ClassNotFoundException
Parameters:
input - the directory pathname for input points
clustersIn - the directory pathname for initial & computed clusters
output - the directory pathname for output points
convergenceDelta - the convergence delta value
maxIterations - the maximum number of iterations
runClustering - true if points are to be clustered after iterations are completed
clusterClassificationThreshold - Is a clustering strictness / outlier removal parameter. Its value should be between 0 and 1. Vectors having pdf below this value will not be clustered.
runSequential - if true execute sequential algorithm
Throws:
IOException
InterruptedException
ClassNotFoundException
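The summary for run describes two phases: iterate to produce clusters, then, if requested, cluster the input vectors with the results of the final iteration. The following is a minimal, self-contained sketch of that control flow only. It does not call Mahout; the class name TwoPhaseSketch, the String paths, the log list, and the "final-clusters" placeholder name are all hypothetical. Passing the delta on as a String mirrors the buildClusters signature, but the delegation shown here is an assumption rather than the library's confirmed implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class TwoPhaseSketch {
    // Records which phases ran, so the control flow can be inspected.
    static final List<String> log = new ArrayList<>();

    // Hypothetical stand-in for the iteration phase; returns where the final clusters live.
    static String buildClusters(String input, String clustersIn, String output,
                                int maxIterations, String delta) {
        log.add("buildClusters delta=" + delta + " maxIterations=" + maxIterations);
        return output + "/final-clusters"; // placeholder name, not Mahout's directory layout
    }

    // Hypothetical stand-in for the classification phase.
    static void clusterData(String input, String clustersIn, String output, double threshold) {
        log.add("clusterData clustersIn=" + clustersIn + " threshold=" + threshold);
    }

    // Phase 1 always runs; phase 2 runs only when runClustering is true.
    static void run(String input, String clustersIn, String output, double convergenceDelta,
                    int maxIterations, boolean runClustering, double threshold) {
        String delta = Double.toString(convergenceDelta);
        String clustersOut = buildClusters(input, clustersIn, output, maxIterations, delta);
        if (runClustering) {
            clusterData(input, clustersOut, output, threshold);
        }
    }

    public static void main(String[] args) {
        run("in", "seeds", "out", 0.001, 10, true, 0.0);
        log.forEach(System.out::println);
    }
}
```

With runClustering set to false, only the first phase is recorded, which is the case where a caller wants the cluster centers but not per-point assignments.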
public static void run(org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path clustersIn, org.apache.hadoop.fs.Path output, double convergenceDelta, int maxIterations, boolean runClustering, double clusterClassificationThreshold, boolean runSequential) throws IOException, InterruptedException, ClassNotFoundException
Parameters:
input - the directory pathname for input points
clustersIn - the directory pathname for initial & computed clusters
output - the directory pathname for output points
convergenceDelta - the convergence delta value
maxIterations - the maximum number of iterations
runClustering - true if points are to be clustered after iterations are completed
clusterClassificationThreshold - Is a clustering strictness / outlier removal parameter. Its value should be between 0 and 1. Vectors having pdf below this value will not be clustered.
runSequential - if true execute sequential algorithm
Throws:
IOException
InterruptedException
ClassNotFoundException
public static org.apache.hadoop.fs.Path buildClusters(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path clustersIn, org.apache.hadoop.fs.Path output, int maxIterations, String delta, boolean runSequential) throws IOException, InterruptedException, ClassNotFoundException
Parameters:
conf - the Configuration to use
input - the directory pathname for input points
clustersIn - the directory pathname for initial & computed clusters
output - the directory pathname for output points
maxIterations - the maximum number of iterations
delta - the convergence delta value
runSequential - if true execute sequential algorithm
Throws:
IOException
InterruptedException
ClassNotFoundException
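To illustrate how maxIterations and delta can interact, here is a minimal sketch of a k-means loop over one-dimensional points. It is not Mahout's implementation: the class name KMeansLoopSketch, the use of plain double arrays, the absolute-difference distance, and the rule "stop when no center moves by more than delta" are all assumptions made for illustration. The delta is parsed from a String only to mirror the parameter type in the signature above.

```java
import java.util.Arrays;

public class KMeansLoopSketch {
    /** Updates centers in place and returns the number of iterations executed. */
    static int buildClusters(double[] points, double[] centers, int maxIterations, String delta) {
        double convergenceDelta = Double.parseDouble(delta);
        int iteration = 0;
        boolean converged = false;
        while (!converged && iteration < maxIterations) {
            double[] sum = new double[centers.length];
            int[] count = new int[centers.length];
            // Assignment step: each point goes to its nearest center.
            for (double p : points) {
                int best = 0;
                for (int c = 1; c < centers.length; c++) {
                    if (Math.abs(p - centers[c]) < Math.abs(p - centers[best])) {
                        best = c;
                    }
                }
                sum[best] += p;
                count[best]++;
            }
            // Update step: move each center to the mean of its points.
            converged = true;
            for (int c = 0; c < centers.length; c++) {
                if (count[c] == 0) {
                    continue; // an empty cluster keeps its previous center
                }
                double updated = sum[c] / count[c];
                if (Math.abs(updated - centers[c]) > convergenceDelta) {
                    converged = false; // this center moved further than delta
                }
                centers[c] = updated;
            }
            iteration++;
        }
        return iteration;
    }

    public static void main(String[] args) {
        double[] points = {1.0, 2.0, 3.0, 10.0, 11.0, 12.0};
        double[] centers = {0.0, 5.0};
        int iterations = buildClusters(points, centers, 10, "0.001");
        System.out.println(iterations + " iterations, centers = " + Arrays.toString(centers));
    }
}
```

In this sketch a small maxIterations cuts the loop short before the centers settle, while a larger delta lets it stop earlier with less precise centers.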
public static void clusterData(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path clustersIn, org.apache.hadoop.fs.Path output, double clusterClassificationThreshold, boolean runSequential) throws IOException, InterruptedException, ClassNotFoundException
Parameters:
input - the directory pathname for input points
clustersIn - the directory pathname for input clusters
output - the directory pathname for output points
clusterClassificationThreshold - Is a clustering strictness / outlier removal parameter. Its value should be between 0 and 1. Vectors having pdf below this value will not be clustered.
runSequential - if true execute sequential algorithm
Throws:
IOException
InterruptedException
ClassNotFoundException
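The clusterClassificationThreshold description says that vectors whose pdf falls below the threshold are not clustered. The sketch below shows one way such a filter could behave. The pdf formula used here (1 / (1 + distance), normalized so the values across clusters sum to 1) and the comparison against the largest normalized value are assumptions chosen for illustration; the class name ThresholdSketch and the -1 sentinel are hypothetical and not part of the documented API.

```java
public class ThresholdSketch {
    /**
     * Returns the index of the cluster the point is assigned to, or -1 when
     * the best normalized pdf is below the threshold (the point is skipped).
     */
    static int classify(double point, double[] centers, double threshold) {
        double[] pdf = new double[centers.length];
        double total = 0.0;
        // Assumed pdf: closer centers get a larger score.
        for (int c = 0; c < centers.length; c++) {
            pdf[c] = 1.0 / (1.0 + Math.abs(point - centers[c]));
            total += pdf[c];
        }
        // Normalize so the scores sum to 1, then find the most likely cluster.
        int best = 0;
        for (int c = 0; c < centers.length; c++) {
            pdf[c] /= total;
        }
        for (int c = 1; c < centers.length; c++) {
            if (pdf[c] > pdf[best]) {
                best = c;
            }
        }
        return pdf[best] >= threshold ? best : -1;
    }

    public static void main(String[] args) {
        double[] centers = {2.0, 11.0};
        System.out.println(classify(2.0, centers, 0.8));  // point sits on the first center
        System.out.println(classify(6.5, centers, 0.8));  // point is equidistant from both
        System.out.println(classify(6.5, centers, 0.0));  // threshold 0 accepts every point
    }
}
```

Under these assumptions a threshold of 0 keeps every vector, and raising it toward 1 progressively drops points that are not clearly closest to a single cluster.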
Copyright © 2008–2017 The Apache Software Foundation. All rights reserved.