public abstract class AbstractJob
extends org.apache.hadoop.conf.Configured
implements org.apache.hadoop.util.Tool
Superclass of many Mahout Hadoop "jobs". A job drives configuration and launch of one or more maps and reduces in order to accomplish some task.
Command line arguments available to all subclasses are:
In addition, note some key command line parameters that are parsed by Hadoop, which jobs may need to set:
Note that because of how Hadoop parses arguments, all "-D" arguments must appear before all other arguments.
Modifier and Type | Field and Description |
---|---|
protected Map<String,List<String>> |
argMap |
protected File |
inputFile |
protected org.apache.hadoop.fs.Path |
inputPath
input path, populated by
parseArguments(String[]) |
protected File |
outputFile |
protected org.apache.hadoop.fs.Path |
outputPath
output path, populated by
parseArguments(String[]) |
protected org.apache.hadoop.fs.Path |
tempPath
temp path, populated by
parseArguments(String[]) |
Modifier | Constructor and Description |
---|---|
protected |
AbstractJob() |
Modifier and Type | Method and Description |
---|---|
protected void |
addFlag(String name,
String shortName,
String description)
Add an option with no argument whose presence can be checked for using
containsKey method on the map returned by parseArguments(String[]) ; |
protected void |
addInputOption()
Add the default input directory option, '-i' which takes a directory
name as an argument.
|
protected org.apache.commons.cli2.Option |
addOption(org.apache.commons.cli2.Option option)
Add an arbitrary option to the set of options this job will parse when
parseArguments(String[]) is called. |
protected void |
addOption(String name,
String shortName,
String description)
Add an option to the set of options this job will parse when
parseArguments(String[]) is called. |
protected void |
addOption(String name,
String shortName,
String description,
boolean required)
Add an option to the set of options this job will parse when
parseArguments(String[]) is called. |
protected void |
addOption(String name,
String shortName,
String description,
String defaultValue)
Add an option to the set of options this job will parse when
parseArguments(String[]) is called. |
protected void |
addOutputOption()
Add the default output directory option, '-o' which takes a directory
name as an argument.
|
protected static org.apache.commons.cli2.Option |
buildOption(String name,
String shortName,
String description,
boolean hasArg,
boolean required,
String defaultValue)
Build an option with the given parameters.
|
protected static org.apache.commons.cli2.Option |
buildOption(String name,
String shortName,
String description,
boolean hasArg,
int min,
int max,
boolean required,
String defaultValue) |
protected Class<? extends org.apache.lucene.analysis.Analyzer> |
getAnalyzerClassFromOption() |
protected org.apache.commons.cli2.Option |
getCLIOption(String name) |
org.apache.hadoop.conf.Configuration |
getConf() |
int |
getDimensions(org.apache.hadoop.fs.Path matrix)
Get the cardinality of the input vectors
|
float |
getFloat(String optionName) |
float |
getFloat(String optionName,
float defaultVal) |
protected org.apache.commons.cli2.Group |
getGroup() |
protected File |
getInputFile() |
protected org.apache.hadoop.fs.Path |
getInputPath()
Returns the input path established by a call to
parseArguments(String[]) . |
int |
getInt(String optionName) |
int |
getInt(String optionName,
int defaultVal) |
static String |
getOption(Map<String,List<String>> args,
String optName) |
String |
getOption(String optionName) |
String |
getOption(String optionName,
String defaultVal)
Get the option, else the default
|
List<String> |
getOptions(String optionName)
Options can occur multiple times, so return the list
|
protected File |
getOutputFile() |
protected org.apache.hadoop.fs.Path |
getOutputPath()
Returns the output path established by a call to
parseArguments(String[]) . |
protected org.apache.hadoop.fs.Path |
getOutputPath(String path) |
protected org.apache.hadoop.fs.Path |
getTempPath() |
protected org.apache.hadoop.fs.Path |
getTempPath(String directory) |
boolean |
hasOption(String optionName) |
static String |
keyFor(String optionName)
Build the option key (--name) from the option name
|
protected static void |
maybePut(Map<String,List<String>> args,
org.apache.commons.cli2.CommandLine cmdLine,
org.apache.commons.cli2.Option... opt) |
Map<String,List<String>> |
parseArguments(String[] args)
Parse the arguments specified based on the options defined using the
various
addOption methods. |
Map<String,List<String>> |
parseArguments(String[] args,
boolean inputOptional,
boolean outputOptional) |
protected void |
parseDirectories(org.apache.commons.cli2.CommandLine cmdLine,
boolean inputOptional,
boolean outputOptional)
Obtain input and output directories from command-line options or hadoop
properties.
|
protected org.apache.hadoop.mapreduce.Job |
prepareJob(org.apache.hadoop.fs.Path inputPath,
org.apache.hadoop.fs.Path outputPath,
Class<? extends org.apache.hadoop.mapreduce.InputFormat> inputFormat,
Class<? extends org.apache.hadoop.mapreduce.Mapper> mapper,
Class<? extends org.apache.hadoop.io.Writable> mapperKey,
Class<? extends org.apache.hadoop.io.Writable> mapperValue,
Class<? extends org.apache.hadoop.mapreduce.OutputFormat> outputFormat) |
protected org.apache.hadoop.mapreduce.Job |
prepareJob(org.apache.hadoop.fs.Path inputPath,
org.apache.hadoop.fs.Path outputPath,
Class<? extends org.apache.hadoop.mapreduce.InputFormat> inputFormat,
Class<? extends org.apache.hadoop.mapreduce.Mapper> mapper,
Class<? extends org.apache.hadoop.io.Writable> mapperKey,
Class<? extends org.apache.hadoop.io.Writable> mapperValue,
Class<? extends org.apache.hadoop.mapreduce.OutputFormat> outputFormat,
String jobname) |
protected org.apache.hadoop.mapreduce.Job |
prepareJob(org.apache.hadoop.fs.Path inputPath,
org.apache.hadoop.fs.Path outputPath,
Class<? extends org.apache.hadoop.mapreduce.InputFormat> inputFormat,
Class<? extends org.apache.hadoop.mapreduce.Mapper> mapper,
Class<? extends org.apache.hadoop.io.Writable> mapperKey,
Class<? extends org.apache.hadoop.io.Writable> mapperValue,
Class<? extends org.apache.hadoop.mapreduce.Reducer> reducer,
Class<? extends org.apache.hadoop.io.Writable> reducerKey,
Class<? extends org.apache.hadoop.io.Writable> reducerValue,
Class<? extends org.apache.hadoop.mapreduce.OutputFormat> outputFormat) |
protected org.apache.hadoop.mapreduce.Job |
prepareJob(org.apache.hadoop.fs.Path inputPath,
org.apache.hadoop.fs.Path outputPath,
Class<? extends org.apache.hadoop.mapreduce.Mapper> mapper,
Class<? extends org.apache.hadoop.io.Writable> mapperKey,
Class<? extends org.apache.hadoop.io.Writable> mapperValue,
Class<? extends org.apache.hadoop.mapreduce.Reducer> reducer,
Class<? extends org.apache.hadoop.io.Writable> reducerKey,
Class<? extends org.apache.hadoop.io.Writable> reducerValue) |
void |
setConf(org.apache.hadoop.conf.Configuration conf)
Overrides the base implementation to install the Oozie action configuration resource
into the provided Configuration object; note that ToolRunner calls setConf on the Tool
before it invokes run.
|
static void |
setS3SafeCombinedInputPath(org.apache.hadoop.mapreduce.Job job,
org.apache.hadoop.fs.Path referencePath,
org.apache.hadoop.fs.Path inputPathOne,
org.apache.hadoop.fs.Path inputPathTwo)
Necessary to make this job (having a combined input path) work on Amazon S3; hopefully this becomes
obsolete when MultipleInputs is available again.
|
protected static boolean |
shouldRunNextPhase(Map<String,List<String>> args,
AtomicInteger currentPhase) |
protected org.apache.hadoop.fs.Path inputPath
parseArguments(String[])
protected File inputFile
protected org.apache.hadoop.fs.Path outputPath
parseArguments(String[])
protected File outputFile
protected org.apache.hadoop.fs.Path tempPath
parseArguments(String[])
protected org.apache.hadoop.fs.Path getInputPath()
parseArguments(String[])
.
The source of the path may be an input option added using addInputOption()
or it may be the value of the mapred.input.dir
configuration
property.protected org.apache.hadoop.fs.Path getOutputPath()
parseArguments(String[])
.
The source of the path may be an output option added using addOutputOption()
or it may be the value of the mapred.output.dir
configuration
property.protected org.apache.hadoop.fs.Path getOutputPath(String path)
protected File getInputFile()
protected File getOutputFile()
protected org.apache.hadoop.fs.Path getTempPath()
protected org.apache.hadoop.fs.Path getTempPath(String directory)
public org.apache.hadoop.conf.Configuration getConf()
getConf
in interface org.apache.hadoop.conf.Configurable
getConf
in class org.apache.hadoop.conf.Configured
protected void addFlag(String name, String shortName, String description)
containsKey
method on the map returned by parseArguments(String[])
;protected void addOption(String name, String shortName, String description)
parseArguments(String[])
is called. This option has an argument
with null as its default value.protected void addOption(String name, String shortName, String description, boolean required)
parseArguments(String[])
is called.required
- if true the parseArguments(String[])
will
fail with an error and usage message if this option is not specified
on the command line.protected void addOption(String name, String shortName, String description, String defaultValue)
parseArguments(String[])
is called. If this option is not
specified on the command line the default value will be
used.defaultValue
- the default argument value if this argument is not
found on the command-line. null is allowed.protected org.apache.commons.cli2.Option addOption(org.apache.commons.cli2.Option option)
parseArguments(String[])
is called. If this option has no
argument, use containsKey
on the map returned by
parseArguments
to check for its presence. Otherwise, the
string value of the option will be placed in the map using a key
equal to this option's long name preceded by '--'.protected org.apache.commons.cli2.Group getGroup()
protected void addInputOption()
parseArguments(String[])
is
called, the inputPath will be set based upon the value for this option.
If this method is called, the input is required.protected void addOutputOption()
parseArguments(String[])
is
called, the outputPath will be set based upon the value for this option.
If this method is called, the output is required.protected static org.apache.commons.cli2.Option buildOption(String name, String shortName, String description, boolean hasArg, boolean required, String defaultValue)
name
- the long name of the option prefixed with '--' on the command-lineshortName
- the short name of the option, prefixed with '-' on the command-linedescription
- description of the option displayed in help methodhasArg
- true if the option has an argument.required
- true if the option is required.defaultValue
- default argument value, can be null.protected static org.apache.commons.cli2.Option buildOption(String name, String shortName, String description, boolean hasArg, int min, int max, boolean required, String defaultValue)
protected org.apache.commons.cli2.Option getCLIOption(String name)
name
- The name of the optionOption
with the name, else nullpublic Map<String,List<String>> parseArguments(String[] args) throws IOException
addOption
methods. If -h is specified or an
exception is encountered print help and return null. Has the
side effect of setting inputPath and outputPath
if addInputOption
or addOutputOption
or mapred.input.dir
or mapred.output.dir
are present in the Configuration.Map<String,String>
containing options and their argument values.
The presence of a flag can be tested using containsKey
, while
argument values can be retrieved using get(optionName)
. The
names used for keys are the option name parameter prefixed by '--'.IOException
Passes in false, false for the optional args.
public Map<String,List<String>> parseArguments(String[] args, boolean inputOptional, boolean outputOptional) throws IOException
args
- The args to parseinputOptional
- if true, then the input option, if set, need not be present. If false and input is an option
and there is no input, then throw an erroroutputOptional
- if true, then the output option, if set, need not be present. If false and output is an
option and there is no output, then throw an errorIOException
public static String keyFor(String optionName)
public String getOption(String optionName)
public String getOption(String optionName, String defaultVal)
optionName
- The name of the option to look up, without the --defaultVal
- The default value.public int getInt(String optionName)
public int getInt(String optionName, int defaultVal)
public float getFloat(String optionName)
public float getFloat(String optionName, float defaultVal)
public List<String> getOptions(String optionName)
optionName
- The unadorned (no "--" prefixing it) option namepublic boolean hasOption(String optionName)
public int getDimensions(org.apache.hadoop.fs.Path matrix) throws IOException
matrix
- IOException
protected void parseDirectories(org.apache.commons.cli2.CommandLine cmdLine, boolean inputOptional, boolean outputOptional)
addInputOption
or addOutputOption
has been called, this method will throw an OptionException
if
no source (command-line or property) for that value is present.
Otherwise, inputPath
or outputPath
will be
non-null only if specified as a hadoop property. Command-line options
take precedence over hadoop properties.IllegalArgumentException
- if either inputOption is present,
and neither --input
nor -Dmapred.input.dir
are
specified or outputOption is present and neither --output
nor -Dmapred.output.dir
are specified.protected static void maybePut(Map<String,List<String>> args, org.apache.commons.cli2.CommandLine cmdLine, org.apache.commons.cli2.Option... opt)
public static String getOption(Map<String,List<String>> args, String optName)
args
- The input argument mapoptName
- The adorned (including "--") option nameprotected static boolean shouldRunNextPhase(Map<String,List<String>> args, AtomicInteger currentPhase)
protected org.apache.hadoop.mapreduce.Job prepareJob(org.apache.hadoop.fs.Path inputPath, org.apache.hadoop.fs.Path outputPath, Class<? extends org.apache.hadoop.mapreduce.InputFormat> inputFormat, Class<? extends org.apache.hadoop.mapreduce.Mapper> mapper, Class<? extends org.apache.hadoop.io.Writable> mapperKey, Class<? extends org.apache.hadoop.io.Writable> mapperValue, Class<? extends org.apache.hadoop.mapreduce.OutputFormat> outputFormat) throws IOException
IOException
protected org.apache.hadoop.mapreduce.Job prepareJob(org.apache.hadoop.fs.Path inputPath, org.apache.hadoop.fs.Path outputPath, Class<? extends org.apache.hadoop.mapreduce.InputFormat> inputFormat, Class<? extends org.apache.hadoop.mapreduce.Mapper> mapper, Class<? extends org.apache.hadoop.io.Writable> mapperKey, Class<? extends org.apache.hadoop.io.Writable> mapperValue, Class<? extends org.apache.hadoop.mapreduce.OutputFormat> outputFormat, String jobname) throws IOException
IOException
protected org.apache.hadoop.mapreduce.Job prepareJob(org.apache.hadoop.fs.Path inputPath, org.apache.hadoop.fs.Path outputPath, Class<? extends org.apache.hadoop.mapreduce.Mapper> mapper, Class<? extends org.apache.hadoop.io.Writable> mapperKey, Class<? extends org.apache.hadoop.io.Writable> mapperValue, Class<? extends org.apache.hadoop.mapreduce.Reducer> reducer, Class<? extends org.apache.hadoop.io.Writable> reducerKey, Class<? extends org.apache.hadoop.io.Writable> reducerValue) throws IOException
IOException
protected org.apache.hadoop.mapreduce.Job prepareJob(org.apache.hadoop.fs.Path inputPath, org.apache.hadoop.fs.Path outputPath, Class<? extends org.apache.hadoop.mapreduce.InputFormat> inputFormat, Class<? extends org.apache.hadoop.mapreduce.Mapper> mapper, Class<? extends org.apache.hadoop.io.Writable> mapperKey, Class<? extends org.apache.hadoop.io.Writable> mapperValue, Class<? extends org.apache.hadoop.mapreduce.Reducer> reducer, Class<? extends org.apache.hadoop.io.Writable> reducerKey, Class<? extends org.apache.hadoop.io.Writable> reducerValue, Class<? extends org.apache.hadoop.mapreduce.OutputFormat> outputFormat) throws IOException
IOException
public static void setS3SafeCombinedInputPath(org.apache.hadoop.mapreduce.Job job, org.apache.hadoop.fs.Path referencePath, org.apache.hadoop.fs.Path inputPathOne, org.apache.hadoop.fs.Path inputPathTwo) throws IOException
IOException
protected Class<? extends org.apache.lucene.analysis.Analyzer> getAnalyzerClassFromOption() throws ClassNotFoundException
ClassNotFoundException
public void setConf(org.apache.hadoop.conf.Configuration conf)
setConf
in interface org.apache.hadoop.conf.Configurable
setConf
in class org.apache.hadoop.conf.Configured
Copyright © 2008–2017 The Apache Software Foundation. All rights reserved.