public final class SplitInputJob extends Object
Modifier and Type | Class and Description |
---|---|
static class | SplitInputJob.SplitInputComparator: Randomly permute key value pairs |
static class | SplitInputJob.SplitInputMapper: Mapper which downsamples the input by downsamplingFactor |
static class | SplitInputJob.SplitInputReducer: Reducer which uses MultipleOutputs to randomly allocate key value pairs between test and training outputs |
Modifier and Type | Method and Description |
---|---|
static void | run(org.apache.hadoop.conf.Configuration initialConf, org.apache.hadoop.fs.Path inputPath, org.apache.hadoop.fs.Path outputPath, int keepPct, float randomSelectionPercent): Run job to downsample, randomly permute and split data into test and training sets. |
public static void run(org.apache.hadoop.conf.Configuration initialConf, org.apache.hadoop.fs.Path inputPath, org.apache.hadoop.fs.Path outputPath, int keepPct, float randomSelectionPercent) throws IOException, ClassNotFoundException, InterruptedException
Parameters:
initialConf - Initial configuration
inputPath - path to input data SequenceFile
outputPath - path for output data SequenceFiles
keepPct - percentage of key value pairs in input to keep. The rest are discarded
randomSelectionPercent - percentage of key value pairs to allocate to test set. Remainder are allocated to training set
Throws:
IOException
ClassNotFoundException
InterruptedException
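The following is a minimal in-memory sketch of how keepPct and randomSelectionPercent might interact. It is not the Mahout implementation. It assumes both values are on a 0 to 100 scale and that downsampling and test/training allocation are independent per-record random draws; the real job runs as Hadoop MapReduce over SequenceFiles and may sample differently. The names SplitSketch, Result, split, and seed are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical in-memory stand-in for the downsample, permute, split stages. */
final class SplitSketch {

  /** Holds the two outputs of the split. */
  static final class Result<T> {
    final List<T> training = new ArrayList<>();
    final List<T> test = new ArrayList<>();
  }

  /**
   * Assumption: keepPct and randomSelectionPercent are percentages in [0, 100].
   * The seed parameter is not part of the documented API; it only makes the
   * sketch reproducible.
   */
  static <T> Result<T> split(List<T> input, int keepPct,
                             float randomSelectionPercent, long seed) {
    Random rnd = new Random(seed);

    // Stage 1 (downsample): keep each record with probability keepPct / 100.
    List<T> kept = new ArrayList<>();
    for (T record : input) {
      if (rnd.nextInt(100) < keepPct) {
        kept.add(record);
      }
    }

    // Stage 2 (permute): put the surviving records in random order.
    Collections.shuffle(kept, rnd);

    // Stage 3 (split): send roughly randomSelectionPercent of the records
    // to the test set and the remainder to the training set.
    Result<T> result = new Result<>();
    for (T record : kept) {
      if (rnd.nextFloat() < randomSelectionPercent / 100f) {
        result.test.add(record);
      } else {
        result.training.add(record);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Integer> input = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      input.add(i);
    }
    // Keep about half of the input, then send about 20% of that to the test set.
    Result<Integer> r = split(input, 50, 20f, 42L);
    System.out.println("training=" + r.training.size() + " test=" + r.test.size());
  }
}
```

Under these assumptions, keepPct = 100 keeps every record, and randomSelectionPercent = 0 sends everything that survives downsampling to the training set.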
Copyright © 2008–2017 The Apache Software Foundation. All rights reserved.