@Deprecated public class EigenVerificationJob extends AbstractJob
Takes the output of an eigendecomposition (specified as a Path location) and verifies its correctness in the following sense: given a vector e and a matrix m, let e' = m.timesSquared(e); the error w.r.t. eigenvector-ness is the cosine of the angle between e and e':
error(e,e') = e.dot(e') / (e.norm(2)*e'.norm(2))
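The check above can be sketched with plain arrays. This is an illustration only, not Mahout code: the class `EigenErrorSketch` and its helpers are hypothetical names, and `timesSquared` is assumed here to mean the product mᵀ(m·v).

```java
/** Plain-array sketch of the eigenvector check; hypothetical, not Mahout code. */
class EigenErrorSketch {

    /** Computes m^T (m v), which is what timesSquared(v) is assumed to mean here. */
    static double[] timesSquared(double[][] m, double[] v) {
        int rows = m.length;
        int cols = v.length;
        double[] mv = new double[rows];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                mv[i] += m[i][j] * v[j];
            }
        }
        double[] out = new double[cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                out[j] += m[i][j] * mv[i];
            }
        }
        return out;
    }

    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    static double norm2(double[] a) {
        return Math.sqrt(dot(a, a));
    }

    /** error(e, e') = e.dot(e') / (e.norm(2) * e'.norm(2)) */
    static double cosine(double[] e, double[] ePrime) {
        return dot(e, ePrime) / (norm2(e) * norm2(ePrime));
    }

    public static void main(String[] args) {
        double[][] m = {{2, 0}, {0, 1}};
        double[] eigen = {1, 0};     // an eigenvector of m^T m
        double[] notEigen = {1, 1};  // not an eigenvector of m^T m
        System.out.println(cosine(eigen, timesSquared(m, eigen)));
        System.out.println(cosine(notEigen, timesSquared(m, notEigen)));
    }
}
```

For a true eigenvector with a positive eigenvalue, e' is a positive multiple of e, so the cosine is 1; the further the value falls below 1, the less eigenvector-like e is.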
A set of eigenvectors should also all be very close to orthogonal, so this job computes all inner products between eigenvectors and checks that the resulting matrix of inner products is close to the identity matrix.
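A minimal sketch of such an orthogonality check follows. It assumes the eigenvectors are already unit length, so that the matrix of inner products should have 1 on the diagonal and 0 elsewhere. The class and method names are hypothetical and are not part of Mahout.

```java
/** Plain-array sketch of the orthogonality check; hypothetical, not Mahout code. */
class OrthogonalitySketch {

    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /**
     * Returns the largest absolute difference between the matrix of pairwise
     * inner products and the identity matrix. The matrix is symmetric, so only
     * the upper triangle (including the diagonal) is visited.
     */
    static double maxDeviationFromIdentity(double[][] eigens) {
        double worst = 0.0;
        for (int i = 0; i < eigens.length; i++) {
            for (int j = i; j < eigens.length; j++) {
                double expected = (i == j) ? 1.0 : 0.0;
                double deviation = Math.abs(dot(eigens[i], eigens[j]) - expected);
                worst = Math.max(worst, deviation);
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        double[][] orthonormal = {{1, 0, 0}, {0, 1, 0}};
        double[][] skewed = {{1, 0}, {0.6, 0.8}};
        System.out.println(maxDeviationFromIdentity(orthonormal));
        System.out.println(maxDeviationFromIdentity(skewed));
    }
}
```

A caller would compare the returned deviation against a tolerance of its own choosing; the sketch does not assume any particular threshold.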
Parameters used in the cleanup (other than the input/output path options) include --minEigenvalue, which specifies the value below which eigenvector/eigenvalue pairs will be discarded, and --maxError, which specifies the maximum error (as defined above) to be tolerated in an eigenvector.
If all the eigenvectors can fit in memory, --inMemory keeps them there, which lets this task complete faster.
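The cleanup rule described by --minEigenvalue and --maxError can be sketched as a simple filter. The `Candidate` type and `clean` method below are hypothetical. The sketch also assumes that each candidate carries an error score where smaller means better (for example, one minus the absolute cosine), which is an assumption of this example rather than something stated above.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the cleanup filter; hypothetical names, not Mahout code. */
class EigenFilterSketch {

    /** One eigenvector/eigenvalue pair with its measured error (smaller is better). */
    static final class Candidate {
        final int index;
        final double eigenvalue;
        final double error;

        Candidate(int index, double eigenvalue, double error) {
            this.index = index;
            this.eigenvalue = eigenvalue;
            this.error = error;
        }
    }

    /** Keeps pairs whose eigenvalue is at least minEigenvalue and whose error is at most maxError. */
    static List<Candidate> clean(List<Candidate> input, double minEigenvalue, double maxError) {
        List<Candidate> kept = new ArrayList<>();
        for (Candidate c : input) {
            if (c.eigenvalue >= minEigenvalue && c.error <= maxError) {
                kept.add(c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Candidate> input = new ArrayList<>();
        input.add(new Candidate(0, 5.0, 0.01));   // kept
        input.add(new Candidate(1, 0.001, 0.01)); // dropped: eigenvalue too small
        input.add(new Candidate(2, 3.0, 0.4));    // dropped: error too large
        for (Candidate c : clean(input, 0.1, 0.05)) {
            System.out.println("kept eigenvector " + c.index);
        }
    }
}
```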
Modifier and Type | Field and Description |
---|---|
static String | CLEAN_EIGENVECTORS Deprecated. |
Fields inherited from class AbstractJob: argMap, inputFile, inputPath, outputFile, outputPath, tempPath
Constructor and Description |
---|
EigenVerificationJob() Deprecated. |
Modifier and Type | Method and Description |
---|---|
org.apache.hadoop.fs.Path | getCleanedEigensPath() Deprecated. |
static void | main(String[] args) Deprecated. |
int | run(org.apache.hadoop.fs.Path corpusInput, org.apache.hadoop.fs.Path eigenInput, org.apache.hadoop.fs.Path output, org.apache.hadoop.fs.Path tempOut, double maxError, double minEigenValue, boolean inMemory, org.apache.hadoop.conf.Configuration conf) Deprecated. Run the job with the given arguments |
int | run(String[] args) Deprecated. |
void | runJob(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path eigenInput, org.apache.hadoop.fs.Path corpusInput, org.apache.hadoop.fs.Path output, boolean inMemory, double maxError, int maxEigens) Deprecated. Programmatic invocation of run() |
void | setEigensToVerify(VectorIterable eigens) Deprecated. |
Methods inherited from class AbstractJob: addFlag, addInputOption, addOption, addOption, addOption, addOption, addOutputOption, buildOption, buildOption, getAnalyzerClassFromOption, getCLIOption, getConf, getDimensions, getFloat, getFloat, getGroup, getInputFile, getInputPath, getInt, getInt, getOption, getOption, getOption, getOptions, getOutputFile, getOutputPath, getOutputPath, getTempPath, getTempPath, hasOption, keyFor, maybePut, parseArguments, parseArguments, parseDirectories, prepareJob, prepareJob, prepareJob, prepareJob, setConf, setS3SafeCombinedInputPath, shouldRunNextPhase
public static final String CLEAN_EIGENVECTORS
public void setEigensToVerify(VectorIterable eigens)
public int run(org.apache.hadoop.fs.Path corpusInput, org.apache.hadoop.fs.Path eigenInput, org.apache.hadoop.fs.Path output, org.apache.hadoop.fs.Path tempOut, double maxError, double minEigenValue, boolean inMemory, org.apache.hadoop.conf.Configuration conf) throws IOException
Parameters:
corpusInput - the corpus input Path
eigenInput - the eigenvector input Path
output - the output Path
tempOut - temporary output Path
maxError - a double representing the maximum error
minEigenValue - a double representing the minimum eigenvalue
inMemory - a boolean requesting in-memory preparation
conf - the Configuration to use, or null if a default is ok (saves referencing Configuration in calling classes unless needed)
Throws:
IOException
public org.apache.hadoop.fs.Path getCleanedEigensPath()
public void runJob(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path eigenInput, org.apache.hadoop.fs.Path corpusInput, org.apache.hadoop.fs.Path output, boolean inMemory, double maxError, int maxEigens) throws IOException
Parameters:
eigenInput - Output of LanczosSolver
corpusInput - Input of LanczosSolver
Throws:
IOException
Copyright © 2008–2017 The Apache Software Foundation. All rights reserved.