Entry point; does not use the Scala App trait.
Command line args; if empty, a help message is printed.
Creates a Spark context to run the job inside. Override to set SparkConf values specific to the job; these must be set before the context is created.
Call this before start to use an existing context, as when running multiple drivers from a ScalaTest suite.
An already set-up context to run against.
Command line interface for org.apache.mahout.math.cf.SimilarityAnalysis#cooccurrencesIDSs. Reads text lines that contain (row id, column id, ...). The IDs are user-specified strings that are preserved in the output. The individual elements are accumulated into a matrix like org.apache.mahout.math.indexeddataset.IndexedDataset, and org.apache.mahout.math.cf.SimilarityAnalysis#cooccurrencesIDSs is used to calculate row-wise self-similarity or, when using filters or two inputs, to generate two matrices and calculate both the self-similarity of the primary matrix and the row-wise similarity of the primary to the secondary. Returns one or two directories of text files formatted as specified in the options. The options allow flexible control of the input schema, file discovery, output schema, and algorithm parameters. To get help run
{{{
mahout spark-itemsimilarity
}}}
for a full explanation of options. To process simple elements of text-delimited values (userID,itemID), with or without strengths and with a separator of tab, comma, or space, you can specify only the input and output file and directory; all else will default to the correct values. Each output line will contain the Item ID and similar items sorted by LLR strength, descending.
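As a minimal sketch of the default schema described above (the user and item IDs are hypothetical, and `<llr>` stands in for a computed log-likelihood ratio strength, not a real value):
{{{
// input: one (userID,itemID) element per line; delimiter may be tab, comma, or space
u1,ipad
u1,iphone
u2,iphone
u2,galaxy

// output: one line per item ID, with similar items sorted by LLR strength descending
iphone<tab>ipad:<llr> galaxy:<llr>
ipad<tab>iphone:<llr>
}}}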
To use with a Spark cluster, see the --master option; if you run out of heap space, check the --sparkExecutorMemory option. Other org.apache.spark.SparkConf key-value pairs can be set with the -D:k=v option.
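A hedged example of a cluster invocation: --master, --sparkExecutorMemory, and -D:k=v are the options named above; the input/output paths, the master URL, the memory size, and the spark.serializer setting are illustrative assumptions, as are the flag names for the input and output locations.
{{{
mahout spark-itemsimilarity \
  --input /path/to/interactions.csv \
  --output /path/to/similarities/ \
  --master spark://master-host:7077 \
  --sparkExecutorMemory 4g \
  -D:spark.serializer=org.apache.spark.serializer.KryoSerializer
}}}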