Entry point; does not use the Scala App trait.
Command line args; if empty, a help message is printed.
Creates a Spark context to run the job inside. Override to set SparkConf values specific to the job; these must be set before the context is created.
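A minimal sketch of such an override. The class and method names here (JobDriver, createContext) are illustrative assumptions, not the driver's actual API; only the rule that SparkConf values must be set before the SparkContext is constructed comes from the text above.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver subclass: JobDriver and createContext are assumed
// names for illustration only.
class TunedDriver extends JobDriver {
  override def createContext(masterUrl: String): SparkContext = {
    val conf = new SparkConf()
      .setAppName("spark-rowsimilarity")
      .setMaster(masterUrl)
      // Job-specific setting; must be applied before the context exists.
      .set("spark.kryoserializer.buffer.max", "512m")
    new SparkContext(conf)
  }
}
```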
Call this before start to use an existing context, as when running multiple drivers from a ScalaTest suite.
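A sketch of the shared-context pattern in a test suite. The method name useContext and the driver object name are assumptions made for illustration; consult the driver's source for the real call.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// One context created once for the whole suite.
val sc = new SparkContext(
  new SparkConf().setMaster("local[2]").setAppName("driver-suite"))

// Hypothetical usage: each driver reuses the suite's context instead of
// creating its own. Call before start.
// RowSimilarityDriver.useContext(sc)
// RowSimilarityDriver.main(args)
```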
An already set up context to run against
Command line interface for the row-similarity job. Reads a text-delimited file containing rows of an org.apache.mahout.math.indexeddataset.IndexedDataset with domain-specific IDs of the form (row ID, column ID: strength, ...). The IDs will be preserved in the output. The rows define a matrix, and row-wise similarity will be calculated using the log-likelihood ratio (LLR). The options allow control of the input schema, file discovery, the output schema, and algorithm parameters.
To get help run "mahout spark-rowsimilarity" for a full explanation of options. The default values for formatting will read (rowID<tab>columnID1:strength1<space>columnID2:strength2 ...) and write (rowID<tab>rowID1:strength1<space>rowID2:strength2 ...). Each output line will contain a row ID and similar rows sorted by LLR strength, descending.
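With the default schema, input and output lines might look like the following. The IDs and strength values are made up for illustration; only the layout (tab after the row ID, space-separated ID:strength pairs) comes from the description above.

```text
# input: rowID<tab>columnID1:strength1<space>columnID2:strength2 ...
u1	iphone:1.0 ipad:1.0
u2	nexus:1.0 galaxy:1.0

# output: rowID<tab>rowID1:strength1<space>rowID2:strength2 ...
# (similar rows, sorted by LLR strength descending)
u1	u3:1.73 u2:0.68
```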
To use with a Spark cluster see the --master option; if you run out of heap space, check the --sparkExecutorMemory option.
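A hedged example invocation. Only --master and --sparkExecutorMemory appear in the text above; the --input and --output flag names, paths, and values are assumptions, so run "mahout spark-rowsimilarity" with no arguments for the authoritative option list.

```shell
# Illustrative only: flag names other than --master and
# --sparkExecutorMemory, and all paths/values, are assumptions.
mahout spark-rowsimilarity \
  --input hdfs://namenode:8020/data/rows.tsv \
  --output hdfs://namenode:8020/data/similarity \
  --master spark://master:7077 \
  --sparkExecutorMemory 8g
```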