To get started, add the following dependency to your pom.xml:
    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-flink_2.10</artifactId>
      <version>0.12.0</version>
    </dependency>
Here is how to use the Flink backend:
    import org.apache.flink.api.scala._
    import org.apache.mahout.math.drm._
    import org.apache.mahout.math.drm.RLikeDrmOps._
    import org.apache.mahout.flinkbindings._

    object ReadCsvExample {

      def main(args: Array[String]): Unit = {
        val filePath = "path/to/the/input/file"

        // wrap the Flink execution environment in a Mahout distributed context
        val env = ExecutionEnvironment.getExecutionEnvironment
        implicit val ctx = new FlinkDistributedContext(env)

        // read the CSV file as a distributed row matrix (DRM)
        val drm = readCsv(filePath, delim = "\t", comment = "#")

        // compute the Gram matrix X'X and bring the result in-core
        val C = drm.t %*% drm
        println(C.collect)
      }
    }
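Because the DSL is engine-agnostic, the distributed decompositions in mahout-math-scala should run on the Flink backend as well. Here is a sketch continuing from ReadCsvExample (the rank `k = 2` and the power iteration count `q = 1` are arbitrary choices for illustration):

    import org.apache.mahout.math.decompositions._

    // `drm` and the implicit FlinkDistributedContext are the ones
    // defined in ReadCsvExample above
    val (drmU, drmV, s) = dssvd(drm, k = 2, q = 1)

    // drmU and drmV are themselves DRMs; s is an in-core vector
    // holding the k largest singular values
    println(s)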
The top-level JIRA for the Flink backend is MAHOUT-1570, which has been fully implemented.
The implemented operations include (a short usage sketch follows the list):

* element-wise operations (like `A + 2` or `A + B`)
* `cbind` and `rbind`
* slicing (like `A(1 to 10, ::)`)
* right in-core matrix multiplication (`A %*% B` when `B` is in-core)
* the `At`, `Ax` and `Atx` operators (`Ax` and `At` are implemented)
* `Int`, `Long` and `String` row keys
* the `Atx`, `ABt` and `AtA` operators
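A minimal sketch exercising a few of these operators (`dense` and `drmParallelize` come from the math-scala bindings; the matrix values and partition count are arbitrary):

    import org.apache.mahout.math.scalabindings._
    import org.apache.mahout.math.scalabindings.RLikeOps._
    import org.apache.mahout.math.drm._
    import org.apache.mahout.math.drm.RLikeDrmOps._
    import org.apache.mahout.flinkbindings._

    // assumes an implicit FlinkDistributedContext, as in ReadCsvExample above
    val inCoreA = dense((1, 2, 3), (3, 4, 5))
    val A = drmParallelize(inCoreA, numPartitions = 2)

    val B = A + 2             // element-wise addition
    val S = A(0 until 2, ::)  // row slicing
    val G = A.t %*% A         // AtA
    val R = A %*% inCoreA.t   // right multiplication by an in-core matrix

    println(G.collect)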
There is a set of standard tests that all engines should pass (see MAHOUT-1764):

* `DistributedDecompositionsSuite`
* `DrmLikeOpsSuite`
* `DrmLikeSuite`
* `RLikeDrmOpsSuite`

These are the Flink-backend-specific tests, e.g.:
* `DrmLikeOpsSuite` for operations like `norm`, `rowSums`, `rowMeans`
* `RLikeOpsSuite` for basic linear algebra like `A.t %*% A`, `A.t %*% x`, etc.
* `LATestSuite` for specific operators like `AtB`, `Ax`, etc.
* `UseCasesSuite` for more complex examples, like power iteration, ridge regression, etc.
For development, the minimal supported configuration is JDK 1.7 and Scala 2.10 (matching the `_2.10` suffix of the Flink artifact).

When using Mahout, please import the following modules:
* `mahout-math`
* `mahout-math-scala`
* `mahout-flink_2.10`
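In pom terms, a sketch of the corresponding dependency entries (the version follows the 0.12.0 example above; the Scala-versioned `mahout-math-scala_2.10` artifact id is an assumption worth verifying against the release):

    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-math</artifactId>
      <version>0.12.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-math-scala_2.10</artifactId>
      <version>0.12.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-flink_2.10</artifactId>
      <version>0.12.0</version>
    </dependency>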