Getting Started

To get started, add the following dependency to the pom:

<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-flink_2.10</artifactId>
  <version>0.12.0</version>
</dependency>

Here is how to use the Flink backend:

import org.apache.flink.api.scala._
import org.apache.mahout.math.drm._
import org.apache.mahout.math.drm.RLikeDrmOps._
import org.apache.mahout.flinkbindings._

object ReadCsvExample {

  def main(args: Array[String]): Unit = {
    val filePath = "path/to/the/input/file"

    val env = ExecutionEnvironment.getExecutionEnvironment
    implicit val ctx = new FlinkDistributedContext(env)

    val drm = readCsv(filePath, delim = "\t", comment = "#")
    val C = drm.t %*% drm
    println(C.collect)
  }

}

Current Status

The top JIRA for Flink backend is MAHOUT-1570 which has been fully implemented.

Implemented

Tests

There is a set of standard tests that all engines should pass (see MAHOUT-1764).

These are Flink-backend specific tests, e.g.

Environment

For development the minimal supported configuration is

When using mahout, please import the following modules: