To get started, add the following dependency to your pom.xml:
<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-flink_2.10</artifactId>
  <version>0.12.0</version>
</dependency>
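If you build with sbt instead of Maven, the same coordinates can be expressed as follows (a sketch of the equivalent sbt line, not taken from the original page):

// sbt equivalent of the Maven dependency above
libraryDependencies += "org.apache.mahout" % "mahout-flink_2.10" % "0.12.0"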
Here is how to use the Flink backend:
import org.apache.flink.api.scala._
import org.apache.mahout.math.drm._
import org.apache.mahout.math.drm.RLikeDrmOps._
import org.apache.mahout.flinkbindings._

object ReadCsvExample {
  def main(args: Array[String]): Unit = {
    val filePath = "path/to/the/input/file"

    // Wrap a Flink ExecutionEnvironment in a Mahout distributed context
    val env = ExecutionEnvironment.getExecutionEnvironment
    implicit val ctx = new FlinkDistributedContext(env)

    // Read the CSV file into a DRM and compute its Gramian, drm' * drm
    val drm = readCsv(filePath, delim = "\t", comment = "#")
    val C = drm.t %*% drm
    println(C.collect)
  }
}
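Note that collect materializes the result as an in-core matrix, so it is only appropriate for small results. For larger results you would typically persist the DRM instead. A minimal sketch, assuming the Flink backend supports the engine-agnostic dfsWrite call of the DRM API; the output path below is a placeholder, not part of the original example:

// Persist the computed DRM to the filesystem instead of collecting it
// ("path/to/the/output" is a placeholder path)
C.dfsWrite("path/to/the/output")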
The top-level JIRA for the Flink backend is MAHOUT-1570, which has been fully implemented. Its subtasks cover, among other things:

* element-wise operators (like A + 2 or A + B)
* cbind and rbind
* slicing (like A(1 to 10, ::))
* right multiplication by an in-core matrix (A %*% B when B is in-core)
* the operators At, Ax, Atx - Ax and At are implemented
* Int, Long and String row keys
* the physical operators Atx, ABt and AtA
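For illustration, here is a minimal sketch exercising several of the operators above. It is not from the original page: the matrix values and names (inCoreA, A, and so on) are made up, and it assumes drmParallelize and the dense in-core constructor from mahout-math-scala.

import org.apache.flink.api.scala._
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.drm._
import org.apache.mahout.math.drm.RLikeDrmOps._
import org.apache.mahout.flinkbindings._

object OperatorsSketch {
  def main(args: Array[String]): Unit = {
    implicit val ctx = new FlinkDistributedContext(
      ExecutionEnvironment.getExecutionEnvironment)

    // Distribute a small in-core matrix as a DRM (hypothetical sample data)
    val inCoreA = dense((1, 2, 3), (4, 5, 6), (7, 8, 9))
    val A = drmParallelize(inCoreA, numPartitions = 2)

    val B = A + 2              // element-wise, with a scalar
    val C = A + A              // element-wise, with another DRM
    val D = A cbind A          // concatenate columns
    val E = A rbind A          // concatenate rows
    val top = A(0 until 2, ::) // row slicing
    val P = A %*% inCoreA      // right multiplication by an in-core matrix
    val G = A.t %*% A          // AtA

    println(G.collect)
  }
}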
There is a set of standard tests that all engines should pass (see MAHOUT-1764):

* DistributedDecompositionsSuite
* DrmLikeOpsSuite
* DrmLikeSuite
* RLikeDrmOpsSuite
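To give a flavor of what these suites check, here is a hedged sketch of a backend-style test. It is not one of the actual suites: the class name is made up, and it assumes ScalaTest's FunSuite plus the in-core RLikeOps from mahout-math-scala for computing the control value.

import org.scalatest.FunSuite
import org.apache.flink.api.scala._
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.scalabindings.RLikeOps._
import org.apache.mahout.math.drm._
import org.apache.mahout.math.drm.RLikeDrmOps._
import org.apache.mahout.flinkbindings._

// Hypothetical example, not one of the suites listed above
class AtASanitySuite extends FunSuite {
  test("distributed A.t %*% A matches the in-core result") {
    implicit val ctx = new FlinkDistributedContext(
      ExecutionEnvironment.getExecutionEnvironment)

    val inCoreA = dense((1, 2), (3, 4), (5, 6))
    val A = drmParallelize(inCoreA, numPartitions = 2)

    val distributed = (A.t %*% A).collect // computed by the Flink backend
    val control = inCoreA.t %*% inCoreA   // computed in-core

    // The Frobenius norm of the difference should be (close to) zero
    assert((distributed - control).norm < 1e-10)
  }
}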
These are the Flink-backend specific tests:

* DrmLikeOpsSuite for operations like norm, rowSums, rowMeans
* RLikeOpsSuite for basic linear algebra like A.t %*% A, A.t %*% x, etc.
* LATestSuite tests specific operators like AtB, Ax, etc.
* UseCasesSuite has more complex examples, like power iteration, ridge regression, etc.

For development, the minimal supported configuration is Scala 2.10, the version the mahout-flink_2.10 artifact is built against.
When using Mahout, please import the following modules:

* mahout-math
* mahout-math-scala
* mahout-flink_2.10
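In code, the corresponding imports typically look like the following; this is a conventional set assembled here for convenience, not copied from the original page:

// mahout-math / mahout-math-scala: in-core types and the R-like DSL
import org.apache.mahout.math._
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.scalabindings.RLikeOps._
// mahout-math-scala: the engine-agnostic distributed DRM API
import org.apache.mahout.math.drm._
import org.apache.mahout.math.drm.RLikeDrmOps._
// mahout-flink_2.10: the Flink engine bindings
import org.apache.mahout.flinkbindings._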