Blockified DRM rdd (keys of the original DRM are grouped into arrays corresponding to the rows of the Matrix object value)
Row-wise organized DRM rdd type
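The two RDD shapes above can be sketched as Scala type aliases. The names mirror Mahout's sparkbindings conventions, but treat the exact definitions as an assumption rather than the authoritative source:

```scala
import org.apache.spark.rdd.RDD
import org.apache.mahout.math.{Matrix, Vector}

object DrmTypes {

  // Row-wise DRM rdd: each element pairs a row key with that row's Vector
  // (assumed alias, modeled on org.apache.mahout.sparkbindings.DrmRdd).
  type DrmRdd[K] = RDD[(K, Vector)]

  // Blockified DRM rdd: the keys of the original rows are grouped into an
  // array whose order lines up with the rows of the Matrix block.
  type BlockifiedDrmRdd[K] = RDD[(Array[K], Matrix)]
}
```

Blockification trades many small (key, Vector) tuples for a few (keys, Matrix) tuples, which lets block-wise operators work on dense in-core matrices.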
Spark-specific non-drm-method operations
This package contains distributed algorithms that the distributed matrix expression optimizer picks from.
Adding Spark-specific ops
row key type
source rdd conforming to org.apache.mahout.sparkbindings.DrmRdd
optional, number of rows. If not specified, we'll try to figure it out on our own.
optional, number of columns. If not specified, we'll try to figure it out on our own.
optional, desired cache policy for that rdd.
optional. For int-keyed rows, there might be implied but missing rows.
If the underlying rdd may have that condition, we need to know, since some
operators consider it a deficiency and we'll need to fix it lazily
before proceeding with such operators. This is only meaningful if nrow is
also specified (otherwise, we'll run a quick test to figure out whether
rows may be missing at the time we count the rows).
wrapped DRM
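A minimal usage sketch for the parameters described above. The parameter names follow the descriptions, but default values and the implicit-context plumbing are assumptions about the sparkbindings API, not a definitive signature:

```scala
import org.apache.mahout.math._
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.sparkbindings._

// `sdc` is an implicit SparkDistributedContext assumed to be in scope
// (see mahoutSparkContext below for how one is typically created).

// Build a tiny row-wise rdd of (Int key, Vector) tuples conforming to DrmRdd.
val rows = sdc.parallelize(Seq(
  0 -> dvec(1.0, 2.0),
  1 -> dvec(3.0, 4.0)
))

// nrow and ncol are optional; omitting them defers discovery until needed.
val drmA = drmWrap(rdd = rows, nrow = 2, ncol = 2)
```

Passing nrow explicitly avoids a counting pass over the rdd, at the cost of having to also declare (via the missing-rows flag) whether int-keyed rows may be absent.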
Another drmWrap version that takes in vertical block-partitioned input to form the matrix.
A drmWrap version that takes a DataFrame of Row[Double]
A drmWrap version that takes an RDD[org.apache.spark.mllib.regression.LabeledPoint] and returns a DRM where the label is the last column
A drmWrap version that takes an RDD[org.apache.spark.mllib.linalg.Vector]
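A sketch of wrapping MLlib data as described above. The function names `drmWrapMLLibLabeledPoint` and `drmWrapMLLibVector` are inferred from the descriptions and should be checked against the actual sparkbindings API:

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.mahout.sparkbindings._

// Each LabeledPoint (label, features) becomes one DRM row; per the
// description above, the label lands in the last column of the DRM.
val points = sc.parallelize(Seq(
  LabeledPoint(1.0, Vectors.dense(0.5, 0.7)),
  LabeledPoint(0.0, Vectors.dense(0.1, 0.2))
))
val drmLabeled = drmWrapMLLibLabeledPoint(points)

// Plain mllib Vectors map one-to-one onto DRM rows, no label column added.
val vecs = sc.parallelize(Seq(
  Vectors.dense(1.0, 2.0),
  Vectors.dense(3.0, 4.0)
))
val drmB = drmWrapMLLibVector(vecs)
```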
Create proper spark context that includes local Mahout jars
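A sketch of creating the context; `masterUrl` and `appName` are the required arguments, while any further parameters (custom jars, a pre-built SparkConf) are assumed to be optional:

```scala
import org.apache.mahout.sparkbindings._

// mahoutSparkContext builds a Spark context with the local Mahout jars
// added to the executor classpath, so DRM closures can deserialize
// Mahout math types on the workers.
implicit val sdc = mahoutSparkContext(
  masterUrl = "local[2]",
  appName   = "mahout-spark-example"
)
```

Making the resulting context an `implicit val` lets drmWrap and the other operators pick it up without threading it through every call.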
Broadcast transforms
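A sketch of the broadcast transform, assuming `drmBroadcast` wraps Spark's broadcast mechanism for Mahout in-core types (the `BCast` wrapper name is an assumption):

```scala
import org.apache.mahout.math._
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.sparkbindings._

// Broadcast an in-core Vector so closures running on executors can read
// it without re-serializing it with every task.
val v = dvec(1.0, 2.0, 3.0)
val bcastV = drmBroadcast(v)

// Inside a distributed closure (e.g. a mapBlock body), dereference the
// broadcast value rather than capturing `v` directly.
// val w: Vector = bcastV.value
```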
Public API for Spark-specific operators