Broadcast variable abstraction
Block-map func
DRM block-wise tuple: array of row keys and the matrix block.
Checkpointed DRM API.
Additional experimental operations over CheckpointedDRM implementation.
Distributed context (a.k.a. distributed session handle).
Abstraction of optimizer/distributed engine
Basic DRM trait.
Common DRM ops
DRM row-wise tuple
Implicit broadcast -> value conversion.
All engine operations are also exposed through the context.
Compute COV(X) matrix and mean of row-wise data set. X is presented as row-wise input matrix A.
This is a "wide" procedure: the covariance matrix is returned as a DRM.
Note: will pin the input to cache if not yet pinned.
Returns: (mean vector, covariance DRM)
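The column-wise mean/covariance math behind this operation can be sketched in plain Python. This is an illustrative stand-in for the distributed computation, not the Mahout implementation; normalization by m (population covariance) is an assumption here.

```python
# Sketch of column-wise mean and covariance of a row-wise data set X:
#   mu_j  = (1/m) * sum_k X[k][j]
#   C[i][j] = (1/m) * sum_k (X[k][i] - mu_i) * (X[k][j] - mu_j)
# Normalization by m (rather than m - 1) is an assumption of this sketch.

def col_mean_cov(rows):
    """rows: list of equal-length row vectors (the row-wise data set X)."""
    m = len(rows)
    d = len(rows[0])
    mu = [sum(r[j] for r in rows) / m for j in range(d)]
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / m
            for j in range(d)] for i in range(d)]
    return mu, cov

mu, cov = col_mean_cov([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

In the distributed ("wide") variant the covariance matrix itself stays distributed; here it is a small in-core list of lists.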
Thin column-wise mean and covariance matrix computation. Same as dcolMeanCov() but suited for thin and tall inputs where covariance matrix can be reduced and finalized in driver memory.
Note: will pin input to cache if not yet pinned.
Returns: (mean vector, in-core covariance matrix)
Compute column-wise means and standard deviations -- distributed version.
Note: input will be pinned to cache if not yet pinned.
Returns: (colMeans, colStdevs)
Compute column-wise means and variances -- distributed version.
Note: will pin input to cache if not yet pinned.
Returns: (colMeans, colVariances)
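The relationship between the two column-statistics operations can be sketched as follows: the stdev variant is the variance variant with a square root applied per column. This is a plain-Python stand-in for the distributed computation; normalization by m is an assumption of the sketch.

```python
import math

# Illustrative column means / variances of a row-wise data set, with
# column stdevs obtained as the square roots of the variances.
# Normalization by m (population statistics) is an assumption here.

def col_mean_vars(rows):
    m, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / m for j in range(d)]
    variances = [sum((r[j] - means[j]) ** 2 for r in rows) / m
                 for j in range(d)]
    return means, variances

rows = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
means, variances = col_mean_vars(rows)
stdevs = [math.sqrt(v) for v in variances]
```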
We assume that whenever a computational action is invoked without an explicit checkpoint, the user does not intend caching.
Implicit conversion to in-core, with CacheHint.NONE caching of the result.
Convert an arbitrarily-keyed matrix to an int-keyed matrix. Some algebra will accept only int-keyed row matrices, so this method helps with the conversion.
key type
input to be transcoded
collect old key -> int key map to front-end?
Sequentially keyed matrix + (optionally) a map from non-int keys to Int keys. If the key type is actually Int, we just return the argument with None for the map, regardless of the computeMap parameter.
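The transcoding behavior described above can be sketched in plain Python. Function and parameter names here are hypothetical illustrations, not the actual Mahout API.

```python
# Hypothetical sketch of transcoding arbitrary row keys to sequential
# int keys, optionally returning the old-key -> int-key map.

def to_int_keyed(rows, compute_map=True):
    """rows: list of (key, vector) pairs.

    Returns (int-keyed rows, key map or None)."""
    if rows and isinstance(rows[0][0], int):
        # Keys are already ints: return the input unchanged with None
        # for the map, regardless of compute_map.
        return rows, None
    key_map = {}
    out = []
    for i, (k, v) in enumerate(rows):
        key_map[k] = i
        out.append((i, v))
    return out, (key_map if compute_map else None)

recoded, kmap = to_int_keyed([("a", [1.0]), ("b", [2.0])])
already_int, no_map = to_int_keyed([(0, [1.0]), (1, [2.0])])
```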
Broadcast support API
Load a DRM from HDFS (in Mahout DRM format).
Shortcut to parallelizing matrices with indices; ignores row labels.
This creates an empty DRM with the specified number of partitions and cardinality.
Creates an empty DRM with non-trivial height.
Parallelize in-core matrix as a distributed matrix, using row ordinal indices as data set keys.
Parallelize in-core matrix as a distributed matrix, using row labels as data set keys.
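The two parallelize flavors differ only in what becomes the row key. A minimal model of the row-wise (key, vector) representation, with hypothetical function names:

```python
# Illustrative model of turning an in-core matrix into row-wise
# (key, vector) tuples. With ordinal keying, the row index is the key;
# with labeled keying, the caller-supplied label is the key.

def drm_parallelize(matrix):
    """Use row ordinal indices as data set keys."""
    return [(i, row) for i, row in enumerate(matrix)]

def drm_parallelize_labeled(matrix, labels):
    """Use row labels as data set keys."""
    return list(zip(labels, matrix))

a = [[1.0, 2.0], [3.0, 4.0]]
by_index = drm_parallelize(a)
by_label = drm_parallelize_labeled(a, ["r0", "r1"])
```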
(Optional) Sampling operation. Consistent with Spark semantics of the same.
Returns: the samples
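Spark-style sampling without replacement is Bernoulli: each row is kept independently with probability equal to the fraction, so the sample size is random rather than exactly fraction * n. A stdlib sketch of those semantics (names are illustrative):

```python
import random

# Bernoulli row sampling, mirroring Spark-style sample(fraction)
# semantics: each row is kept independently with probability
# `fraction`, so the resulting size is random, not exact.

def drm_sample_rows(rows, fraction, seed=None):
    rng = random.Random(seed)
    return [r for r in rows if rng.random() < fraction]

rows = [(i, [float(i)]) for i in range(100)]
sample = drm_sample_rows(rows, 0.2, seed=42)
```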
Convert a DRM sample into a Tab Separated Vector (TSV) to be loaded into an R-DataFrame for plotting and sketching
- DRM
- percentage of sample elements from the DRM to be fished out for plotting
Returns: TSV String
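A minimal sketch of serializing sampled rows to a tab-separated string. The formatting choices (one line per row, key in the first column, no header) are assumptions of this sketch, not the actual Mahout output format.

```python
# Sketch: serialize (key, vector) rows to a TSV string that an
# R data.frame reader (e.g. read.table with sep="\t") can ingest.
# Layout (key first, no header) is an assumption, not Mahout's format.

def drm_sample_to_tsv(rows):
    """rows: list of (key, vector) pairs -> one TSV line per row."""
    lines = []
    for key, vec in rows:
        lines.append("\t".join([str(key)] + [str(x) for x in vec]))
    return "\n".join(lines)

tsv = drm_sample_to_tsv([(0, [1.0, 2.0]), (1, [3.0, 4.0])])
```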
Compute fold-in distances (distributed version). Here, we use pretty much the same math as with squared distances.
D_sq = s*1' + 1*t' - 2*X*Y'
where s is the vector of row sums of the Hadamard product (X, X) and, similarly, t is the vector of row sums of the Hadamard product (Y, Y).
m x d row-wise dataset. Pinned to cache if not yet pinned.
n x d row-wise dataset. Pinned to cache if not yet pinned.
m x n pairwise squared distance matrix (between rows of X and Y)
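The identity above can be checked on a tiny example. This pure-Python sketch stands in for the distributed computation: s and t are the row sums of the Hadamard squares of X and Y, and each entry expands to the squared Euclidean distance.

```python
# Verifies D_sq = s*1' + 1*t' - 2*X*Y' on a small example, where
# s and t are row sums of the Hadamard products (X, X) and (Y, Y).
# D[i][j] = s[i] + t[j] - 2 * <X_i, Y_j>
#         = ||X_i||^2 + ||Y_j||^2 - 2 * <X_i, Y_j>
#         = ||X_i - Y_j||^2

def sq_dist(X, Y):
    s = [sum(v * v for v in row) for row in X]   # row sums of X ∘ X
    t = [sum(v * v for v in row) for row in Y]   # row sums of Y ∘ Y
    return [[s[i] + t[j] - 2 * sum(a * b for a, b in zip(X[i], Y[j]))
             for j in range(len(Y))] for i in range(len(X))]

X = [[0.0, 0.0], [1.0, 2.0]]   # m x d
Y = [[3.0, 4.0]]               # n x d
D = sq_dist(X, Y)              # m x n squared distances
```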
Distributed squared-distance matrix computation.
CacheHint type