org.apache.mahout.math

decompositions

package decompositions

This package holds all decomposition and factorization-like methods; so far, it contains those that we have been able to make distributed engine-independent.

Linear Supertypes
AnyRef, Any

Type Members

  1. type FactorizationResult[K] = Result[K]

    Result for distributed ALS-type two-component factorization algorithms

  2. type FactorizationResultInCore = InCoreResult

    Result for distributed ALS-type two-component factorization algorithms, in-core matrices

Value Members

  1. object DQR

  2. object DSPCA

  3. object DSSVD

  4. def dals[K](drmA: DrmLike[K], k: Int = 50, lambda: Double = 0.0, maxIterations: Int = 10, convergenceThreshold: Double = 0.10)(implicit arg0: ClassTag[K]): FactorizationResult[K]

Run ALS.

    Example:

    val (u, v, errors) = als(input, k).toTuple

    ALS runs until (rmse[i-1] - rmse[i]) / rmse[i-1] < convergenceThreshold, or i == maxIterations, whichever comes first.

    K

    row key type of the input (100 is probably more than enough)

    drmA

    The input matrix

    k

    required rank of decomposition (number of cols in U and V results)

    lambda

    regularization rate

    maxIterations

    maximum iterations to run regardless of convergence

    convergenceThreshold

stop sooner if (rmse[i-1] - rmse[i]) / rmse[i-1] is less than this value. If <= 0, RMSE is not computed and the convergence test is skipped.

    returns

org.apache.mahout.math.drm.decompositions.ALS.Result

  5. def dqrThin[K](drmA: DrmLike[K], checkRankDeficiency: Boolean = true)(implicit arg0: ClassTag[K]): (DrmLike[K], Matrix)

Distributed _thin_ QR. A'A must fit in memory, i.e. if A is m x n, then n should be fairly small (< 5000 or so). <P>

    It is recommended to checkpoint A, since the algorithm makes two passes over it. <P>

    It also guarantees that Q is partitioned exactly the same way (and in the same key order) as A, so their RDDs can be zipped successfully.
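A minimal usage sketch, assuming a Mahout Scala DSL session where `drmA` is an already-loaded, tall-and-skinny `DrmLike[Int]` (the variable names are illustrative):

```scala
import org.apache.mahout.math.drm._
import org.apache.mahout.math.decompositions._

// Checkpoint first: dqrThin makes two passes over A.
val drmAcp = drmA.checkpoint()

// Q comes back distributed, partitioned identically to A; R is in-core (n x n).
val (drmQ, inCoreR) = dqrThin(drmAcp)
```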

  6. def dspca[K](drmA: DrmLike[K], k: Int, p: Int = 15, q: Int = 0)(implicit arg0: ClassTag[K]): (DrmLike[K], DrmLike[Int], Vector)

Distributed Stochastic PCA decomposition algorithm.

    Distributed Stochastic PCA decomposition algorithm. A logical reflow of the "SSVD-PCA options.pdf" document attached to MAHOUT-817.

    drmA

    input matrix A

    k

    request SSVD rank

    p

    oversampling parameter

    q

    number of power iterations (hint: use either 0 or 1)

    returns

(U, V, s). Note that U and V are non-checkpointed matrices (i.e. one needs to actually use them, e.g. save them to HDFS, in order to trigger their computation).
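A minimal usage sketch, assuming a distributed context is already set up and `drmA` is an already-loaded `DrmLike[Int]`; names and parameter values are illustrative:

```scala
import org.apache.mahout.math.drm._
import org.apache.mahout.math.decompositions._

// Rank-20 PCA with default oversampling and one power iteration.
val (drmU, drmV, s) = dspca(drmA, k = 20, p = 15, q = 1)

// U and V are lazy; checkpointing (or otherwise using) them triggers computation.
drmU.checkpoint()
drmV.checkpoint()
```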

  7. def dssvd[K](drmA: DrmLike[K], k: Int, p: Int = 15, q: Int = 0)(implicit arg0: ClassTag[K]): (DrmLike[K], DrmLike[Int], Vector)

Distributed Stochastic Singular Value Decomposition algorithm.

    drmA

    input matrix A

    k

    request SSVD rank

    p

    oversampling parameter

    q

    number of power iterations

    returns

(U, V, s). Note that U and V are non-checkpointed matrices (i.e. one needs to actually use them, e.g. save them to HDFS, in order to trigger their computation).
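A minimal usage sketch, again assuming `drmA` is an already-loaded distributed matrix; names and parameter values are illustrative:

```scala
import org.apache.mahout.math.drm._
import org.apache.mahout.math.decompositions._

val (drmU, drmV, s) = dssvd(drmA, k = 30, p = 15, q = 1)

// Nothing runs until U and V are actually used; checkpoint to force computation.
val drmUcp = drmU.checkpoint()
val drmVcp = drmV.checkpoint()
```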

  8. def spca(a: Matrix, k: Int, p: Int = 15, q: Int = 0): (Matrix, Matrix, Vector)

PCA based on SSVD that runs without forming the always-dense A - colMeans(A) input for SVD.

    PCA based on SSVD that runs without forming the always-dense A - colMeans(A) input for SVD. This follows the solution outlined in MAHOUT-817. For the in-core version it is, for the most part, intended to save some memory on sparse inputs by avoiding direct mean subtraction.<P>

    Hint: usually one wants to use AV, which is approximately U * Sigma, i.e. u %*% diagv(s). If retaining distances and original scaled variances is not that important, the normalized PCA space is just U.

    Important: data points are considered to be rows.

    a

    input matrix A

    k

    request SSVD rank

    p

    oversampling parameter

    q

    number of power iterations

    returns

    (U,V,s)
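A minimal in-core sketch, assuming the Mahout scalabindings imports are available; the matrix values are illustrative, and k + p should not exceed the smaller matrix dimension:

```scala
import org.apache.mahout.math.scalabindings._
import RLikeOps._
import org.apache.mahout.math.decompositions._

// A small dense in-core matrix; data points are rows.
val a = dense(
  (1.0, 2.0, 3.0, 1.5),
  (3.0, 4.0, 5.0, 2.0),
  (5.0, 6.0, 7.0, 0.5),
  (2.0, 1.0, 0.0, 3.0),
  (4.0, 4.0, 4.0, 4.0))

val (u, v, s) = spca(a, k = 2, p = 2, q = 1)

// Per the hint above: PCA-space coordinates are approximately U * Sigma.
val uSigma = u %*% diagv(s)
```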

  9. def ssvd(a: Matrix, k: Int, p: Int = 15, q: Int = 0): (Matrix, Matrix, Vector)

In-core SSVD algorithm.

    a

    input matrix A

    k

    request SSVD rank

    p

    oversampling parameter

    q

    number of power iterations

    returns

    (U,V,s)
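A minimal in-core sketch, assuming the Mahout scalabindings imports are available; the matrix values are illustrative, and k + p should not exceed the smaller matrix dimension:

```scala
import org.apache.mahout.math.scalabindings._
import RLikeOps._
import org.apache.mahout.math.decompositions._

val a = dense(
  (1.0, 2.0, 3.0, 1.5),
  (3.0, 4.0, 5.0, 2.0),
  (5.0, 6.0, 7.0, 0.5),
  (2.0, 1.0, 0.0, 3.0),
  (4.0, 4.0, 4.0, 4.0))

val (u, v, s) = ssvd(a, k = 2, p = 2, q = 1)

// Rank-2 reconstruction: A is approximated by U * diag(s) * V'.
val aApprox = u %*% diagv(s) %*% v.t
```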
