org.apache.mahout.math

decompositions

package decompositions

This package holds all decomposition and factorization-like methods; so far, it contains those that we have been able to make distributed engine-independent.

Linear Supertypes
AnyRef, Any

Type Members

  1. type FactorizationResult[K] = Result[K]

    Result for distributed ALS-type two-component factorization algorithms

  2. type FactorizationResultInCore = InCoreResult

    Result for distributed ALS-type two-component factorization algorithms, in-core matrices

Value Members

  1. object DQR

  2. object DSPCA

  3. object DSSVD

  4. def dals[K](drmA: DrmLike[K], k: Int = 50, lambda: Double = 0.0, maxIterations: Int = 10, convergenceThreshold: Double = 0.10)(implicit arg0: ClassTag[K]): FactorizationResult[K]

Run ALS.

    Example:

    val (u, v, errors) = als(input, k).toTuple

    ALS runs until (rmse[i-1] - rmse[i]) / rmse[i-1] < convergenceThreshold, or i == maxIterations, whichever comes first.

    K

    row key type of the input (100 is probably more than enough)

    drmA

    The input matrix

    k

    required rank of decomposition (number of cols in U and V results)

    lambda

    regularization rate

    maxIterations

    maximum iterations to run regardless of convergence

    convergenceThreshold

stop sooner if (rmse[i-1] - rmse[i]) / rmse[i-1] is less than this value. If <= 0, RMSE is not computed and the convergence test is skipped.

    returns

org.apache.mahout.math.drm.decompositions.ALS.Result

  5. def dqrThin[K](drmA: DrmLike[K], checkRankDeficiency: Boolean = true)(implicit arg0: ClassTag[K]): (DrmLike[K], Matrix)

Distributed _thin_ QR. A'A must fit in memory, i.e. if A is m x n, then n should be fairly small (< 5000 or so). <P>

    It is recommended to checkpoint A, since the algorithm makes two passes over it. <P>

    It also guarantees that Q is partitioned exactly the same way (and in the same key order) as A, so their RDDs can be zipped successfully.
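A minimal usage sketch, assuming a Mahout Scala DSL session where `drmA` is an already-loaded, tall-and-skinny `DrmLike[Int]` (the variable names are illustrative):

```scala
import org.apache.mahout.math.drm._
import org.apache.mahout.math.decompositions._

// Checkpoint first: dqrThin makes two passes over A.
val drmAcp = drmA.checkpoint()

// Q comes back distributed, partitioned identically to A; R is in-core (n x n).
val (drmQ, inCoreR) = dqrThin(drmAcp)
```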

  6. def dspca[K](drmA: DrmLike[K], k: Int, p: Int = 15, q: Int = 0)(implicit arg0: ClassTag[K]): (DrmLike[K], DrmLike[Int], Vector)

Distributed Stochastic PCA decomposition algorithm.

    Distributed Stochastic PCA decomposition algorithm. A logical reflow of the "SSVD-PCA options.pdf" document attached to MAHOUT-817.

    drmA

    input matrix A

    k

    request SSVD rank

    p

    oversampling parameter

    q

    number of power iterations (hint: use either 0 or 1)

    returns

(U, V, s). Note that U and V are non-checkpointed matrices (i.e. one needs to actually use them, e.g. save them to HDFS, in order to trigger their computation).
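A minimal usage sketch, assuming a distributed context is already set up and `drmA` is an already-loaded `DrmLike[Int]`; names and parameter values are illustrative:

```scala
import org.apache.mahout.math.drm._
import org.apache.mahout.math.decompositions._

// Rank-20 PCA with default oversampling and one power iteration.
val (drmU, drmV, s) = dspca(drmA, k = 20, p = 15, q = 1)

// U and V are lazy; checkpointing (or otherwise using) them triggers computation.
drmU.checkpoint()
drmV.checkpoint()
```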

  7. def dssvd[K](drmA: DrmLike[K], k: Int, p: Int = 15, q: Int = 0)(implicit arg0: ClassTag[K]): (DrmLike[K], DrmLike[Int], Vector)

Distributed Stochastic Singular Value Decomposition algorithm.

    drmA

    input matrix A

    k

    request SSVD rank

    p

    oversampling parameter

    q

    number of power iterations

    returns

(U, V, s). Note that U and V are non-checkpointed matrices (i.e. one needs to actually use them, e.g. save them to HDFS, in order to trigger their computation).
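A minimal usage sketch, again assuming `drmA` is an already-loaded distributed matrix; names and parameter values are illustrative:

```scala
import org.apache.mahout.math.drm._
import org.apache.mahout.math.decompositions._

val (drmU, drmV, s) = dssvd(drmA, k = 30, p = 15, q = 1)

// Nothing runs until U and V are actually used; checkpoint to force computation.
val drmUcp = drmU.checkpoint()
val drmVcp = drmV.checkpoint()
```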

  8. def spca(a: Matrix, k: Int, p: Int = 15, q: Int = 0): (Matrix, Matrix, Vector)

PCA based on SSVD that runs without forming the always-dense A - colMeans(A) input for SVD.

    PCA based on SSVD that runs without forming the always-dense A - colMeans(A) input for SVD. This follows the solution outlined in MAHOUT-817. For the in-core version it is, for the most part, intended to save some memory on sparse inputs by avoiding direct mean subtraction.<P>

    Hint: usually one wants to use AV, which is approximately U * Sigma, i.e. u %*% diagv(s). If retaining distances and original scaled variances is not that important, the normalized PCA space is just U.

    Important: data points are considered to be rows.

    a

    input matrix A

    k

    request SSVD rank

    p

    oversampling parameter

    q

    number of power iterations

    returns

    (U,V,s)
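A minimal in-core sketch, assuming the Mahout scalabindings imports are available; the matrix values are illustrative, and k + p should not exceed the smaller matrix dimension:

```scala
import org.apache.mahout.math.scalabindings._
import RLikeOps._
import org.apache.mahout.math.decompositions._

// A small dense in-core matrix; data points are rows.
val a = dense(
  (1.0, 2.0, 3.0, 1.5),
  (3.0, 4.0, 5.0, 2.0),
  (5.0, 6.0, 7.0, 0.5),
  (2.0, 1.0, 0.0, 3.0),
  (4.0, 4.0, 4.0, 4.0))

val (u, v, s) = spca(a, k = 2, p = 2, q = 1)

// Per the hint above: PCA-space coordinates are approximately U * Sigma.
val uSigma = u %*% diagv(s)
```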

  9. def ssvd(a: Matrix, k: Int, p: Int = 15, q: Int = 0): (Matrix, Matrix, Vector)

In-core SSVD algorithm.

    a

    input matrix A

    k

    request SSVD rank

    p

    oversampling parameter

    q

    number of power iterations

    returns

    (U,V,s)
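A minimal in-core sketch, assuming the Mahout scalabindings imports are available; the matrix values are illustrative, and k + p should not exceed the smaller matrix dimension:

```scala
import org.apache.mahout.math.scalabindings._
import RLikeOps._
import org.apache.mahout.math.decompositions._

val a = dense(
  (1.0, 2.0, 3.0, 1.5),
  (3.0, 4.0, 5.0, 2.0),
  (5.0, 6.0, 7.0, 0.5),
  (2.0, 1.0, 0.0, 3.0),
  (4.0, 4.0, 4.0, 4.0))

val (u, v, s) = ssvd(a, k = 2, p = 2, q = 1)

// Rank-2 reconstruction: A is approximated by U * diag(s) * V'.
val aApprox = u %*% diagv(s) %*% v.t
```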
