Run ALS.
Run ALS. <P>
Example:
val (u, v, errors) = als(input, k).toTuple
ALS runs until (rmse[i-1] - rmse[i]) / rmse[i-1] &lt; convergenceThreshold, or until i == maxIterations, whichever comes first. <P>
row key type of the input
The input matrix
required rank of decomposition (number of columns in the U and V results; 100 is probably more than enough)
regularization rate
maximum iterations to run regardless of convergence
stop sooner if (rmse[i-1] - rmse[i]) / rmse[i-1] is less than this value. If &lt;= 0, RMSE is not computed and the convergence test is skipped.
{@link org.apache.mahout.math.drm.decompositions.ALS.Result}
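A hedged end-to-end sketch, assuming the Samsara imports below and an implicit distributed context (e.g. the Spark bindings) are in scope; `inCoreRatings` and the parameter values are hypothetical illustrations, not prescribed defaults:

```scala
import org.apache.mahout.math.drm._

// Hypothetical in-core ratings matrix distributed into a DRM.
val input = drmParallelize(inCoreRatings, numPartitions = 2)

// k = 20 (rank), lambda = 0.01 (regularization); stop after 10 sweeps
// or once the relative RMSE improvement falls below 5%, whichever is first.
val (u, v, errors) = als(input, k = 20, lambda = 0.01,
                         maxIterations = 10, convergenceThreshold = 0.05).toTuple

// errors holds the per-iteration RMSE trace used by the convergence test.
errors.foreach(rmse => println(s"RMSE: $rmse"))
```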
Distributed _thin_ QR.
Distributed _thin_ QR. A'A must fit in memory, i.e. if A is m x n, then n should be kept modest (&lt; 5000 or so). <P>
It is recommended to checkpoint A, since the algorithm makes two passes over it. <P>
It also guarantees that Q is partitioned exactly the same way (and in the same key order) as A, so their RDDs can be zipped successfully.
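A minimal sketch under the constraint above that A'A (n x n) fits in memory, assuming the Samsara imports and an existing DRM `drmA` are in scope:

```scala
// Checkpoint first: dqrThin makes two passes over A.
val drmAcp = drmA.checkpoint()

// Q is m x n and distributed; R is n x n and returned in-core.
val (drmQ, inCoreR) = dqrThin(drmAcp)

// Q preserves A's partitioning and key order, so row-wise
// engine-level zips of A and Q line up correctly.
```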
Distributed Stochastic PCA decomposition algorithm.
Distributed Stochastic PCA decomposition algorithm. A logical reflow of the "SSVD-PCA options.pdf" document attached to MAHOUT-817.
input matrix A
requested SSVD rank
oversampling parameter
number of power iterations (hint: use either 0 or 1)
(U, V, s). Note that U and V are non-checkpointed matrices (i.e. one needs to actually use them, e.g. save them to HDFS, in order to trigger their computation).
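A sketch of a distributed stochastic PCA call, assuming the Samsara imports, a distributed context, and a DRM `drmA` are in scope; the parameter values and output path are hypothetical:

```scala
// k: PCA rank; p: oversampling; q: power iterations (0 or 1 per the hint).
val (drmU, drmV, s) = dspca(drmA, k = 30, p = 15, q = 1)

// U and V are lazy (non-checkpointed); writing them out triggers computation.
drmU.dfsWrite("/tmp/pca/U")
```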
Distributed Stochastic Singular Value decomposition algorithm.
Distributed Stochastic Singular Value decomposition algorithm.
input matrix A
requested SSVD rank
oversampling parameter
number of power iterations
(U, V, s). Note that U and V are non-checkpointed matrices (i.e. one needs to actually use them, e.g. save them to HDFS, in order to trigger their computation).
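A sketch of a distributed stochastic SVD call, assuming the Samsara imports and a DRM `drmA` are in scope; the parameter values are hypothetical:

```scala
// k + p should stay well below the smaller dimension of A.
val (drmU, drmV, s) = dssvd(drmA, k = 40, p = 15, q = 1)

// s holds the k singular values in-core; U and V remain lazy until used.
val sigma1 = s(0) // largest singular value
```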
PCA based on SSVD that runs without forming an always-dense A-(colMeans(A)) input for SVD.
PCA based on SSVD that runs without forming an always-dense A-(colMeans(A)) input for SVD. This follows the solution outlined in MAHOUT-817. For the in-core version, it mostly saves memory on sparse inputs by avoiding direct mean subtraction.<P>
Hint: Usually one wants to use AV, which is approximately U*Sigma, i.e. u %*%: diagv(s).
If retaining distances and original scaled variances is not that important, the normalized PCA space
is just U.
Important: data points are considered to be rows.
input matrix A
requested SSVD rank
oversampling parameter
number of power iterations
(U, V, s)
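A sketch of the SSVD-based PCA on an in-core matrix, with data points as rows; it assumes Mahout's scalabindings (the R-like in-core DSL) imports shown here, and the toy data and parameter values are hypothetical:

```scala
import org.apache.mahout.math.scalabindings._
import RLikeOps._

// Toy 4 x 2 data matrix; rows are data points.
val a = dense((1.0, 2.0), (3.0, 4.0), (5.0, 6.0), (7.0, 9.0))

val (u, v, s) = spca(a, k = 2, p = 0, q = 1)

// Per the hint above, the PCA projection is approximately U * Sigma:
val projection = u %*% diagv(s)
```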
In-core SSVD algorithm.
In-core SSVD algorithm.
input matrix A
requested SSVD rank
oversampling parameter
number of power iterations
(U, V, s)
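A sketch of in-core SSVD on a small dense matrix, assuming the scalabindings imports shown; the matrix and parameter values are illustrative only:

```scala
import org.apache.mahout.math.scalabindings._
import RLikeOps._

val a = dense((1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (7.0, 8.0, 10.0))

// k + p must not exceed min(m, n); here 2 + 1 <= 3.
val (u, v, s) = ssvd(a, k = 2, p = 1, q = 1)

// u %*% diagv(s) %*% v.t gives the rank-k approximation of a.
val aApprox = u %*% diagv(s) %*% v.t
```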
This package holds all decomposition and factorization-like methods; that is, all those we have so far been able to make independent of the distributed engine.