StandardScaler centers each column on its mean and scales it to unit variance.
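In symbols, and assuming the uncorrected (divide-by-N) standard deviation that the comparison with R attributes to Mahout, the transformation of an N-row matrix can be sketched as follows. The symbols x_ij, mu_j, sigma_j and z_ij are notation introduced here for illustration, not names taken from the Mahout API.

```latex
\mu_j = \frac{1}{N}\sum_{i=1}^{N} x_{ij}, \qquad
\sigma_j = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(x_{ij} - \mu_j\right)^2}, \qquad
z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}
```

Each output column z_j then has mean 0 and, under the divide-by-N convention, variance 1.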
Relation to the scale function in R-base
StandardScaler is the equivalent of the R-base function scale, with one notable tweak. R's
scale function (indeed all of R) calculates the standard deviation with a degrees-of-freedom correction, dividing by N - 1 rather than N. Mahout
(like many other statistical packages aimed at larger data sets) does not make this adjustment. In larger datasets the difference
is trivial; however, when testing the function on smaller datasets, the practitioner may be confused by the discrepancy.
To verify this function against R on an arbitrary matrix, use the following form in R to “undo” the degrees of freedom correction.
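N <- nrow(x)
# sd() divides by N - 1; multiplying by sqrt((N - 1) / N) converts it to the divide-by-N form
scale(x, scale = apply(x, 2, sd) * sqrt((N - 1) / N))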
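The effect of the two denominators is easiest to see on a tiny column. The sketch below is plain Scala with no Mahout dependency; the object `ScaleDenominators`, the helpers `sd` and `standardize`, and the three-element column are all hypothetical names and data chosen for illustration, not part of Mahout or R.

```scala
// Plain-Scala sketch (no Mahout dependency); all names here are illustrative.
object ScaleDenominators {
  // Standard deviation of xs, dividing the sum of squared deviations by `denom`.
  def sd(xs: Seq[Double], denom: Double): Double = {
    val mean = xs.sum / xs.size
    math.sqrt(xs.map(x => (x - mean) * (x - mean)).sum / denom)
  }

  // Center on the mean and divide by the supplied standard deviation.
  def standardize(xs: Seq[Double], s: Double): Seq[Double] = {
    val mean = xs.sum / xs.size
    xs.map(x => (x - mean) / s)
  }

  val column: Seq[Double] = Seq(1.0, 2.0, 3.0)
  val n: Int = column.size

  // Divide by N: the convention the text attributes to Mahout.
  val populationSd: Double = sd(column, n)
  // Divide by N - 1: the convention used by R's sd.
  val sampleSd: Double = sd(column, n - 1)

  val populationScaled: Seq[Double] = standardize(column, populationSd)
  val sampleScaled: Seq[Double] = standardize(column, sampleSd)

  def main(args: Array[String]): Unit = {
    println(s"population sd = $populationSd, scaled = $populationScaled")
    println(s"sample sd     = $sampleSd, scaled = $sampleScaled")
    // The two conventions differ by exactly sqrt((N - 1) / N).
    println(s"ratio = ${populationSd / sampleSd}, " +
      s"sqrt((N - 1) / N) = ${math.sqrt((n - 1).toDouble / n)}")
  }
}
```

With only three rows the two scalings differ by a factor of sqrt(2/3), roughly 18 percent, which is why small test matrices make the discrepancy so visible; the factor approaches 1 as N grows.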
StandardScaler takes no parameters at this time.
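// Assumes drmParallelize, dense and an implicit distributed context are in scope
// (for example, in the Mahout Spark shell).
import org.apache.mahout.math.algorithms.preprocessing.{StandardScaler, StandardScalerModel}

val A = drmParallelize(dense(
  (1, 1, 5),
  (2, 5, -15),
  (3, 9, -2)), numPartitions = 2)

// fit computes the per-column statistics; transform applies them to A.
val scaler: StandardScalerModel = new StandardScaler().fit(A)
val scaledA = scaler.transform(A)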