The Ridge regressor in Mahout implements a closed-form solution to Ridge Regression (https://en.wikipedia.org/wiki/Tikhonov_regularization). Ridge regression builds on the linear regressor. From a Bayesian perspective, it places a normal prior distribution on the beta coefficients, centered at zero, with a spread controlled by the lambda parameter: the higher the lambda value, the more tightly the beta values are concentrated around zero. From a linear algebra perspective, adding a diagonal matrix determined by the lambda hyperparameter to the design matrix product \(\mathbf{X}^T\mathbf{X}\) breaks collinearity and makes that matrix invertible. Finally, from an optimization perspective, a higher value of lambda penalizes large beta coefficients more strongly, because it adds a multiple of their squared magnitude to the loss; this is also known as L2 regularization.
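The three perspectives above can be written out compactly. The following is a sketch in standard textbook notation rather than anything taken from the Mahout source: \(\mathbf{y}\) (target vector), \(\boldsymbol\beta\) (coefficient vector), \(\mathbf{I}\) (identity matrix), \(\sigma^2\) (noise variance) and \(\tau^2\) (prior variance) are symbols introduced here for illustration.

```latex
% Optimization view: least squares plus an L2 penalty on the coefficients.
% Linear algebra view: the closed-form minimizer, with \lambda \mathbf{I} added to X^T X.
\begin{aligned}
\hat{\boldsymbol\beta}
  &= \arg\min_{\boldsymbol\beta}\;
     \lVert \mathbf{y} - \mathbf{X}\boldsymbol\beta \rVert_2^2
     + \lambda \lVert \boldsymbol\beta \rVert_2^2 \\
  &= \left( \mathbf{X}^T \mathbf{X} + \lambda \mathbf{I} \right)^{-1} \mathbf{X}^T \mathbf{y}
\end{aligned}

% Bayesian view: the same estimate is the posterior mode under a zero-mean normal prior.
\mathbf{y} \mid \boldsymbol\beta \sim \mathcal{N}(\mathbf{X}\boldsymbol\beta,\ \sigma^2 \mathbf{I}),
\qquad
\boldsymbol\beta \sim \mathcal{N}(\mathbf{0},\ \tau^2 \mathbf{I}),
\qquad
\lambda = \frac{\sigma^2}{\tau^2}
```

For any \(\lambda > 0\) the matrix \(\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I}\) is positive definite and therefore invertible, even when the columns of \(\mathbf{X}\) are collinear.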

Ridge regression may be used to treat collinearity issues (although stochastic and numerical approximation methods may also address these in linear regression) and to obtain a model that generalizes better: the lambda parameter adds bias, which in turn can decrease the overall quadratic error by reducing variance.

Very large beta coefficients in a linear regression are often the result of high collinearity. Ridge regression avoids them by penalizing large values of the beta coefficients.
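The effect of the penalty on collinear data can be illustrated with a minimal sketch in plain Scala. It does not use Mahout: `RidgeSketch` and `ridge2` are hypothetical names introduced here, and the code handles only the special case of two features and no intercept by solving the 2x2 closed-form system directly.

```scala
object RidgeSketch {
  // Hypothetical helper, not part of Mahout.
  // Solves (X^T X + lambda * I) beta = X^T y for exactly two features, no intercept.
  def ridge2(x: Seq[(Double, Double)], y: Seq[Double], lambda: Double): (Double, Double) = {
    require(x.length == y.length, "x and y must have the same number of rows")
    // Entries of the symmetric 2x2 matrix X^T X + lambda * I.
    val a = x.map { case (u, _) => u * u }.sum + lambda
    val b = x.map { case (u, v) => u * v }.sum
    val d = x.map { case (_, v) => v * v }.sum + lambda
    // Entries of the vector X^T y.
    val p = x.zip(y).map { case ((u, _), t) => u * t }.sum
    val q = x.zip(y).map { case ((_, v), t) => v * t }.sum
    val det = a * d - b * b
    require(det != 0.0, "X^T X + lambda * I is singular; use a positive lambda")
    // Apply the explicit inverse of a 2x2 matrix.
    ((d * p - b * q) / det, (a * q - b * p) / det)
  }

  def main(args: Array[String]): Unit = {
    // The second column is exactly twice the first, so the columns are collinear.
    val x = Seq((1.0, 2.0), (2.0, 4.0), (3.0, 6.0))
    val y = Seq(5.0, 10.0, 15.0)
    // A larger lambda shrinks both coefficients further toward zero.
    for (lambda <- Seq(1.0, 10.0)) {
      val (b1, b2) = ridge2(x, y, lambda)
      println(s"lambda = $lambda: beta = ($b1, $b2)")
    }
  }
}
```

With `lambda = 0.0` the same call fails, because the collinear columns make the matrix singular and plain least squares has no unique solution. With `lambda = 1.0` the system is solvable and yields coefficients of about 0.99 and 1.97, and with `lambda = 10.0` they shrink to 0.875 and 1.75.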


Parameter        Description                                                                                      Default Value
'lambda          Regularization parameter for Ridge Regression (the larger, the more generalized the model is)    true
'addIntercept    Add an intercept to \(\mathbf{X}\)                                                               true


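// Assumes a Mahout shell session, which provides the implicit distributed context.
import org.apache.mahout.math._
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.scalabindings.RLikeOps._
import org.apache.mahout.math.drm._
import org.apache.mahout.math.drm.RLikeDrmOps._
import org.apache.mahout.math.algorithms.regression.RidgeRegression

// Each row is one cereal: four feature columns followed by the target column.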
val drmData = drmParallelize(dense(
  (2, 2, 10.5, 10, 29.509541),  // Apple Cinnamon Cheerios
  (1, 2, 12,   12, 18.042851),  // Cap'n'Crunch
  (1, 1, 12,   13, 22.736446),  // Cocoa Puffs
  (2, 1, 11,   13, 32.207582),  // Froot Loops
  (1, 2, 12,   11, 21.871292),  // Honey Graham Ohs
  (2, 1, 16,   8,  36.187559),  // Wheaties Honey Gold
  (6, 2, 17,   1,  50.764999),  // Cheerios
  (3, 2, 13,   7,  40.400208),  // Clusters
  (3, 3, 13,   4,  45.811716)), numPartitions = 2)

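// Columns 0 to 3 are the features; column 4 is the target.
val drmX = drmData(::, 0 until 4)
val drmY = drmData(::, 4 until 5)

// Fit with lambda = 1.0, then predict on the training data and collect the result in-core.
val model = new RidgeRegression().fit(drmX, drmY, 'lambda -> 1.0)
val myAnswer = model.predict(drmX).collect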