The OrdinaryLeastSquares regressor in Mahout implements a closed-form solution to Ordinary Least Squares. This is in stark contrast to many “big data machine learning” frameworks, which implement a stochastic approach. From the user's perspective, this difference can be reduced to:

• Stochastic: A series of guesses at a line of best fit.
• Closed form: A mathematical approach has been explored, the properties of the parameters are well understood, and the problems that arise (and their remedial measures) are known. This is usually the preferred choice of mathematicians/statisticians, but computational limitations have forced us to resort to stochastic gradient descent (SGD).
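
To make the contrast above concrete, here is a minimal sketch in plain Scala (no Mahout dependency) for the simplest case of one predictor plus an intercept. The object name `OlsSketch`, the function names, and the learning rate, epoch count, and seed are all illustrative assumptions; this is not how Mahout implements its regressor.

```scala
import scala.util.Random

// Hypothetical helper object for illustration only; not part of Mahout.
object OlsSketch {

  /** Closed form: slope = Sxy / Sxx, intercept = yBar - slope * xBar. */
  def closedForm(xs: Seq[Double], ys: Seq[Double]): (Double, Double) = {
    val n = xs.size
    val xBar = xs.sum / n
    val yBar = ys.sum / n
    val sxy = xs.zip(ys).map { case (x, y) => (x - xBar) * (y - yBar) }.sum
    val sxx = xs.map(x => (x - xBar) * (x - xBar)).sum
    val slope = sxy / sxx
    (slope, yBar - slope * xBar)
  }

  /** Stochastic: repeatedly nudge the guess using one observation at a time. */
  def sgd(xs: Seq[Double], ys: Seq[Double],
          rate: Double = 0.01, epochs: Int = 2000, seed: Long = 42L): (Double, Double) = {
    val rng = new Random(seed)
    val data = xs.zip(ys)
    var slope = 0.0
    var intercept = 0.0
    for (_ <- 0 until epochs; (x, y) <- rng.shuffle(data)) {
      val err = slope * x + intercept - y // prediction error for this observation
      slope -= rate * err * x             // gradient step for the slope
      intercept -= rate * err             // gradient step for the intercept
    }
    (slope, intercept)
  }

  def main(args: Array[String]): Unit = {
    val xs = Seq(0.0, 1.0, 2.0, 3.0, 4.0)
    val ys = xs.map(x => 2.0 * x + 1.0) // noiseless line y = 2x + 1
    println(s"closed form: ${closedForm(xs, ys)}")
    println(s"sgd:         ${sgd(xs, ys)}")
  }
}
```

The closed-form version computes the estimate in one pass with no tuning parameters, while the stochastic version only approaches it and depends on the chosen learning rate and number of epochs.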

### Parameters

| Parameter | Description | Default Value |
|---|---|---|
| `'calcCommonStatistics` | Calculate common statistics such as the Coefficient of Determination and Mean Square Error | `true` |
| `'calcStandardErrors` | Calculate the standard errors (and subsequent "t-scores" and "p-values") of the $$\boldsymbol{\beta}$$ estimates | `true` |
| `'addIntercept` | Add an intercept to $$\mathbf{X}$$ | `true` |
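
For reference, the quantities named in these parameters have the following standard textbook definitions, where $$n$$ is the number of rows and $$p$$ the number of columns of $$\mathbf{X}$$ (including the intercept column when one is added). These are the usual statistical formulas, stated here as an assumption; this page does not confirm Mahout's exact conventions, for example the degrees of freedom used in the Mean Square Error.

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$$

$$\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad \mathrm{MSE} = \frac{\mathrm{SSE}}{n - p}, \qquad R^2 = 1 - \frac{\mathrm{SSE}}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

$$\mathrm{se}(\hat{\beta}_j) = \sqrt{\mathrm{MSE} \cdot \left[(\mathbf{X}^\top \mathbf{X})^{-1}\right]_{jj}}, \qquad t_j = \frac{\hat{\beta}_j}{\mathrm{se}(\hat{\beta}_j)}$$

Each p-value then follows from comparing $$t_j$$ against a t-distribution with $$n - p$$ degrees of freedom.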

### Example

In this example we disable the “calculate common statistics” parameter, so our summary will NOT contain the coefficient of determination (R-squared) or Mean Square Error:

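// Assumes the Mahout Spark shell, where the Samsara imports (drmParallelize, dense, ...)
// and an implicit distributed context are already in scope.
import org.apache.mahout.math.algorithms.regression.OrdinaryLeastSquares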

val drmData = drmParallelize(dense(
(2, 2, 10.5, 10, 29.509541),  // Apple Cinnamon Cheerios
(1, 2, 12,   12, 18.042851),  // Cap'n'Crunch
(1, 1, 12,   13, 22.736446),  // Cocoa Puffs
(2, 1, 11,   13, 32.207582),  // Froot Loops
(1, 2, 12,   11, 21.871292),  // Honey Graham Ohs
(2, 1, 16,   8,  36.187559),  // Wheaties Honey Gold
(6, 2, 17,   1,  50.764999),  // Cheerios
(3, 2, 13,   7,  40.400208),  // Clusters
(3, 3, 13,   4,  45.811716)), numPartitions = 2)

val drmX = drmData(::, 0 until 4)
val drmY = drmData(::, 4 until 5)

val model = new OrdinaryLeastSquares[Int]().fit(drmX, drmY, 'calcCommonStatistics -> false)
println(model.summary)