The `OrdinaryLeastSquares` regressor in Mahout implements a *closed-form* solution to Ordinary Least Squares.
This is in stark contrast to many “big data machine learning” frameworks, which implement a *stochastic* approach. From the user's perspective, the difference comes down to:

- *Stochastic*: A series of guesses at a line of best fit.
- *Closed Form*: A mathematical approach that has been explored in depth: the properties of the parameters are well understood, and the problems that can arise, along with their remedial measures, are known. This is usually the preferred choice of mathematicians and statisticians, but computational limitations have forced us to resort to stochastic gradient descent (SGD).
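
The contrast between the two approaches can be sketched in plain Scala for the one-predictor case. This is an illustration only and not Mahout's implementation: the object `LineFitSketch`, the toy data, the learning rate, and the step count are all assumptions made for the example.

```scala
import scala.util.Random

object LineFitSketch {
  // Toy data lying exactly on y = 2x + 1 (illustrative values, not from Mahout).
  val xs: Array[Double] = Array(1.0, 2.0, 3.0, 4.0, 5.0)
  val ys: Array[Double] = xs.map(x => 2.0 * x + 1.0)

  /** Closed form for one predictor: slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x). */
  def closedForm(x: Array[Double], y: Array[Double]): (Double, Double) = {
    val mx = x.sum / x.length
    val my = y.sum / y.length
    val sxy = x.zip(y).map { case (a, b) => (a - mx) * (b - my) }.sum
    val sxx = x.map(a => (a - mx) * (a - mx)).sum
    val slope = sxy / sxx
    (slope, my - slope * mx)
  }

  /** Stochastic approach: start from a guess and nudge it using one random sample per step. */
  def sgd(x: Array[Double], y: Array[Double], steps: Int, rate: Double, seed: Long): (Double, Double) = {
    val rng = new Random(seed)
    var slope = 0.0
    var intercept = 0.0
    for (_ <- 0 until steps) {
      val i = rng.nextInt(x.length)
      val err = slope * x(i) + intercept - y(i) // prediction error on the sampled point
      slope -= rate * err * x(i)                // gradient step for the slope
      intercept -= rate * err                   // gradient step for the intercept
    }
    (slope, intercept)
  }

  def main(args: Array[String]): Unit = {
    println(s"closed form: ${closedForm(xs, ys)}")
    println(s"SGD:         ${sgd(xs, ys, steps = 20000, rate = 0.02, seed = 42L)}")
  }
}
```

The closed-form function reaches its answer in a single pass over the data, while the stochastic version only approaches it after many small corrections and depends on the chosen rate and number of steps.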

Parameter | Description | Default Value |
---|---|---|
`'calcCommonStatistics` | Calculate common statistics such as the Coefficient of Determination and Mean Square Error | `true` |
`'calcStandardErrors` | Calculate the standard errors (and subsequent "t-scores" and "p-values") of the \(\boldsymbol{\beta}\) estimates | `true` |
`'addIntercept` | Add an intercept to \(\mathbf{X}\) | `true` |
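
For orientation, the quantities these flags control can be written out using their standard textbook definitions. This is a sketch under assumptions: the symbols \(n\) (number of rows), \(p\) (number of columns of \(\mathbf{X}\), counting the intercept column if one is added), \(\hat{y}_i\) (fitted value), and \(\bar{y}\) (mean response) are introduced here, and the exact normalization Mahout applies (for example dividing by \(n\) or by \(n - p\)) is not stated on this page.

\[
\begin{aligned}
\mathrm{SSE} &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad \mathrm{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2 \\
R^2 &= 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}, \qquad \mathrm{MSE} = \frac{\mathrm{SSE}}{n} \\
\widehat{\mathrm{se}}(\hat{\beta}_j) &= \sqrt{\hat{\sigma}^2 \left[ (\mathbf{X}^\top \mathbf{X})^{-1} \right]_{jj}}, \qquad \hat{\sigma}^2 = \frac{\mathrm{SSE}}{n - p} \\
t_j &= \frac{\hat{\beta}_j}{\widehat{\mathrm{se}}(\hat{\beta}_j)}
\end{aligned}
\]

In the textbook treatment, the p-value for each coefficient comes from comparing \(t_j\) against a Student's t distribution with \(n - p\) degrees of freedom.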

In this example we disable the “calculate common statistics” parameter, so the summary will NOT contain the coefficient of determination (R-squared) or the Mean Square Error.

```
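// Assumes the Mahout Spark shell, which pre-imports the math and DRM bindings
// and provides the implicit distributed context that drmParallelize needs.
import org.apache.mahout.math.algorithms.regression.OrdinaryLeastSquares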
val drmData = drmParallelize(dense(
(2, 2, 10.5, 10, 29.509541), // Apple Cinnamon Cheerios
(1, 2, 12, 12, 18.042851), // Cap'n'Crunch
(1, 1, 12, 13, 22.736446), // Cocoa Puffs
(2, 1, 11, 13, 32.207582), // Froot Loops
(1, 2, 12, 11, 21.871292), // Honey Graham Ohs
(2, 1, 16, 8, 36.187559), // Wheaties Honey Gold
(6, 2, 17, 1, 50.764999), // Cheerios
(3, 2, 13, 7, 40.400208), // Clusters
(3, 3, 13, 4, 45.811716)), numPartitions = 2)
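// Columns 0-3 are the predictors; column 4 is the response.
val drmX = drmData(::, 0 until 4)
val drmY = drmData(::, 4 until 5)
// Skip R-squared and Mean Square Error; standard errors are still computed.
val model = new OrdinaryLeastSquares[Int]().fit(drmX, drmY, 'calcCommonStatistics -> false)
println(model.summary)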
```
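
To make the closed-form step concrete, the following plain-Scala sketch solves the normal equations \((\mathbf{X}^\top \mathbf{X})\boldsymbol{\beta} = \mathbf{X}^\top \mathbf{y}\) on the same nine rows. It does not use Mahout and is not a description of how Mahout computes the fit: the object `NormalEquationsSketch`, its `solve` and `fit` helpers, and the choice to put the intercept column first are all assumptions made for illustration.

```scala
object NormalEquationsSketch {
  // Same nine rows as the example above: four predictors, then the response.
  val rows: Array[Array[Double]] = Array(
    Array(2.0, 2.0, 10.5, 10.0, 29.509541),
    Array(1.0, 2.0, 12.0, 12.0, 18.042851),
    Array(1.0, 1.0, 12.0, 13.0, 22.736446),
    Array(2.0, 1.0, 11.0, 13.0, 32.207582),
    Array(1.0, 2.0, 12.0, 11.0, 21.871292),
    Array(2.0, 1.0, 16.0, 8.0, 36.187559),
    Array(6.0, 2.0, 17.0, 1.0, 50.764999),
    Array(3.0, 2.0, 13.0, 7.0, 40.400208),
    Array(3.0, 3.0, 13.0, 4.0, 45.811716))

  /** Solve A b = c by Gaussian elimination with partial pivoting (A is small and square). */
  def solve(a: Array[Array[Double]], c: Array[Double]): Array[Double] = {
    val n = c.length
    val m = Array.tabulate(n)(i => a(i) :+ c(i)) // augmented matrix, copied from the inputs
    for (col <- 0 until n) {
      // Swap the row with the largest pivot candidate into place.
      val pivot = (col until n).maxBy(r => math.abs(m(r)(col)))
      val tmp = m(col); m(col) = m(pivot); m(pivot) = tmp
      for (r <- col + 1 until n) {
        val f = m(r)(col) / m(col)(col)
        for (k <- col to n) m(r)(k) -= f * m(col)(k)
      }
    }
    // Back substitution on the upper-triangular system.
    val b = new Array[Double](n)
    for (i <- n - 1 to 0 by -1) {
      val s = (i + 1 until n).map(k => m(i)(k) * b(k)).sum
      b(i) = (m(i)(n) - s) / m(i)(i)
    }
    b
  }

  /** OLS coefficients from the normal equations (X^T X) beta = X^T y. */
  def fit(x: Array[Array[Double]], y: Array[Double], addIntercept: Boolean): Array[Double] = {
    // Adding an intercept means adding a constant column of ones to X.
    val design = if (addIntercept) x.map(r => 1.0 +: r) else x
    val p = design.head.length
    val xtx = Array.tabulate(p, p)((i, j) => design.map(r => r(i) * r(j)).sum)
    val xty = Array.tabulate(p)(j => design.indices.map(i => design(i)(j) * y(i)).sum)
    solve(xtx, xty)
  }

  def main(args: Array[String]): Unit = {
    val x = rows.map(_.take(4))
    val y = rows.map(r => r(4))
    // With addIntercept = true the first coefficient is the intercept.
    println(fit(x, y, addIntercept = true).mkString(", "))
  }
}
```

Forming \(\mathbf{X}^\top \mathbf{X}\) explicitly keeps the sketch short; for poorly conditioned data a QR-based solve is generally the more numerically robust choice.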