Release Notes

07 August 2015 - Apache Mahout 0.11.0 released

Mahout 0.11.0 includes all new features and bugfixes released in Mahout versions 0.10.1 and 0.10.2 along with support for Spark 1.3+.

Highlights include:

  • Spark 1.3 support
  • Fix for a major memory usage bug in the co-occurrence analysis used by the spark-itemsimilarity driver. The analysis now requires far less executor memory.
  • Some minor fixes to Mahout-Samsara QR Decomposition and matrix ops.
  • All of the Mahout-Samsara fixes from the 0.10.2 release.

06 August 2015 - Apache Mahout 0.10.2 released

Highlights include:

  • In-core transpose view rewrites. Transpose views are now modifiable, e.g. for (col <- a.t) col := 5.
  • Performance and parallelization improvements for the AB', A'B, A'A Spark physical operators.
  • Optional structural "flavor" abstraction for in-core matrices. In-core matrices can now be tagged as e.g. sparse or dense.
  • %*% optimization based on matrix flavors.
  • In-core ::= sparse assignment functions.
  • Assign := optimization (do proper traversal based on matrix flavors, similarly to %*%).
  • Added in-place elementwise functional assignment (e.g. mxA := exp, mxA ::= exp).
  • Distributed and in-core versions of simple elementwise analogues of the scala.math._ functions. For example, for log(x) the convention is dlog(drm), mlog(mx), vlog(vec). Unfortunately, these functions cannot overload the ones in scala.math: Scala would not allow log(mx) or log(drm) and log(Double) at the same time, mainly because they are defined in different packages.
  • Distributed and in-core first and second moment routines. R analogs: mean(), colMeans(), rowMeans(), variance(), sd(). By convention, distributed versions are prefixed with the letter d: colMeanVars(), colMeanStdevs(), dcolMeanVars(), dcolMeanStdevs().
  • Distance and squared distance matrix routines. R analog: dist(). Both squared and non-squared Euclidean distance matrices are provided. By convention, distributed versions are prefixed with the letter d: dist(x), sqDist(x), dsqDist(x). There is also a variation for the pairwise distance matrix of two different inputs x and y: sqDist(x,y), dsqDist(x,y).
  • DRM row sampling API.
  • Distributed performance bug fixes. This relates mostly to (a) matrix multiplication deficiencies, and (b) handling parallelism.
  • Distributed, engine-neutral allreduceBlock() operator API for Spark and H2O.
  • Distributed optimizer operators for elementwise functions. Rewrites recognizing e.g. 1 + drmX * dexp(drmX) as a single fused elementwise physical operator: elementwiseFunc(f1(f2(drmX))) where f1 = 1 + x and f2 = exp(x).
  • More cbind, rbind flavors (e.g. 1 cbind mxX, 1 cbind drmX or the other way around) for Spark and H2O.
  • Added +=: and *=: operators on vectors.
  • Closeable API for broadcast tensors.
  • Support for conversion of any type-keyed DRM into ordinally-keyed DRM.
  • Scala logging style.
  • rowSumsMap() summary for non-int-keyed DRMs.
  • Elementwise power operator ^.
  • R-like vector concatenation operator.
  • In-core functional assignments e.g.: mxA := { (x) => x * x}.
  • Straighten out behavior of Matrix.iterator() and iterateNonEmpty().
  • New mutable transposition view for in-core matrices. The in-core matrix transpose view was rewritten with two main goals in mind: (1) enable mutability, e.g. for (col <- mxA.t) col := k; (2) translate the matrix structural flavor correctly for optimizers, i.e. new SparseRowMatrix.t carries on as a column-major structure.
  • Native support for Kryo serialization of tensor types.
  • Deprecation of the MultiLayerPerceptron, ConcatenateVectorsJob and all related classes.
  • Deprecation of SparseColumnMatrix.
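
As a rough illustration of what the distance routines listed above compute, the following plain-Scala sketch builds squared and non-squared Euclidean distance matrices from small in-memory arrays. It does not use the Mahout API: the names DistanceSketch, sqDistSketch and distSketch are hypothetical, and treating matrix rows as the points being compared is an assumption based on the R dist() analogy.

```scala
// Plain-Scala illustration only; this is NOT the Mahout implementation or API.
// All names here (DistanceSketch, sqDistSketch, distSketch) are hypothetical.
object DistanceSketch {
  type Matrix = Array[Array[Double]]

  // Squared Euclidean distance between every row of x and every row of y.
  // Entry (i)(j) of the result is the squared distance between x(i) and y(j).
  def sqDistSketch(x: Matrix, y: Matrix): Matrix =
    x.map { xi =>
      y.map { yj =>
        require(xi.length == yj.length, "rows must have equal length")
        xi.zip(yj).map { case (a, b) => (a - b) * (a - b) }.sum
      }
    }

  // Single-input variant: pairwise squared distances among the rows of x.
  def sqDistSketch(x: Matrix): Matrix = sqDistSketch(x, x)

  // Non-squared variant: elementwise square root of the squared distances.
  def distSketch(x: Matrix): Matrix =
    sqDistSketch(x).map(row => row.map(d => math.sqrt(d)))
}
```

For the two rows (0, 0) and (3, 4), sqDistSketch gives 25.0 for the off-diagonal entries and distSketch gives 5.0; the diagonal is zero in both cases.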

31 May 2015 - Apache Mahout 0.10.1 released

Highlights include:

  • Major memory use improvements in co-occurrence analysis, including the spark-itemsimilarity driver (MAHOUT-1707).
  • Support for Spark versions 1.2.2 and earlier.
  • Some minor fixes to Mahout-Samsara QR Decomposition and matrix ops.
  • Package size trimmed to under 200 MB (MAHOUT-1704 and MAHOUT-1706).
  • Limited testing indicates binary compatibility with Spark 1.3, with the exception of the Mahout shell.

11 April 2015 - Apache Mahout 0.10.0 released

Mahout 0.10.0 was a major release that separated out an ML environment, which we call Mahout-Samsara, including an extended version of Scala that is largely backend-independent but runs fully on Spark. The Hadoop MapReduce versions of Mahout algorithms are still maintained, but no new MapReduce contributions are accepted. From this release onward, contributions must be Mahout-Samsara based or at least run on Spark.

Highlights include:

New Mahout Samsara Environment

  • Distributed Algebraic optimizer
  • R-Like DSL Scala API
  • Linear algebra operations
  • Ops are extensions to Scala
  • Scala REPL based interactive shell running on Spark
  • Integrates with compatible libraries like MLlib
  • Runs on distributed Spark
  • H2O support in progress

New Mahout Samsara based Algorithms

  • Stochastic Singular Value Decomposition (ssvd, dssvd)
  • Stochastic Principal Component Analysis (spca, dspca)
  • Distributed Cholesky QR (thinQR)
  • Distributed regularized Alternating Least Squares (dals)
  • Collaborative Filtering: Item and Row Similarity
  • Naive Bayes Classification
  • Distributed and in-core

Changes in 0.10.0 are detailed here

1 February 2014 - Apache Mahout 0.9 released

Highlights include:

Changes in 0.9 are detailed here.

25 July 2013 - Apache Mahout 0.8 released

Highlights include:

  • Numerous performance improvements to Vector and Matrix implementations, APIs and their iterators
  • Numerous performance improvements to the recommender implementations
  • MAHOUT-1088: Support for biased item-based recommender
  • MAHOUT-1089: SGD matrix factorization for rating prediction with user and item biases
  • MAHOUT-1106: Support for SVD++
  • MAHOUT-944: Support for converting one or more Lucene storage indexes to SequenceFiles as well as an upgrade of the supported Lucene version to Lucene 4.3.1.
  • MAHOUT-1154 and friends: New streaming k-means implementation that offers on-line (and fast) clustering
  • MAHOUT-833: Make conversion to SequenceFiles Map-Reduce; 'seqdirectory' can now be run as a MapReduce job.
  • MAHOUT-1052: Add an option to MinHashDriver that specifies the dimension of vector to hash (indexes or values).
  • MAHOUT-884: Matrix Concat utility; presently it concatenates only two matrices.
  • MAHOUT-1187: Upgraded to CommonsLang3
  • MAHOUT-916: Speed up the Mahout build by making tests run in parallel.

Changes in 0.8 are detailed here.

16 June 2012 - Apache Mahout 0.7 released

Highlights include:

  • Outlier removal capability in K-Means, Fuzzy K, Canopy and Dirichlet Clustering
  • New Clustering implementation for K-Means, Fuzzy K, Canopy and Dirichlet using Cluster Classifiers
  • Collections and Math API consolidated
  • (Complementary) Naive Bayes refactored and cleaned
  • Watchmaker and Old Naive Bayes dropped.
  • Many bug fixes, refactorings, and other small improvements

Changes in 0.7 are detailed here.

6 February 2012 - Apache Mahout 0.6 released

Highlights include:

  • Improved Decision Tree performance and added support for regression problems
  • New LDA implementation using Collapsed Variational Bayes 0th Derivative Approximation
  • Reduced runtime of LanczosSolver tests
  • K-Trusses, Top-Down and Bottom-Up clustering, Random Walk with Restarts implementation
  • Reduced runtime of dot product between vectors
  • Added MongoDB and Cassandra DataModel support
  • Increased efficiency of parallel ALS matrix factorization
  • SSVD enhancements
  • Performance improvements in RowSimilarityJob, TransposeJob
  • Added numerous clustering display examples
  • Many bug fixes, refactorings, and other small improvements

Changes in 0.6 are detailed here.

Past Releases