Release Notes

#### 11 April 2016 - Apache Mahout 0.12.0 released

This release marks a major milestone for the “Samsara” environment’s goal of providing an engine-neutral math platform by now supporting Apache Flink. While still experimental, the Mahout Flink bindings now offer all of the R-like semantics for linear algebra operations, matrix decompositions, and algorithms of the “Samsara” platform for execution on a Flink back-end.

This release gives users of Apache Flink out-of-the-box access to the following features (and more):

  1. The Mahout Distributed Row Matrix (DRM) API.
  2. Distributed and local Vector and Matrix algebra routines.
  3. Distributed and local Stochastic Principal Component Analysis.
  4. Distributed and local Stochastic Singular Value Decomposition.
  5. Distributed and local Thin QR Decomposition.
  6. Collaborative Filtering.
  7. Naive Bayes Classification.
  8. Matrix operations (only listing a few here):
    1. Mahout-native blockified distributed Matrix map and allreduce routines.
    2. Distributed data point (row) sampling.
    3. Matrix/Matrix Squared Distance.
    4. Element-wise log.
    5. Element-wise roots.
    6. Element-wise Matrix/Matrix addition, subtraction, division and multiplication.
    7. Functional Matrix value assignment.
    8. A familiar Scala-based R-like DSL.
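
The matrix operations listed above include a Matrix/Matrix squared distance routine. As a rough illustration of what such a routine computes, here is a minimal sketch in plain Scala on in-core arrays. It does not use Mahout's API: the names `SqDistSketch` and `pairwiseSqDist` are hypothetical and exist only for this example, and a distributed implementation would work block-wise over a DRM rather than over local arrays.

```scala
object SqDistSketch {
  // Assumption for this sketch: a matrix is a plain array of equally sized rows.
  type Matrix = Array[Array[Double]]

  // Squared Euclidean distance between every row of x and every row of y.
  // The result has x.length rows and y.length columns.
  def pairwiseSqDist(x: Matrix, y: Matrix): Matrix = {
    require(
      x.isEmpty || y.isEmpty || x.head.length == y.head.length,
      "rows of x and y must have the same width"
    )
    x.map { xi =>
      y.map { yj =>
        // Sum of squared coordinate differences for one (row, row) pair.
        xi.zip(yj).map { case (a, b) => (a - b) * (a - b) }.sum
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val x = Array(Array(0.0, 0.0), Array(3.0, 4.0))
    val y = Array(Array(0.0, 0.0), Array(1.0, 1.0))
    // Print the distance matrix, one row per line.
    pairwiseSqDist(x, y).foreach(row => println(row.mkString(" ")))
  }
}
```

Entry (i, j) of the result is the squared distance between row i of the first input and row j of the second, so passing the same matrix twice yields a symmetric matrix with a zero diagonal.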

#### 11 March 2016 - Apache Mahout 0.11.2 released

This is a minor release over Mahout 0.11.1 meant to introduce major performance enhancements with sparse matrix and vector computations, and major performance optimizations to the Samsara DSL. Mahout 0.11.2 includes all new features and bug fixes released in Mahout versions 0.11.0 and 0.11.1.

Highlights include:

* Spark 1.5.2 support
* Performance improvements of over 30% on Sparse Vector and Matrix computations leveraging the ‘fastutil’ library - contribution from Sebastiano Vigna. This speeds up all in-core sparse vector and matrix computations.

#### 06 November 2015 - Apache Mahout 0.11.1 released

This is a minor release over Mahout 0.11.0 meant to expand Mahout’s compatibility with Spark versions, to introduce some new features and to fix some bugs. Mahout 0.11.1 includes all new features and bug fixes released in Mahout versions 0.11.0 and earlier.

Highlights include:

* Spark 1.4+ support
* 4x Performance improvement in Dot Product over Dense Vectors

#### 07 August 2015 - Apache Mahout 0.11.0 released

Mahout 0.11.0 includes all new features and bugfixes released in Mahout versions 0.10.1 and 0.10.2 along with support for Spark 1.3+.

Highlights include:

* Spark 1.3 support
* Fixes for a major memory usage bug in co-occurrence analysis used by the driver spark-itemsimilarity. This will now require far less memory in the executor.
* Some minor fixes to Mahout-Samsara QR Decomposition and matrix ops.
* All of the Mahout Samsara fixes from 0.10.2 Release

#### 06 August 2015 - Apache Mahout 0.10.2 released

Highlights include:

* In-core transpose view rewrites. Modifiable transpose views e.g. (for (col <- a.t) col := 5).
* Performance and parallelization improvements for AB', A'B, A'A Spark physical operators.
* Optional structural "flavor" abstraction for in-core matrices. In-core matrices can now be tagged as e.g. sparse or dense.
* %*% optimization based on matrix flavors.
* In-core ::= sparse assignment functions.
* Assign := optimization (do proper traversal based on matrix flavors, similarly to %*%).
* Adding in-place elementwise functional assignment (e.g. mxA := exp _, mxA ::= exp _).
* Distributed and in-core versions of simple elementwise analogues of scala.math._. For example, for log(x) the convention is dlog(drm), mlog(mx), vlog(vec). Unfortunately we cannot overload these functions over what is done in scala.math, i.e. Scala would not allow log(mx) or log(drm) and log(Double) at the same time, mainly because they are defined in different packages.
* Distributed and in-core first and second moment routines. R analogs: mean(), colMeans(), rowMeans(), variance(), sd(). By convention, distributed versions are prefixed with the letter d: colMeanVars(), colMeanStdevs(), dcolMeanVars(), dcolMeanStdevs().
* Distance and squared distance matrix routines. R analog: dist(). Provide both squared and non-squared Euclidean distance matrices. By convention, distributed versions are prefixed with the letter d: dist(x), sqDist(x), dsqDist(x). Also a variation for the pair-wise distance matrix of two different inputs x and y: sqDist(x,y), dsqDist(x,y).
* DRM row sampling API.
* Distributed performance bug fixes. This relates mostly to (a) matrix multiplication deficiencies, and (b) handling parallelism.
* Distributed engine-neutral allreduceBlock() operator API for Spark and H2O.
* Distributed optimizer operators for elementwise functions. Rewrites recognizing e.g. 1+ drmX * dexp(drmX) as a single fused elementwise physical operator: elementwiseFunc(f1(f2(drmX))) where f1 = 1 + x and f2 = exp(x).
* More cbind, rbind flavors (e.g. 1 cbind mxX, 1 cbind drmX or the other way around) for Spark and H2O.
* Added +=: and *=: operators on vectors.
* Closeable API for broadcast tensors.
* Support for conversion of any type-keyed DRM into ordinally-keyed DRM.
* Scala logging style.
* rowSumsMap() summary for non-int-keyed DRMs.
* Elementwise power operator ^.
* R-like vector concatenation operator.
* In-core functional assignments, e.g. mxA := { (x) => x * x}.
* Straighten out behavior of Matrix.iterator() and iterateNonEmpty().
* New mutable transposition view for in-core matrices. The in-core matrix transpose view was rewritten with mostly two goals in mind: (1) enable mutability, e.g. for (col <- mxA.t) col := k; (2) translate matrix structural flavor for optimizers correctly, i.e. new SparseRowMatrix.t carries on as a column-major structure.
* Native support for kryo serialization of tensor types.
* Deprecation of the MultiLayerPerceptron, ConcatenateVectorsJob and all related classes.
* Deprecation of SparseColumnMatrix.

#### 31 May 2015 - Apache Mahout 0.10.1 released

Highlights include:

* Major memory use improvements in cooccurrence analysis including the spark-itemsimilarity driver (MAHOUT-1707)
* Support for Spark version 1.2.2 or less.
* Some minor fixes to Mahout-Samsara QR Decomposition and matrix ops.
* Trimmed package size down to < 200MB (MAHOUT-1704 and MAHOUT-1706)
* Minor testing indicates binary compatibility with Spark 1.3 with the exception of the Mahout Shell.

#### 11 April 2015 - Apache Mahout 0.10.0 released

Mahout 0.10.0 was a major release, which separates out an ML environment (which we call Mahout-Samsara) including an extended version of Scala that is largely backend independent but runs fully on Spark. The Hadoop MapReduce versions of Mahout algorithms are still maintained but no new MapReduce contributions are accepted. From this release onwards contributions must be Mahout Samsara based or at least run on Spark.


Highlights include:

New Mahout Samsara Environment

* Distributed Algebraic optimizer
* R-Like DSL Scala API
* Linear algebra operations
* Ops are extensions to Scala
* Scala REPL based interactive shell running on Spark
* Integrates with compatible libraries like MLlib
* Run on distributed Spark
* H2O in progress

New Mahout Samsara based Algorithms

* Stochastic Singular Value Decomposition (ssvd, dssvd)
* Stochastic Principal Component Analysis (spca, dspca)
* Distributed Cholesky QR (thinQR)
* Distributed regularized Alternating Least Squares (dals)
* Collaborative Filtering: Item and Row Similarity
* Naive Bayes Classification
* Distributed and in-core

Changes in 0.10.0 are detailed here

#### 1 February 2014 - Apache Mahout 0.9 released

    Highlights include:

    • New and improved Mahout website based on Apache CMS - MAHOUT-1245
    • Early implementation of a Multi Layer Perceptron (MLP) classifier - MAHOUT-1265.
    • Scala DSL Bindings for Mahout Math Linear Algebra. See this blogpost - MAHOUT-1297
    • Recommenders as a Search. See - MAHOUT-1288
    • Support for easy functional Matrix views and derivatives - MAHOUT-1300
    • JSON output format for ClusterDumper - MAHOUT-1343
    • Enable randomised testing for all Mahout modules using Carrot RandomizedRunner - MAHOUT-1345
    • Online Algorithm for computing accurate Quantiles using 1-dimensional Clustering - MAHOUT-1361. See this pdf for the details.
    • Upgrade to Lucene 4.6.1 - MAHOUT-1364

      Changes in 0.9 are detailed here.

      #### 25 July 2013 - Apache Mahout 0.8 released

      Highlights include:

      • Numerous performance improvements to Vector and Matrix implementations, APIs and their iterators
      • Numerous performance improvements to the recommender implementations
      • MAHOUT-1088: Support for biased item-based recommender
      • MAHOUT-1089: SGD matrix factorization for rating prediction with user and item biases
      • MAHOUT-1106: Support for SVD++
      • MAHOUT-944: Support for converting one or more Lucene storage indexes to SequenceFiles as well as an upgrade of the supported Lucene version to Lucene 4.3.1.
      • MAHOUT-1154 and friends: New streaming k-means implementation that offers on-line (and fast) clustering
      • MAHOUT-833: Conversion to SequenceFiles is now MapReduce-based; 'seqdirectory' can now be run as a MapReduce job.
      • MAHOUT-1052: Add an option to MinHashDriver that specifies the dimension of vector to hash (indexes or values).
      • MAHOUT-884: Matrix Concat utility, presently only concatenates two matrices.
      • MAHOUT-1187: Upgraded to CommonsLang3
      • MAHOUT-916: Speedup the Mahout build by making tests run in parallel.

      Changes in 0.8 are detailed here.

      #### 16 June 2012 - Apache Mahout 0.7 released

      Highlights include:

      • Outlier removal capability in K-Means, Fuzzy K, Canopy and Dirichlet Clustering
      • New Clustering implementation for K-Means, Fuzzy K, Canopy and Dirichlet using Cluster Classifiers
      • Collections and Math API consolidated
      • (Complementary) Naive Bayes refactored and cleaned
      • Watchmaker and Old Naive Bayes dropped.
      • Many bug fixes, refactorings, and other small improvements

      Changes in 0.7 are detailed here.

      #### 6 Feb 2012 - Apache Mahout 0.6 released

      Highlights include:

      • Improved Decision Tree performance and added support for regression problems
      • New LDA implementation using Collapsed Variational Bayes 0th Derivative Approximation
      • Reduced runtime of LanczosSolver tests
      • K-Trusses, Top-Down and Bottom-Up clustering, Random Walk with Restarts implementation
      • Reduced runtime of dot product between vectors
      • Added MongoDB and Cassandra DataModel support
      • Increased efficiency of parallel ALS matrix factorization
      • SSVD enhancements
      • Performance improvements in RowSimilarityJob, TransposeJob
      • Added numerous clustering display examples
      • Many bug fixes, refactorings, and other small improvements

      Changes in 0.6 are detailed here.

      #### Past Releases

      • Mahout 0.5
      • Mahout 0.4
      • Mahout 0.3
      • Mahout 0.2
      • Mahout 0.1