What is Apache Mahout?

The Apache Mahout™ project's goal is to build a scalable machine learning library.

The latest release, version 0.9, includes:

  • User and Item based recommenders
  • Matrix factorization based recommenders
  • K-Means, Fuzzy K-Means clustering
  • Latent Dirichlet Allocation
  • Singular Value Decomposition
  • Logistic regression classifier
  • (Complementary) Naive Bayes classifier
  • Random forest classifier
  • High-performance Java collections
  • A vibrant community

By scalable we mean:

Scalable to large data sets. Our core algorithms for clustering, classification and collaborative filtering are implemented on top of scalable, distributed systems. However, contributions that run on a single machine are welcome as well.

Scalable to support your business case. Mahout is distributed under the commercially friendly Apache Software License.

Scalable community. The goal of Mahout is to build a vibrant, responsive, diverse community to facilitate discussions not only on the project itself but also on potential use cases. Come to the mailing lists to find out more.

Currently Mahout mainly supports three use cases:

  • Recommendation mining takes users' behavior and tries to find items those users might like.
  • Clustering takes, for example, text documents and groups them into sets of topically related documents.
  • Classification learns from existing categorized documents what documents of a specific category look like, and can then assign unlabelled documents to the (hopefully) correct category.
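As an illustration of the recommendation use case, here is a minimal single-machine sketch of a user-based recommender in plain Python. It is not Mahout's API or implementation: the toy data and the names `RATINGS`, `cosine_similarity` and `recommend` are hypothetical, and cosine similarity over co-rated items is only one of several possible similarity measures.

```python
from math import sqrt

# Hypothetical toy preference data: user -> {item: rating}.
RATINGS = {
    "alice": {"book_a": 5.0, "book_b": 3.0, "book_c": 4.0},
    "bob": {"book_a": 5.0, "book_b": 3.0, "book_d": 5.0},
    "carol": {"book_b": 5.0, "book_c": 1.0, "book_e": 4.0},
}


def cosine_similarity(prefs_a, prefs_b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(prefs_a) & set(prefs_b)
    if not shared:
        return 0.0
    dot = sum(prefs_a[i] * prefs_b[i] for i in shared)
    norm_a = sqrt(sum(prefs_a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(prefs_b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated by similarity-weighted mean rating."""
    weighted_sums = {}
    similarity_sums = {}
    for other, other_prefs in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_prefs)
        if sim <= 0.0:
            continue  # ignore users with nothing in common
        for item, rating in other_prefs.items():
            if item in ratings[user]:
                continue  # only score items the user has not seen
            weighted_sums[item] = weighted_sums.get(item, 0.0) + sim * rating
            similarity_sums[item] = similarity_sums.get(item, 0.0) + sim
    scores = {
        item: weighted_sums[item] / similarity_sums[item]
        for item in weighted_sums
    }
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)[:top_n]


if __name__ == "__main__":
    for item, score in recommend("alice", RATINGS):
        print(item, round(score, 2))
```

In this toy data, bob's ratings agree with alice's on every shared item, so the item only bob has rated ranks ahead of the item only carol has rated. A distributed implementation would partition the similarity and scoring work across machines, which this sketch does not attempt.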

Interested in helping? Join the mailing lists.

Mahout News

25 April 2014 - Goodbye MapReduce

The Mahout community decided to move its codebase onto modern data processing systems that offer a richer programming model and more efficient execution than Hadoop MapReduce. Mahout will therefore reject new MapReduce algorithm implementations from now on. We will, however, keep our widely used MapReduce algorithms in the codebase and maintain them.

We are building our future implementations on top of a DSL for linear algebraic operations, which has been developed over the last few months. Programs written in this DSL are automatically optimized and executed in parallel on Apache Spark.
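To give a rough idea of why linear algebraic operations lend themselves to parallel execution, the following plain-Python sketch computes the Gram matrix AᵀA as a sum of per-row outer products, evaluated block by block. It is not the Mahout DSL and says nothing about how Mahout's optimizer actually plans such a product; the function names and the manual partitioning are assumptions made for illustration only.

```python
def outer(row):
    """Outer product of a row vector with itself."""
    return [[x * y for y in row] for x in row]


def add(m1, m2):
    """Element-wise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def zeros(n):
    """An n-by-n matrix of zeros."""
    return [[0.0] * n for _ in range(n)]


def gram_of_block(rows, n_cols):
    """Partial A^T A for one block of rows; each block is independent."""
    acc = zeros(n_cols)
    for row in rows:
        acc = add(acc, outer(row))
    return acc


def gram(blocks, n_cols):
    """Combine the per-block partial results into the full A^T A."""
    total = zeros(n_cols)
    for block in blocks:
        total = add(total, gram_of_block(block, n_cols))
    return total


if __name__ == "__main__":
    # A 3x2 matrix split into two row blocks, as if held by two workers.
    blocks = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]]
    print(gram(blocks, 2))
```

Because each block's partial result depends only on its own rows, the calls to `gram_of_block` could run on different workers, with only the small n-by-n partial matrices needing to be combined at the end.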

Furthermore, an experimental contribution is underway that aims to integrate the H2O platform into Mahout.

1 February 2014 - Apache Mahout 0.9 released

Apache Mahout has reached version 0.9. All developers are encouraged to begin using version 0.9. Highlights include:

  • New and improved Mahout website based on Apache CMS - MAHOUT-1245
  • Early implementation of a Multi Layer Perceptron (MLP) classifier - MAHOUT-1265
  • Scala DSL Bindings for Mahout Math Linear Algebra. See this blogpost and MAHOUT-1297
  • Recommenders as Search. See [https://github.com/pferrel/solr-recommender] and MAHOUT-1288
  • Support for easy functional Matrix views and derivatives - MAHOUT-1300
  • JSON output format for ClusterDumper - MAHOUT-1343
  • Enabled randomised testing for all Mahout modules using Carrot RandomizedRunner - MAHOUT-1345
  • Online Algorithm for computing accurate Quantiles using 1-dimensional Clustering - See this pdf and MAHOUT-1361
  • Upgrade to Lucene 4.6.1 - MAHOUT-1364

Changes in 0.9 are detailed in the release notes.

The following algorithms that were marked deprecated in 0.8 have been removed in 0.9:

  • LDA via Gibbs Sampling - replaced by an implementation based on Collapsed Variational Bayes
  • Meanshift - removed due to lack of actual usage and support
  • MinHash - removed due to lack of actual usage and support
  • Winnow - removed due to lack of actual usage and support
  • Perceptron - removed due to lack of actual usage and support
  • Slope One - removed due to lack of actual usage
  • Distributed Pseudo recommender - removed due to lack of actual usage
  • TreeClusteringRecommender - removed due to lack of actual usage

25 July 2013 - Apache Mahout 0.8 released

Visit our release notes page for details.

16 June 2012 - Apache Mahout 0.7 released

Visit our release notes page for details.

6 Feb 2012 - Apache Mahout 0.6 released

Visit our release notes page for details.

9 Oct 2011 - Mahout in Action released

The book Mahout in Action is available in print. Sean Owen, Robin Anil, Ted Dunning and Ellen Friedman thank the community (especially those who were reviewers) for input during the process and hope it is enjoyable.

Find it at your favorite bookstore, or order print and eBook copies from Manning -- use discount code "mahout37" for 37% off.