Apache Mahout is a new Apache top-level project (TLP) that aims to create scalable machine learning algorithms under the Apache license.

{toc:style=disc minlevel=2}

General

Overview – Mahout? What’s that supposed to be?

Quickstart – learn how to quickly set up Apache Mahout for your project.

FAQ – frequently asked questions from the mailing lists.

Developer Resources – overview of the Mahout development infrastructure.

How To Contribute – get involved with the Mahout community.

How To Become A Committer – become a member of the Mahout development community.

Hadoop – several of our implementations depend on Hadoop.

Machine Learning Open Source Software – other projects implementing Open Source Machine Learning libraries.

Mahout – the name, its history, and its pronunciation.

Community

Who we are – who are the developers behind Apache Mahout?

Books, Tutorials, Talks, Articles, News, Background Reading, etc. on Mahout

Issue Tracker – see what features people are working on, submit patches and file bugs.

Source Code (SVN) – [Fisheye|http://fisheye6.atlassian.com/browse/mahout] – download the Mahout source code from SVN.

Mailing lists and IRC – links to our mailing lists, IRC channel, and archived design and algorithm discussions. Maybe your question has already been answered there.

Version Control – where we track our code.

Powered By Mahout – who is using Mahout in production?

Professional Support – who is offering professional support for Mahout?

Mahout and Google Summer of Code – all you need to know about Mahout and GSoC.

Glossary of commonly used terms and abbreviations

Installation/Setup

System Requirements – what do you need to run Mahout?

Quickstart – get started with Mahout, run the examples and get pointers to further resources.

Downloads – a list of Mahout releases.

Download and installation – build Mahout from the sources.

Mahout on Amazon’s EC2 Service – run Mahout on Amazon’s EC2.

Mahout on Amazon’s EMR – run Mahout on Amazon’s Elastic MapReduce.

Integrating Mahout into an Application – integrate Mahout’s capabilities in your application.

Examples

  1. ASF Email Examples – examples of recommenders, clustering, and classification, all using a public-domain collection of 7 million emails.

Implementation Background

Requirements and Design

Matrix and Vector Needs – requirements for Mahout vectors.

Collection (De-)Serialization

Collections and Algorithms

Learn more about mahout-collections, containers for efficient storage of primitive-type data and open hash tables.

Learn more about the Algorithms discussed and employed by Mahout.

Learn more about the Mahout recommender implementation.

Utilities

This section describes tools that might be useful for working with Mahout.

Converting Content – Mahout has some utilities for converting content such as logs to formats better suited to consumption by Mahout.

Creating Vectors – Mahout’s algorithms operate on vectors. Learn more on how to generate these from raw data.

Viewing Result – how to visualize the results of your trained algorithms.

Data

Collections – to try out and test Mahout’s algorithms, you need training data. We are always looking for new training data collections.

Benchmarks

Mahout Benchmarks

Committer’s Resources

Project Resources

Additional Resources

How To Edit This Wiki

This wiki is a collaborative site; anyone can contribute and share.

There are some conventions used on the Mahout wiki:

* {noformat}+*TODO:*+{noformat} (+*TODO:*+) denotes sections that definitely need to be cleaned up.
* {noformat}+*Mahout_(version)*+{noformat} (+*Mahout_0.2*+) marks the version of Mahout in which a feature was (or will be) added.