The Official Mahout FAQ

General

  1. What is Apache Mahout?
  2. What does the name mean?
  3. How is the name pronounced?
  4. Where can I find the origins of the Mahout project?
  5. Where can I download the Mahout logo?
  6. Where can I download Mahout slide presentations?

Algorithms

  1. What algorithms are implemented in Mahout?
  2. What algorithms are missing from Mahout?
  3. Do I need Hadoop to use Mahout?

Hadoop specific questions

  1. Mahout just won’t run in parallel on my dataset. Why?

Answers

General

What is Apache Mahout?

Apache Mahout is a suite of machine learning libraries designed to be scalable and robust.

What does the name mean?

The name Mahout was originally chosen for its association with the Apache Hadoop project. A mahout is a person who drives an elephant (hint: Hadoop’s logo is an elephant). We wanted a name that complemented Hadoop, and we see our project as a good driver of Hadoop in the sense that we will be using and testing it. We are not, however, implying that we are controlling Hadoop’s development.

Prior to coming to the ASF, those of us working on the project plan voted between two names: Howdah (the carriage on top of an elephant) and Mahout.

Where can I find the origins of the Mahout project?

See http://ml-site.grantingersoll.com for the old wiki and mailing list archives (all read-only).

Mahout was started by Isabel Drost, Grant Ingersoll and Karl Wettin. It began as part of the Lucene project (see the original proposal) and went on to become a top-level project in April 2010.

The original goal was to implement all 10 algorithms from the paper "Map-Reduce for Machine Learning on Multicore" by Chu et al. (co-authored by Andrew Ng).

How is the name pronounced?

There is some disagreement about how to pronounce the name. Webster’s has it as muh-hout (as in “out”), but the original Sanskrit/Hindi word is pronounced “muh-hoot”. The second pronunciation suggests a nice pun on the Hebrew word מהות meaning “essence or truth”.

See MAHOUT-335

Where can I download Mahout slide presentations?

The Books, Tutorials and Talks page contains an overview of a wide variety of presentations with links to slides where available.

Algorithms

What algorithms are implemented in Mahout?

We are interested in a wide variety of machine learning algorithms, many of which are already implemented in Mahout. See the algorithms list for the current set.

What algorithms are missing from Mahout?

There are many machine learning algorithms that we would like to have in Mahout. If you have an algorithm or an improvement to an algorithm that you would like to implement, start a discussion on our mailing list.

Do I need Hadoop to use Mahout?

Not necessarily. A number of algorithm implementations require no Hadoop dependencies whatsoever; consult the algorithms list. In the future, we might provide more algorithm implementations on platforms better suited to machine learning, such as Apache Spark.

Hadoop specific questions

Mahout just won’t run in parallel on my dataset. Why?

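If you are running training on a Hadoop cluster, keep in mind that the number of mappers started is governed by the size of the input data and the configured split/block size of your cluster. Each input split is processed by exactly one mapper, so an input that fits into a single split is handled by a single mapper no matter how many nodes the cluster has. As a rule of thumb, anything below 100 MB in size won’t be split by default.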
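The rule of thumb above follows from how Hadoop's FileInputFormat sizes input splits: the split size is max(minSize, min(maxSize, blockSize)), and one mapper is started per split. The Python sketch below models that arithmetic outside Hadoop; it is a simplified illustration, not Hadoop code. It assumes a single uncompressed, splittable input file and a 128 MB block size (the HDFS default in Hadoop 2 and later). The helper names `compute_split_size` and `split_lengths` are hypothetical; `SPLIT_SLOP = 1.1` mirrors the constant in Hadoop's FileInputFormat, which lets the last split be up to 10% larger than the split size instead of producing a tiny extra split.

```python
# Simplified model of Hadoop FileInputFormat split sizing (illustration only).
# Assumptions: one uncompressed, splittable file; default min/max split sizes.

MB = 1024 * 1024
SPLIT_SLOP = 1.1  # mirrors FileInputFormat: last split may be up to 10% oversized


def compute_split_size(block_size: int, min_size: int = 1,
                       max_size: int = 2**63 - 1) -> int:
    """Split size = max(min_size, min(max_size, block_size))."""
    return max(min_size, min(max_size, block_size))


def split_lengths(file_length: int, block_size: int, min_size: int = 1,
                  max_size: int = 2**63 - 1) -> list[int]:
    """Return the byte length of each split; each split feeds one mapper."""
    split_size = compute_split_size(block_size, min_size, max_size)
    splits = []
    remaining = file_length
    # Keep cutting full-size splits while the remainder is "clearly" bigger
    # than one split (more than SPLIT_SLOP times the split size).
    while remaining / split_size > SPLIT_SLOP:
        splits.append(split_size)
        remaining -= split_size
    # Whatever is left becomes the final (possibly oversized) split.
    if remaining != 0:
        splits.append(remaining)
    return splits


if __name__ == "__main__":
    # 100 MB input, 128 MB blocks: a single split, so a single mapper.
    print(len(split_lengths(100 * MB, 128 * MB)))  # 1
    # Same input with the maximum split size capped at 16 MB: 7 mappers.
    print(len(split_lengths(100 * MB, 128 * MB, max_size=16 * MB)))  # 7
```

Lowering the maximum split size (for example through the `mapreduce.input.fileinputformat.split.maxsize` property in Hadoop 2 and later) is one way to get more mappers for a small input; check the documentation for the Hadoop version your cluster runs, since property names and defaults differ between releases.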