The Mahout Collections library is a set of container classes that address
some limitations of the standard collections in Java. This presentation
describes a number of performance problems with the standard collections.
Mahout Collections addresses two of the more glaring: the lack of support
for primitive types and the lack of open addressing.
The most visible feature of Mahout Collections is its large set of
containers for primitive types. Java generics cannot be instantiated with
primitive types, so the only efficient way to handle them is to write a
separate class for each one. So, there are ArrayList-like containers for
all of the primitive types, and hash maps for all the useful combinations
of primitive type and object keys and values.
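As a rough illustration of what an ArrayList-like container for a primitive type looks like, here is a minimal sketch. The class name IntList and its methods are hypothetical and are not the Mahout Collections API; the point is only that values live in a plain int[] and are never boxed into Integer objects.

```java
import java.util.Arrays;

/** Hypothetical growable list of int values; a sketch, not the Mahout class. */
final class IntList {
    private int[] elements = new int[8];
    private int size = 0;

    /** Appends a value, doubling the backing array when it is full. */
    void add(int value) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, size * 2);
        }
        elements[size++] = value;
    }

    /** Returns the value at the given index without any boxing. */
    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return elements[index];
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        IntList list = new IntList();
        for (int i = 0; i < 1000; i++) {
            list.add(i * i);
        }
        long sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);
        }
        System.out.println(sum);
    }
}
```

An ArrayList<Integer> holding the same data would store one reference per element plus a separate Integer object for most values, which is the overhead a per-type class avoids.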
These classes do not, in general, implement interfaces from java.util.
Even when the java.util interfaces could be type-compatible, they tend
to include requirements that are not consistent with efficient use of
primitive types.
All of the sets and maps in Mahout Collections are open-addressed hash
tables. Open addressing has a much smaller memory footprint than chaining.
Since the purpose of these collections is to avoid the memory cost of
autoboxing, open addressing is a consistent design choice.
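To make the memory argument concrete, the following is a minimal sketch of an open-addressed int-to-int map using linear probing. The class name IntIntProbeMap, the hash mixing step, the 50% load limit, and the omission of removal are all assumptions made for brevity; the actual Mahout implementation may differ in each of these. Every entry lives directly in parallel arrays, so there are no per-entry node objects as there would be with chaining.

```java
/** Hypothetical open-addressed int-to-int map; a sketch, not the Mahout class. */
final class IntIntProbeMap {
    // Parallel arrays; the capacity is always a power of two.
    private int[] keys = new int[16];
    private int[] values = new int[16];
    private boolean[] used = new boolean[16];
    private int size = 0;

    /** Inserts or overwrites the value for a key. */
    void put(int key, int value) {
        if ((size + 1) * 2 > keys.length) {
            grow(); // keep the table at most half full so probing terminates
        }
        int i = findSlot(key);
        if (!used[i]) {
            used[i] = true;
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    /** Returns the stored value, or the fallback if the key is absent. */
    int getOrDefault(int key, int fallback) {
        int i = findSlot(key);
        return used[i] ? values[i] : fallback;
    }

    int size() {
        return size;
    }

    /** Linear probing: step to the next slot until the key or a free slot is found. */
    private int findSlot(int key) {
        int mask = keys.length - 1;
        int h = key * 0x9E3779B9; // assumed multiplicative mixing step
        h ^= h >>> 16;
        int i = h & mask;
        while (used[i] && keys[i] != key) {
            i = (i + 1) & mask;
        }
        return i;
    }

    /** Doubles the capacity and reinserts every existing entry. */
    private void grow() {
        int[] oldKeys = keys;
        int[] oldValues = values;
        boolean[] oldUsed = used;
        keys = new int[oldKeys.length * 2];
        values = new int[oldKeys.length * 2];
        used = new boolean[oldKeys.length * 2];
        size = 0;
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldUsed[i]) {
                put(oldKeys[i], oldValues[i]);
            }
        }
    }

    public static void main(String[] args) {
        IntIntProbeMap squares = new IntIntProbeMap();
        for (int i = 0; i < 100; i++) {
            squares.put(i, i * i);
        }
        System.out.println(squares.getOrDefault(7, -1));
    }
}
```

A chained table such as java.util.HashMap allocates a node object per entry in addition to the boxed key and value, which is where the difference in footprint comes from.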
Mahout Collections includes open-addressed hash sets. Unlike in java.util,
where HashSet is backed by a HashMap, a set here is not a recycled hash
map; the sets are separately implemented and spend no storage on unused
values.
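A separately implemented set can be sketched as follows. The class name IntProbeSet, the hash mixing step, and the 50% load limit are hypothetical choices, and removal is left out; this is not the Mahout class. The thing to notice is that there is no values array at all, only the keys and an occupancy flag per slot.

```java
/** Hypothetical open-addressed set of int keys; a sketch, not the Mahout class. */
final class IntProbeSet {
    // Only keys and occupancy flags are stored; there is no value array.
    private int[] keys = new int[16];
    private boolean[] used = new boolean[16];
    private int size = 0;

    /** Adds a key; returns false if it was already present. */
    boolean add(int key) {
        if ((size + 1) * 2 > keys.length) {
            grow(); // keep the table at most half full so probing terminates
        }
        int i = findSlot(key);
        if (used[i]) {
            return false;
        }
        used[i] = true;
        keys[i] = key;
        size++;
        return true;
    }

    boolean contains(int key) {
        return used[findSlot(key)];
    }

    int size() {
        return size;
    }

    /** Linear probing over a power-of-two table. */
    private int findSlot(int key) {
        int mask = keys.length - 1;
        int h = key * 0x9E3779B9; // assumed multiplicative mixing step
        h ^= h >>> 16;
        int i = h & mask;
        while (used[i] && keys[i] != key) {
            i = (i + 1) & mask;
        }
        return i;
    }

    /** Doubles the capacity and reinserts every existing key. */
    private void grow() {
        int[] oldKeys = keys;
        boolean[] oldUsed = used;
        keys = new int[oldKeys.length * 2];
        used = new boolean[oldKeys.length * 2];
        size = 0;
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldUsed[i]) {
                add(oldKeys[i]);
            }
        }
    }

    public static void main(String[] args) {
        IntProbeSet seen = new IntProbeSet();
        seen.add(3);
        seen.add(3);
        seen.add(5);
        System.out.println(seen.size());
    }
}
```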
Credit where Credit is due
The implementation of Mahout Collections is derived from Cern Colt