Class Summary
Class	Description
AbstractByteList	Abstract base class for resizable lists holding `byte` elements.
AbstractCharList	Abstract base class for resizable lists holding `char` elements.
AbstractDoubleList	Abstract base class for resizable lists holding `double` elements.
AbstractFloatList	Abstract base class for resizable lists holding `float` elements.
AbstractIntList	Abstract base class for resizable lists holding `int` elements.
AbstractList	Abstract base class for resizable lists holding objects or primitive data types such as `int`, `float`, etc.
AbstractLongList	Abstract base class for resizable lists holding `long` elements.
AbstractObjectList<T>	Abstract base class for resizable lists holding objects or primitive data types such as `int`, `float`, etc. First see the package summary and javadoc tree view to get the broad picture.
AbstractShortList	Abstract base class for resizable lists holding `short` elements.
ByteArrayList	Resizable list holding `byte` elements; implemented with arrays.
CharArrayList	Resizable list holding `char` elements; implemented with arrays.
DoubleArrayList	Resizable list holding `double` elements; implemented with arrays.
FloatArrayList	Resizable list holding `float` elements; implemented with arrays.
IntArrayList	Resizable list holding `int` elements; implemented with arrays.
LongArrayList	Resizable list holding `long` elements; implemented with arrays.
ObjectArrayList<T>	Resizable list holding object elements of type `T`; implemented with arrays.
ShortArrayList	Resizable list holding `short` elements; implemented with arrays.
SimpleLongArrayList	Resizable list holding `long` elements; implemented with arrays; not efficient; just to demonstrate which methods you must override to implement a fully functional list.

Package org.apache.mahout.math.list Description

Resizable lists holding objects or primitive data types such as int, double, etc. For non-resizable lists (1-dimensional matrices) see package org.apache.mahout.math.matrix.

Getting Started

1. Overview

The list package offers flexible object-oriented abstractions that model dynamically resizing lists holding objects or primitive data types such as int, double, etc. It is designed to scale in both performance and memory requirements.

Features include:

Lists operating on objects as well as all primitive data types such as int, double, etc.
Compact representations
A number of general purpose list operations including: adding, inserting, removing, iterating, searching, sorting, extracting ranges and copying. All operations are designed to perform well on mass data.
Support for quick access to list elements. This is achieved by bounds-checking and non-bounds-checking accessor methods as well as zero-copy transformations to primitive arrays such as int[], double[], etc.
High-level algorithms can be used on primitive data types without space or time overhead. Operations on primitive arrays, Colt lists and JAL algorithms can be freely mixed at zero copy overhead.

File-based I/O can be achieved through the standard Java built-in serialization mechanism. All classes implement the Serializable interface. However, the toolkit is entirely decoupled from advanced I/O. It provides data structures and algorithms only.
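As a rough illustration of the serialization route mentioned above, the following sketch round-trips an object through the standard ObjectOutputStream and ObjectInputStream classes. The DoubleBox class is a hypothetical stand-in for a Serializable list and is not part of this package; an in-memory byte array is used here, and a FileOutputStream or FileInputStream could be substituted for file-based I/O.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializationSketch {

    /** Hypothetical stand-in for a Serializable list class (not a Mahout class). */
    static final class DoubleBox implements Serializable {
        private static final long serialVersionUID = 1L;
        final double[] values;

        DoubleBox(double[] values) {
            this.values = values;
        }
    }

    /** Serializes the object into a byte array using built-in Java serialization. */
    static byte[] write(Serializable object) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reconstructs an object from bytes produced by write(). */
    static Object read(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The restored object is a separate instance whose contents equal those of the original.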

This toolkit borrows concepts and terminology from the Javasoft Collections framework written by Josh Bloch and introduced in JDK 1.2.

2. Introduction

Lists are fundamental to virtually any application. Large-scale resizable lists are used, for example, in scientific computations, simulations, and database management systems, to name just a few.

A list is a container holding elements that can be accessed via zero-based indexes. Lists may be implemented in different ways (most commonly with arrays). A resizable list automatically grows as elements are added. The lists of this package do not automatically shrink. Shrinking needs to be triggered by explicitly calling trimToSize() methods.

Growing policy: A list implemented with arrays initially has a certain initialCapacity, by default 10 elements, but customizable upon instance construction. As elements are added, this capacity may no longer be sufficient. When a list is automatically grown, its capacity is expanded to 1.5*currentCapacity. Because capacity grows geometrically, excessive resizing (which involves copying) is avoided.
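The growing policy and explicit shrinking can be sketched as follows. GrowableDoubleList is a hypothetical, simplified class written for illustration only; it is not the source of DoubleArrayList, and the exact rounding of the 1.5 factor is an assumption.

```java
import java.util.Arrays;

/** Hypothetical illustration of the growing policy; not the Mahout implementation. */
final class GrowableDoubleList {
    private double[] elements;
    private int size;

    GrowableDoubleList() {
        this(10); // default initial capacity described in the text
    }

    GrowableDoubleList(int initialCapacity) {
        elements = new double[initialCapacity];
    }

    void add(double value) {
        if (size == elements.length) {
            // Grow to about 1.5 * currentCapacity, but by at least one slot
            // (assumed rounding; also covers an initial capacity of 0 or 1).
            int newCapacity = Math.max(size + 1, (elements.length * 3) / 2);
            elements = Arrays.copyOf(elements, newCapacity);
        }
        elements[size++] = value;
    }

    double get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index + ", size: " + size);
        }
        return elements[index];
    }

    int size() {
        return size;
    }

    int capacity() {
        return elements.length;
    }

    /** Shrinks the backing array to the current size; never called automatically. */
    void trimToSize() {
        if (size < elements.length) {
            elements = Arrays.copyOf(elements, size);
        }
    }
}
```

In this sketch, adding an eleventh element to a default-constructed list grows the capacity from 10 to 15, and trimToSize() then reduces it to 11.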

Copying

Any list can be copied. A copy is equal to the original but entirely independent of the original. So changes in the copy are not reflected in the original, and vice-versa.
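The copy semantics described above (equal to the original, yet independent of it) can be sketched with a hypothetical TinyIntList class. The class and its copy() method are illustrative assumptions and do not reproduce this package's API.

```java
import java.util.Arrays;

/** Hypothetical minimal list used only to illustrate copy semantics. */
final class TinyIntList {
    private final int[] elements;

    TinyIntList(int... values) {
        // Defensive clone so the list never shares storage with the caller.
        this.elements = values.clone();
    }

    int get(int index) {
        return elements[index];
    }

    void set(int index, int value) {
        elements[index] = value;
    }

    /** Returns a deep copy backed by its own array. */
    TinyIntList copy() {
        return new TinyIntList(elements);
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof TinyIntList
                && Arrays.equals(elements, ((TinyIntList) other).elements);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(elements);
    }
}
```

Directly after copying, the two lists compare equal; once the copy is modified, the original is unchanged and the two no longer compare equal.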

3. Organization of this package

Class naming follows the schema <ElementType><ImplementationTechnique>List. For example, we have a DoubleArrayList, which is a list holding double elements implemented with double[] arrays.

The classes for lists of a given value type are derived from a common abstract base class named Abstract<ElementType>List. For example, all lists operating on double elements are derived from AbstractDoubleList, which in turn is derived from AbstractList, an abstract base class tying together all lists regardless of value type. The abstract base classes provide skeleton implementations for all but a few methods. Experimental data layouts (such as compressed, sparse, linked, etc.) can therefore be implemented easily while inheriting a rich set of functionality. Have a look at the javadoc tree view to get the broad picture.
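The skeleton idea can be sketched as follows: an abstract base class implements the higher-level operations in terms of a few primitive methods, so a new data layout only has to supply those. SketchLongList and WrappedLongList are hypothetical names, and the choice of size(), getQuick() and setQuick() as the primitive methods is an assumption made for illustration; consult the actual abstract classes for the methods they require.

```java
/** Hypothetical skeleton class; higher-level operations build on three primitives. */
abstract class SketchLongList {
    // Primitive operations each concrete layout must supply (assumed set).
    abstract int size();
    abstract long getQuick(int index);             // no bounds check
    abstract void setQuick(int index, long value); // no bounds check

    /** Bounds-checking accessor built on getQuick(). */
    long get(int index) {
        if (index < 0 || index >= size()) {
            throw new IndexOutOfBoundsException("index: " + index + ", size: " + size());
        }
        return getQuick(index);
    }

    /** Linear search; returns -1 if the value is absent. */
    int indexOf(long value) {
        for (int i = 0; i < size(); i++) {
            if (getQuick(i) == value) {
                return i;
            }
        }
        return -1;
    }

    /** Reverses the list in place using only the primitives. */
    void reverse() {
        for (int i = 0, j = size() - 1; i < j; i++, j--) {
            long tmp = getQuick(i);
            setQuick(i, getQuick(j));
            setQuick(j, tmp);
        }
    }
}

/** Hypothetical fixed-size layout: a thin wrapper around a long[] array. */
final class WrappedLongList extends SketchLongList {
    private final long[] data;

    WrappedLongList(long... data) {
        this.data = data;
    }

    @Override
    int size() {
        return data.length;
    }

    @Override
    long getQuick(int index) {
        return data[index];
    }

    @Override
    void setQuick(int index, long value) {
        data[index] = value;
    }
}
```

WrappedLongList inherits get(), indexOf() and reverse() without implementing them, which is the benefit the skeleton classes are meant to provide.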

4. Example usage

The following snippet fills a list, randomizes it, extracts the first half of the elements, sums them up and prints the result. It is implemented entirely with accessor methods.

 int s = 1000000;
 AbstractDoubleList list = new DoubleArrayList();
 for (int i=0; i<s; i++) { list.add((double)i); }
 list.shuffle();
 // partFromTo takes inclusive bounds, so this extracts the first half
 AbstractDoubleList part = list.partFromTo(0, list.size()/2 - 1);
 double sum = 0.0;
 for (int i=0; i<part.size(); i++) { sum += part.get(i); }
 System.out.println(sum);

For efficiency, all classes provide back doors to enable getting/setting the backing array directly. In this way, the high level operations of these classes can be used where appropriate, and one can switch to []-array index notations where necessary. The key methods for this are public <ElementType>[] elements() and public void elements(<ElementType>[]). The former trustingly returns the array it internally keeps to store the elements. Holding this array in hand, we can use the []-array operator to perform iteration over large lists without needing to copy the array or paying the performance penalty introduced by accessor methods. Alternatively any JAL algorithm (or other algorithm) can operate on the returned primitive array. The latter method forces a list to internally hold a user provided array. Using this approach one can avoid needing to copy the elements into the list.

As a consequence, operations on primitive arrays, Colt lists and JAL algorithms can freely be mixed at zero-copy overhead.

Note that such special treatment certainly breaks encapsulation. This functionality is provided for performance reasons only and should only be used when absolutely necessary. Here is the above example in mixed notation:

 int s = 1000000;
 DoubleArrayList list = new DoubleArrayList(s); // list.size()==0, capacity==s
 list.setSize(s); // list.size()==s
 double[] values = list.elements(); // zero copy, values.length==s
 for (int i=0; i<s; i++) { values[i]=(double)i; }
 list.shuffle();
 double sum = 0.0;
 int limit = values.length/2;
 for (int i=0; i<limit; i++) { sum += values[i]; }
 System.out.println(sum);

Or even more compact using lists as algorithm objects:

 int s = 1000000;
 double[] values = new double[s];
 for (int i=0; i<s; i++) { values[i]=(double)i; }
 new DoubleArrayList(values).shuffle(); // zero-copy, shuffle via back door
 double sum = 0.0;
 int limit = values.length/2;
 for (int i=0; i<limit; i++) { sum += values[i]; }
 System.out.println(sum);

5. Notes

The quicksorts and mergesorts are the JDK 1.2 V1.26 algorithms, modified as necessary to operate on the given data types.