# Breiman Example

## Introduction

This page describes how to run the Breiman example, which implements the test procedure described in Leo Breiman’s paper “Random Forests” (Machine Learning, 2001). The basic algorithm is as follows:

## Running the Example

The current implementation is compatible with the UCI Machine Learning Repository file format. We’ll show how to run this example on two datasets: Glass Identification and Sonar.

First, the Glass Identification dataset: download the file glass.data and save it to your local machine. Next, generate the descriptor file glass.info for this dataset with the following command:

```shell
bin/mahout org.apache.mahout.classifier.df.tools.Describe -p /path/to/glass.data -f /path/to/glass.info -d I 9 N L
```

Replace /path/to/ with the folder where you downloaded the dataset. The argument “I 9 N L” describes the nature of each variable: here it means 1 ignored (I) attribute, followed by 9 numerical (N) attributes, followed by the label (L).
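To make the descriptor notation concrete, the sketch below expands a compact descriptor into one type per column, following the reading given above (a number repeats the token after it). `expand_descriptor` is a hypothetical helper for illustration, not part of Mahout, and it only accepts the I/N/L tokens used on this page.

```python
def expand_descriptor(descriptor: str) -> list[str]:
    """Expand a compact descriptor such as "I 9 N L" into one token per column.

    Hypothetical helper: a number is a repeat count for the token that follows
    it, so "9 N" becomes nine "N" entries. Only I, N and L are accepted.
    """
    allowed = {"I", "N", "L"}
    columns: list[str] = []
    pending = None  # repeat count waiting for its token
    for token in descriptor.split():
        if token.isdigit():
            pending = int(token)
            continue
        if token not in allowed:
            raise ValueError(f"unsupported token: {token!r}")
        columns.extend([token] * (pending if pending is not None else 1))
        pending = None
    if pending is not None:
        raise ValueError("descriptor ends with a count but no token")
    if columns.count("L") != 1:
        raise ValueError("descriptor must contain exactly one label (L)")
    return columns


# "I 9 N L" describes 11 columns: 1 ignored, 9 numerical, 1 label.
print(len(expand_descriptor("I 9 N L")))
```

The length of the expanded list is the number of comma-separated fields each data row should have, which makes it a quick sanity check before running the example.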

Finally, we build and evaluate our random forest classifier as follows:

```shell
bin/mahout org.apache.mahout.classifier.df.BreimanExample -d /path/to/glass.data -ds /path/to/glass.info -i 10 -t 100
```

This builds 100 trees (-t argument) and repeats the test for 10 iterations (-i argument).
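Repeating the test over several iterations and averaging is a repeated hold-out evaluation. The Python sketch below illustrates that idea only; it is not Mahout’s implementation. It assumes a random 10% test split per iteration and a hypothetical `train` callable that returns a predictor. `majority_trainer` is a placeholder that always predicts the most common training label, standing in for the random forest.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Row = Tuple[Sequence[float], str]            # (features, label)
Predictor = Callable[[Sequence[float]], str]


def majority_trainer(train_rows: List[Row]) -> Predictor:
    """Placeholder model (NOT a random forest): predict the majority label."""
    majority = Counter(label for _, label in train_rows).most_common(1)[0][0]
    return lambda features: majority


def repeated_holdout(rows: List[Row],
                     train: Callable[[List[Row]], Predictor],
                     iterations: int = 10,
                     test_fraction: float = 0.1,
                     seed: int = 0) -> float:
    """Mean test-set error over `iterations` random train/test splits.

    The 10% default test fraction is an assumption for this sketch.
    """
    rng = random.Random(seed)                 # seeded for reproducible splits
    n_test = max(1, int(len(rows) * test_fraction))
    errors = []
    for _ in range(iterations):
        shuffled = list(rows)
        rng.shuffle(shuffled)
        test, training = shuffled[:n_test], shuffled[n_test:]
        predict = train(training)
        wrong = sum(1 for features, label in test if predict(features) != label)
        errors.append(wrong / len(test))
    return sum(errors) / len(errors)
```

Passing a different `train` function (for example, one that builds 100 trees) changes the model without touching the evaluation loop.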

The example outputs the following results:

We can repeat this for the Sonar dataset: download the file sonar.all-data and save it to your local machine. Generate the descriptor file sonar.info for this dataset with the following command:

```shell
bin/mahout org.apache.mahout.classifier.df.tools.Describe -p /path/to/sonar.all-data -f /path/to/sonar.info -d 60 N L
```

The argument “60 N L” means 60 numerical (N) attributes, followed by the label (L). As in the previous case, we run the evaluation as follows:
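A descriptor that does not match the data is an easy mistake to make. The sketch below checks that every row of a file has the expected number of fields, assuming comma-separated rows as in the UCI files used here. For “60 N L” the expected count is 61; for “I 9 N L” it is 11. `check_field_counts` is a hypothetical helper, not a Mahout tool.

```python
import csv


def check_field_counts(path: str, expected: int) -> list[int]:
    """Return the 1-based row numbers whose field count differs from `expected`.

    Blank lines (e.g. a trailing newline) are skipped rather than reported.
    """
    bad = []
    with open(path, newline="") as handle:
        for row_no, row in enumerate(csv.reader(handle), start=1):
            if not row:
                continue  # csv.reader yields [] for a blank line
            if len(row) != expected:
                bad.append(row_no)
    return bad
```

For example, `check_field_counts("/path/to/sonar.all-data", 61)` should return an empty list if the file and the descriptor agree.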

```shell
bin/mahout org.apache.mahout.classifier.df.BreimanExample -d /path/to/sonar.all-data -ds /path/to/sonar.info -i 10 -t 100
```
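Both datasets follow the same two steps, so they are easy to script. The sketch below builds the same argument lists as the commands on this page and runs them with `subprocess`. It assumes it is launched from the Mahout installation directory (so that `bin/mahout` resolves). The function names are hypothetical, and only the flags shown above are used.

```python
import subprocess
from typing import List

MAHOUT = "bin/mahout"  # assumption: run from the Mahout installation directory


def describe_command(data: str, info: str, descriptor: str) -> List[str]:
    """argv for the Describe tool; descriptor tokens are passed as separate args."""
    return [MAHOUT, "org.apache.mahout.classifier.df.tools.Describe",
            "-p", data, "-f", info, "-d", *descriptor.split()]


def breiman_command(data: str, info: str,
                    iterations: int = 10, trees: int = 100) -> List[str]:
    """argv for BreimanExample: -i iterations, -t trees."""
    return [MAHOUT, "org.apache.mahout.classifier.df.BreimanExample",
            "-d", data, "-ds", info, "-i", str(iterations), "-t", str(trees)]


def run_dataset(data: str, info: str, descriptor: str,
                iterations: int = 10, trees: int = 100) -> None:
    """Generate the descriptor file, then run the evaluation; stop on failure."""
    subprocess.run(describe_command(data, info, descriptor), check=True)
    subprocess.run(breiman_command(data, info, iterations, trees), check=True)
```

For example, `run_dataset("/path/to/sonar.all-data", "/path/to/sonar.info", "60 N L")` reproduces the two Sonar commands. Passing argument lists instead of a shell string avoids quoting problems with paths that contain spaces.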