source : [3](3.html)

LearnUnprunedTree(X, Y)

Input: X, a matrix of R rows and M columns, where X_ij = the value of the j'th attribute in the i'th input datapoint. Each column consists of either all real values or all categorical values.
Input: Y, a vector of R elements, where Y_i = the output class of the i'th datapoint. The Y_i values are categorical.
Output: an unpruned decision tree

If all records in X have identical values in all their attributes (this includes the case where R < 2), return a Leaf Node predicting the majority output, breaking ties randomly.
If all values in Y are the same, return a Leaf Node predicting this value as the output.
Else
    Select m variables at random out of the M variables
    For j = 1 .. m
        If the j'th attribute is categorical
            IG_j = IG(Y | X_j)  (see Information Gain)
        Else (the j'th attribute is real-valued)
            IG_j = IG*(Y | X_j)  (see Information Gain)
    Let j* = argmax_j IG_j  (this is the splitting attribute we'll use)
    If the j*'th attribute is categorical then
        For each value v of the j*'th attribute
            Let X^v = the subset of rows of X in which X_ij* = v. Let Y^v = the corresponding subset of Y.
            Let Child^v = LearnUnprunedTree(X^v, Y^v)
        Return a decision tree node splitting on the j*'th attribute. The number of children equals the number of values of the j*'th attribute, and the v'th child is Child^v.
    Else (the j*'th attribute is real-valued); let t be the best split threshold
        Let X^LO = the subset of rows of X in which X_ij* <= t. Let Y^LO = the corresponding subset of Y.
        Let Child^LO = LearnUnprunedTree(X^LO, Y^LO)
        Let X^HI = the subset of rows of X in which X_ij* > t. Let Y^HI = the corresponding subset of Y.
        Let Child^HI = LearnUnprunedTree(X^HI, Y^HI)
        Return a decision tree node splitting on the j*'th attribute. It has two children corresponding to whether the j*'th attribute is above or below the given threshold.
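
As a concrete illustration, here is a minimal Python sketch of LearnUnprunedTree. It is a simplification of the pseudocode, not a reference implementation: rows are plain lists, string-valued columns are assumed to be the categorical ones, trees are nested dictionaries, ties are broken arbitrarily rather than randomly, and the helper names (entropy, ig_categorical, ig_real) are illustrative.

```python
import math
import random
from collections import Counter

def entropy(labels):
    """H(Y) = -sum_c p_c log2(p_c), computed from the class counts in `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def ig_categorical(col, y):
    """IG(Y|X) = H(Y) - sum_v P(X=v) H(Y|X=v) for a categorical column."""
    n, remainder = len(y), 0.0
    for v in set(col):
        yv = [yi for xi, yi in zip(col, y) if xi == v]
        remainder += len(yv) / n * entropy(yv)
    return entropy(y) - remainder

def ig_real(col, y):
    """IG*(Y|X) = max_t IG(Y|X:t) for a real-valued column; returns (gain, threshold)."""
    base, best_gain, best_t = entropy(y), 0.0, None
    for t in sorted(set(col))[:-1]:               # candidate thresholds (all but the largest value)
        lo = [yi for xi, yi in zip(col, y) if xi <= t]
        hi = [yi for xi, yi in zip(col, y) if xi > t]
        gain = base - (len(lo) * entropy(lo) + len(hi) * entropy(hi)) / len(y)
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_gain, best_t

def learn_unpruned_tree(X, Y, m):
    """X: list of rows, Y: list of class labels, m: number of attributes tried per node."""
    # Base cases: pure labels, or all rows identical in every attribute.
    if len(set(Y)) == 1 or all(row == X[0] for row in X):
        return {"leaf": Counter(Y).most_common(1)[0][0]}
    # Score m randomly chosen attributes by information gain.
    scored = []
    for j in random.sample(range(len(X[0])), min(m, len(X[0]))):
        col = [row[j] for row in X]
        if isinstance(col[0], str):               # assumption: string columns are the categorical ones
            scored.append((ig_categorical(col, Y), None, j))
        else:
            gain, t = ig_real(col, Y)
            scored.append((gain, t, j))
    gain, t, j = max(scored, key=lambda s: s[0])  # j* = argmax_j IG_j
    if gain <= 0:                                 # practical guard: no informative split among the m candidates
        return {"leaf": Counter(Y).most_common(1)[0][0]}
    node = {"attribute": j, "threshold": t, "children": {}}
    if t is None:                                 # categorical: one child per attribute value
        for v in set(row[j] for row in X):
            idx = [i for i, row in enumerate(X) if row[j] == v]
            node["children"][v] = learn_unpruned_tree([X[i] for i in idx], [Y[i] for i in idx], m)
    else:                                         # real-valued: LO child (<= t) and HI child (> t)
        for name, keep in (("LO", lambda x: x <= t), ("HI", lambda x: x > t)):
            idx = [i for i, row in enumerate(X) if keep(row[j])]
            node["children"][name] = learn_unpruned_tree([X[i] for i in idx], [Y[i] for i in idx], m)
    return node
```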

*Note*: There are alternatives to Information Gain for splitting nodes

source : [3](3.html)

#### Nominal attributes

Suppose X can take one of m values V_1, V_2, ..., V_m, with P(X = V_1) = p_1, P(X = V_2) = p_2, ..., P(X = V_m) = p_m.

- H(X) = -sum_{j=1..m} p_j log2(p_j)  (the entropy of X)
- H(Y | X = v) = the entropy of Y among only those records in which X has value v
- H(Y | X) = sum_j p_j H(Y | X = v_j)
- IG(Y | X) = H(Y) - H(Y | X)
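
A small worked example in Python of these quantities, using invented toy data (the nominal attribute values and labels below are purely illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    """H = -sum_j p_j log2(p_j) over the class frequencies in `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Toy data: X is a nominal attribute, Y is the class to predict.
X = ["sunny", "sunny", "rain", "rain", "rain", "overcast"]
Y = ["no",    "no",    "yes",  "yes",  "no",   "yes"]

H_Y = entropy(Y)                                          # H(Y)
H_Y_given_X = sum(                                        # H(Y|X) = sum_j p_j H(Y|X=v_j)
    X.count(v) / len(X) * entropy([y for x, y in zip(X, Y) if x == v])
    for v in set(X)
)
print("IG(Y|X) =", H_Y - H_Y_given_X)                     # IG(Y|X) = H(Y) - H(Y|X)
```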

#### Real-valued attributes

Suppose X is real-valued.

- Define IG(Y | X : t) = H(Y) - H(Y | X : t)
- where H(Y | X : t) = H(Y | X < t) P(X < t) + H(Y | X >= t) P(X >= t)
- Define IG*(Y | X) = max_t IG(Y | X : t)
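
For a real-valued attribute, IG* is found by scanning candidate thresholds t. A minimal sketch on toy data (values invented for illustration), using the same entropy helper as above:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Toy data: X is a real-valued attribute, Y is the class to predict.
X = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
Y = ["a", "a", "a", "b", "b", "b"]

best_gain, best_t = 0.0, None
for t in sorted(set(X))[1:]:                  # try a threshold at each distinct value but the smallest
    lo = [y for x, y in zip(X, Y) if x < t]   # the H(Y|X<t) side
    hi = [y for x, y in zip(X, Y) if x >= t]  # the H(Y|X>=t) side
    gain = entropy(Y) - (len(lo) * entropy(lo) + len(hi) * entropy(hi)) / len(Y)
    if gain > best_gain:
        best_gain, best_t = gain, t           # IG*(Y|X) = max_t IG(Y|X:t)
print(best_t, best_gain)
```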

source : [1](1.html)

Each tree is grown as follows:

- If the number of cases in the training set is *N*, sample *N* cases at random, but with replacement, from the original data. This sample will be the training set for growing the tree.
- If there are *M* input variables, a number *m* << *M* is specified such that at each node, *m* variables are selected at random out of the *M* and the best split on these *m* is used to split the node. The value of *m* is held constant while the forest is grown.
- Each tree is grown to the largest extent possible. There is no pruning.
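
A minimal sketch of the bagging step (drawing N cases with replacement; the indices that are never drawn form the out-of-bag set used later for error estimation). The function name is illustrative, not from the source:

```python
import random

def bootstrap_sample(N, rng=random):
    """Draw N row indices at random with replacement; also return the out-of-bag indices."""
    chosen = [rng.randrange(N) for _ in range(N)]
    oob = sorted(set(range(N)) - set(chosen))   # cases left out of this tree's training set
    return chosen, oob

# Example: roughly one third of the cases end up out-of-bag.
chosen, oob = bootstrap_sample(10)
```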

source : [2](2.html)
Random Forests are easy to use: the only two parameters a user of the technique has to determine are the number of trees to be used and the number of variables (*m*) to be randomly selected from the available set of variables.
Breiman's recommendations are to pick a large number of trees, and to use the square root of the number of variables for *m*.
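
Putting the pieces together, a toy forest-growing driver might look like the following. It is a sketch only: it assumes the learn_unpruned_tree and bootstrap_sample functions from the earlier sketches, and grow_forest is an illustrative name. The choice m = sqrt(M) follows the recommendation just described.

```python
import math

def grow_forest(X, Y, ntrees):
    """Grow ntrees unpruned trees, each on its own bootstrap sample of (X, Y)."""
    M = len(X[0])
    m = max(1, int(math.sqrt(M)))             # Breiman's suggestion: m around sqrt(M)
    forest = []
    for _ in range(ntrees):
        chosen, oob = bootstrap_sample(len(X))
        Xb = [X[i] for i in chosen]
        Yb = [Y[i] for i in chosen]
        forest.append((learn_unpruned_tree(Xb, Yb, m), oob))
    return forest
```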

Classify(node, V)

Input: node, a node of the decision tree; if node.attribute = j then the split at this node is done on the j'th attribute.
Input: V, a vector of M columns where V_j = the value of the j'th attribute.
Output: the label of V

If node is a Leaf then
    Return the value predicted by node
Else
    Let j = node.attribute
    If the j'th attribute is categorical then
        Let v = V_j
        Let child^v = the child node corresponding to the attribute's value v
        Return Classify(child^v, V)
    Else (the j'th attribute is real-valued)
        Let t = node.threshold  (the split threshold)
        If V_j < t then
            Let child^LO = the child node corresponding to (< t)
            Return Classify(child^LO, V)
        Else
            Let child^HI = the child node corresponding to (>= t)
            Return Classify(child^HI, V)
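
The same logic in Python, assuming the dictionary tree representation used in the learn_unpruned_tree sketch earlier (leaves are {"leaf": label}; internal nodes store "attribute", "threshold", and "children"). Categorical values never seen during training are not handled; the names are illustrative:

```python
from collections import Counter

def classify(node, V):
    """Route example V down one tree until a leaf is reached."""
    if "leaf" in node:
        return node["leaf"]                       # value predicted by this leaf
    j, t = node["attribute"], node["threshold"]
    if t is None:                                 # categorical split: follow the child for V's value
        return classify(node["children"][V[j]], V)
    branch = "LO" if V[j] <= t else "HI"          # real-valued split (<= t goes LO, as in training)
    return classify(node["children"][branch], V)

def classify_forest(forest, V):
    """Majority vote over the trees grown by the grow_forest sketch above."""
    votes = Counter(classify(tree, V) for tree, _oob in forest)
    return votes.most_common(1)[0][0]
```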

source : [1](1.html)

In random forests, there is no need for cross-validation or a separate test set to get an unbiased estimate of the test set error. It is estimated internally, during the run, as follows:

- Each tree is constructed using a different bootstrap sample from the original data. About one-third of the cases are left out of the bootstrap sample and not used in the construction of the *k*th tree.
- Put each case left out in the construction of the *k*th tree down the *k*th tree to get a classification. In this way, a test set classification is obtained for each case in about one-third of the trees. At the end of the run, take *j* to be the class that got most of the votes every time case *n* was out-of-bag (*oob*). The proportion of times that *j* is not equal to the true class of *n*, averaged over all cases, is the *oob error estimate*. This has proven to be unbiased in many tests.
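
A sketch of that estimate in Python, assuming the forest built by the grow_forest sketch above (each entry is a (tree, oob_indices) pair) and the classify function from the previous sketch; oob_error is an illustrative name:

```python
from collections import Counter

def oob_error(forest, X, Y):
    """Out-of-bag error: each case is voted on only by trees that did not see it during training."""
    votes = {i: Counter() for i in range(len(X))}
    for tree, oob in forest:                      # forest entries are (tree, oob_indices) pairs
        for i in oob:                             # case i was left out of this tree's bootstrap sample
            votes[i][classify(tree, X[i])] += 1
    wrong = total = 0
    for i, v in votes.items():
        if not v:                                 # case never out-of-bag (can happen with few trees)
            continue
        total += 1
        if v.most_common(1)[0][0] != Y[i]:        # majority oob vote vs. true class of case i
            wrong += 1
    return wrong / total if total else float("nan")
```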

source : [1](1.html)

- variable importance
- gini importance
- proximities
- scaling
- prototypes
- missing values replacement for the training set
- missing values replacement for the test set
- detecting mislabeled cases
- detecting outliers
- detecting novelties
- unsupervised learning
- balancing prediction error

Please refer to [1](1.html) for a detailed description.

[1](1.html) Random Forests - Classification Description. http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm
[2](2.html) B. Larivière & D. Van Den Poel, 2004. "Predicting Customer Retention and Profitability by Using Random Forests and Regression Forests Techniques," Working Papers of Faculty of Economics and Business Administration, Ghent University, Belgium 04/282. Available online: http://ideas.repec.org/p/rug/rugwps/04-282.html
[3](3.html) Decision Trees - Andrew W. Moore. http://www.cs.cmu.edu/~awm/tutorials
[4](4.html) Information Gain - Andrew W. Moore. http://www.cs.cmu.edu/~awm/tutorials