Package org.apache.mahout.classifier.df.mapreduce.inmem

In-memory mapreduce implementation of Random Decision Forests

Package org.apache.mahout.classifier.df.mapreduce.inmem Description

Each mapper grows a number of trees with a full copy of the dataset loaded in memory. It uses the reference implementation's code to build each tree and to estimate the out-of-bag (OOB) error.

The dataset is distributed to the slave nodes using the DistributedCache. A custom InputFormat (InMemInputFormat) is configured with the desired number of trees and generates a number of InputSplits equal to the configured number of maps.
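The way a configured number of trees could be divided among the maps can be sketched in plain Java. This is a minimal illustration and not the InMemInputFormat code: the class TreeSplitSketch, the TreeSplit type, and the rule that the last split absorbs the remainder are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class TreeSplitSketch {

  /** Hypothetical stand-in for an input split: a contiguous range of tree ids. */
  static final class TreeSplit {
    final int firstId; // id of the first tree this map grows
    final int nbTrees; // number of trees this map grows

    TreeSplit(int firstId, int nbTrees) {
      this.firstId = firstId;
      this.nbTrees = nbTrees;
    }
  }

  /** Divides nbTrees among numMaps splits (assumed rule: the last split takes the remainder). */
  static List<TreeSplit> partition(int nbTrees, int numMaps) {
    if (nbTrees < 0 || numMaps <= 0) {
      throw new IllegalArgumentException("nbTrees must be >= 0 and numMaps must be > 0");
    }
    List<TreeSplit> splits = new ArrayList<>(numMaps);
    int base = nbTrees / numMaps;
    int id = 0;
    for (int i = 0; i < numMaps - 1; i++) {
      splits.add(new TreeSplit(id, base));
      id += base;
    }
    // The last split receives whatever is left, so every tree is assigned exactly once.
    splits.add(new TreeSplit(id, nbTrees - id));
    return splits;
  }

  public static void main(String[] args) {
    // 10 trees over 3 maps gives the ranges (0, 3), (3, 3) and (6, 4).
    for (TreeSplit s : partition(10, 3)) {
      System.out.println("firstId=" + s.firstId + " nbTrees=" + s.nbTrees);
    }
  }
}
```

Giving each split a contiguous id range keeps tree ids unique across maps without any coordination between them. Under this rule, a job with more maps than trees leaves every split but the last one empty.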

No reducers are needed. Each map outputs the trees it built and, for each tree, the labels that tree predicted for each out-of-bag instance. This step has to be done in the mapper because only the mapper knows which instances are out-of-bag for a given tree.
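The reason the out-of-bag predictions must be produced where the tree is grown can be illustrated with a small plain-Java sketch. It is not the Mahout mapper: the class OobSketch, the use of -1 to mark in-bag instances, and the IntUnaryOperator standing in for a trained tree are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.IntUnaryOperator;

public class OobSketch {

  /** Draws a bootstrap sample of size n; inBag[i] is true if instance i was drawn at least once. */
  static boolean[] bootstrap(int n, Random rng) {
    boolean[] inBag = new boolean[n];
    for (int k = 0; k < n; k++) {
      inBag[rng.nextInt(n)] = true;
    }
    return inBag;
  }

  /**
   * Per-tree output: the predicted label for each out-of-bag instance,
   * and -1 (assumed marker) for instances the tree was trained on.
   * The tree is a hypothetical function from instance index to label.
   */
  static int[] oobPredictions(boolean[] inBag, IntUnaryOperator tree) {
    int[] predictions = new int[inBag.length];
    for (int i = 0; i < inBag.length; i++) {
      predictions[i] = inBag[i] ? -1 : tree.applyAsInt(i);
    }
    return predictions;
  }

  public static void main(String[] args) {
    Random rng = new Random(42L);
    boolean[] inBag = bootstrap(10, rng);
    // Placeholder "tree" that labels instances by index parity.
    int[] predictions = oobPredictions(inBag, i -> i % 2);
    System.out.println(Arrays.toString(predictions));
  }
}
```

The in-bag mask exists only while the bootstrap sample is being drawn, so a later stage would have to reproduce the same random draws to recover it. Emitting the predictions next to the tree avoids that.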

The Forest builder (InMemBuilder) is responsible for configuring and launching the job. At the end of the job it parses the output files and builds the corresponding DecisionForest.
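Once the per-tree out-of-bag predictions have been read back from the output files, they can be combined into a single error estimate. The sketch below uses majority voting, the usual random-forest procedure. It is not taken from InMemBuilder: the class OobErrorSketch, the -1 marker for in-bag instances, and the lowest-label tie-break are assumptions made for the example.

```java
public class OobErrorSketch {

  /**
   * predictions[t][i] is the label tree t predicted for instance i,
   * or -1 (assumed marker) if instance i was in-bag for tree t.
   * Returns the fraction of instances with at least one out-of-bag vote
   * whose majority-vote label differs from the true label, or NaN if
   * no instance received a vote.
   */
  static double oobError(int[][] predictions, int[] labels, int nbLabels) {
    int counted = 0;
    int errors = 0;
    for (int i = 0; i < labels.length; i++) {
      int[] votes = new int[nbLabels];
      int total = 0;
      for (int[] treePredictions : predictions) {
        int p = treePredictions[i];
        if (p >= 0) {
          votes[p]++;
          total++;
        }
      }
      if (total == 0) {
        continue; // instance was in-bag for every tree, so it cannot be scored
      }
      int best = 0;
      for (int c = 1; c < nbLabels; c++) {
        if (votes[c] > votes[best]) {
          best = c; // strict comparison: ties go to the lowest label
        }
      }
      counted++;
      if (best != labels[i]) {
        errors++;
      }
    }
    return counted == 0 ? Double.NaN : (double) errors / counted;
  }

  public static void main(String[] args) {
    int[][] predictions = {
      {-1,  0,  1, -1},
      { 1,  0, -1, -1},
      { 1, -1,  0, -1},
    };
    int[] labels = {1, 0, 1, 0};
    // Instance 3 has no votes; instance 2 ties and resolves to label 0, which is wrong.
    System.out.println(oobError(predictions, labels, 2)); // 1 error out of 3 scored instances
  }
}
```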

Copyright © 2008–2015 The Apache Software Foundation. All rights reserved.