Incremental Random Forest

An implementation in C++ (with node.js and Python bindings) of a variant of Leo Breiman's Random Forests

The forest is maintained incrementally as samples are added or removed - rather than fully rebuilt from scratch every time - to save effort.

It is not a streaming implementation, all the samples are stored and will be reseen when required to recursively rebuild invalidated subtrees. The effort to update each individual tree can vary substantially but the overall effort to update the forest is averaged across the trees so tends not to vary so much.

IRF is licensed under the MIT license.

Features and limitations

Sparse feature vectors
Samples can be added, removed and changed
Learning can be performed lazily or initiated explicitly
The forest can be serialized to JSON for transmission/storage
The forest needs to fit fully in RAM, performance suffers dramatically when swapping
Currently only binary classification - 0 or 1. The classifier estimates the probability of belonging to class 1, as a float from 0 to 1
Currently only binary features: y >= 0.5 is considered 1, otherwise 0

Node.js setup

npm install irf

Node.js usage

var irf = require('irf');

var f = new irf.IRF(99); // create forest of 99 trees

f.add('1', {1:1, 3:1, 5:1}, 0); // add a sample identified as '1' with the given feature values, classified as 0
f.add('2', {1:0, 3:0, 4:1}, 0); // features are stored sparsely, when a value is not given it will be taken as 0
f.add('3', {2:0, 3:0, 5:0}, 0); // but 0s can also be given explicitly
// ...

var y = f.classify({1:1, 3:1, 5:1}); // classify feature vector
                                     // the forest will be lazily updated before classification
f.commit();                          // but you can force an update at any time
                                     // you get a probability estimate from 0 to 1 for belong to class 1
var c = Math.round(y);               // round to nearest to get class (0 or 1)

f.remove('8'); // remove a sample
f.add('8', {1:0, 2:0, 3:0, 4:0, 5:1}, 0); // and add it again with new values

console.log(f.asJSON()); // serialize to json (for classification, not suitable for incremental update)

f.each(function(suid, features, y) {
    // ...
});

var b = f.toBuffer();    // serialize (complete) to buffer
var f2 = new irf.IRF(b); // construct from buffer contents

Python setup

cd irf
python setup.py install

Python usage

import irf

f = irf.IRF(99) # create forest of 99 trees

f.add('1', {1:1, 3:1, 5:1}, 0) # add a sample identified as '1' with the given feature values, classified as 0
f.add('2', {1:0, 3:0, 4:1}, 0) # features are stored sparsely, when a value is not given it will be taken as 0
f.add('3', {2:0, 3:0, 5:0}, 0) # but 0s can also be given explicitly
# ...

y = f.classify({1:1, 2:1, 5:1}); print y, int(round(y)) # classify feature vector, round to nearest to get class

f.save('simple.rf') # save forest to file

f = irf.load('simple.rf') # load forest from file

f.remove('8') # remove a sample
f.add('8', {1:0, 2:0, 3:0, 4:0, 5:1}, 0) # and add it again with new values

y = f.classify({1:1, 2:1, 5:1}); print y, int(round(y)) # the forest will be lazily updated before classification
# f.commit() # but you can force it

for (sId, x, y) in f.samples(): # iterate through samples in the forest, in lexicographic ID order
    print sId, x, y # and print them

C++ usage

to be written

Dependencies

System:

STL

Included:

MurmurHash3 (from smhasher)

External:

google sparse hash

Tests

simple.py - trivial made up data to illustrate how to use the API
mushrooms.js, mushrooms.py - using the mushrooms dataset collected by LIBSVM from the UCI Machine Learning Repository

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
irf		irf
tests		tests
.gitignore		.gitignore
.travis.yml		.travis.yml
MIT-LICENSE.txt		MIT-LICENSE.txt
README.md		README.md
binding.gyp		binding.gyp
index.js		index.js
libsparsehash.pc		libsparsehash.pc
package.json		package.json
wscript		wscript

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Incremental Random Forest

Features and limitations

Node.js setup

Node.js usage

Python setup

Python usage

C++ usage

Dependencies

Tests

About

Releases

Packages

Contributors 2

Languages

License

pconstr/irf

Folders and files

Latest commit

History

Repository files navigation

Incremental Random Forest

Features and limitations

Node.js setup

Node.js usage

Python setup

Python usage

C++ usage

Dependencies

Tests

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages