Surfin

Statistical Inference for Random Forests


Description

This R package estimates the uncertainty of random forest predictions, building on a fast C++ implementation of random forests. This is an exciting time for research into the theoretical properties of random forests. The package aims to collect state-of-the-art variance estimates in one place, to expedite research in this area and make it easier for practitioners to compare estimates.

Two variance estimates are currently provided: a U-statistics based estimate (Mentch & Hooker, 2016) and the infinitesimal jackknife on bootstrap samples (Wager, Hastie & Efron, 2014). The latter is a wrapper around the authors' R code, randomForestCI.
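To make the second estimate concrete, here is a minimal base-R sketch of the infinitesimal jackknife for a bagged predictor, following the formula in Wager, Hastie & Efron (2014): the variance at a test point is estimated as the sum over training observations of the squared covariance between how often an observation appears in a bootstrap sample and that sample's prediction. This is an illustration under stated assumptions, not surfin's or randomForestCI's implementation: bagged cubic regressions stand in for trees, and the data, the test point `x0`, and all variable names are made up for the example.

```r
# Illustrative sketch only: infinitesimal jackknife (IJ) variance for a bagged
# predictor in base R. Bagged cubic regressions stand in for the trees of a
# random forest; this is NOT surfin's implementation.
set.seed(42)

n  <- 100    # training set size (assumed for the example)
B  <- 500    # number of bootstrap replicates
x0 <- 0.5    # hypothetical test point

# Simulated training data
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)
train <- data.frame(x = x, y = y)

N    <- matrix(0L, nrow = B, ncol = n)  # N[b, i]: times obs i appears in sample b
pred <- numeric(B)                      # pred[b]: prediction at x0 from sample b

for (b in seq_len(B)) {
  idx     <- sample.int(n, size = n, replace = TRUE)
  N[b, ]  <- tabulate(idx, nbins = n)
  fit     <- lm(y ~ x + I(x^2) + I(x^3), data = train[idx, ])
  pred[b] <- predict(fit, newdata = data.frame(x = x0))
}

bagged_pred <- mean(pred)
centered    <- pred - bagged_pred

# Cov_i = (1/B) * sum_b (N[b, i] - 1) * (pred[b] - mean(pred)).
# `centered` has length B, so it is recycled down the rows of the B x n matrix.
cov_i <- colMeans((N - 1) * centered)

# Raw IJ estimate, and the Monte Carlo bias-corrected version from the paper
var_ij           <- sum(cov_i^2)
var_ij_corrected <- var_ij - n / B^2 * sum(centered^2)

cat("Bagged prediction at x0:      ", bagged_pred, "\n")
cat("IJ variance (raw):            ", var_ij, "\n")
cat("IJ variance (bias-corrected): ", var_ij_corrected, "\n")
```

The correction term shrinks as B grows, so with a small number of replicates the raw estimate tends to overstate the variance.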

Check out a demo: How Uncertain Are Your Random Forest Predictions?

Updates

More variance estimates are coming soon: (1) bag of little bags (Sexton & Laake, 2009), and (2) the infinitesimal jackknife on subsamples (Wager & Athey, 2017; Athey, Tibshirani & Wager, 2016), as a wrapper around the authors' R package grf.

This package is under active development and will soon be uploaded to CRAN. Feedback, bug reports, and pointers to other variance estimates are very much welcome! Email me.

Installation

The package is not yet on CRAN. For now, download the .tar.gz from here (the same file is used on Mac, Linux, and Windows). Note that it is a source file, not a binary, and must be compiled using C++ development tools. If you don't already have the dependencies (Rcpp, RcppArmadillo, Matrix, knitr) and optional dependencies (randomForest, rpart) installed, install those from CRAN first. If you are on Windows, make sure you have Rtools installed, which provides the C++ development tools. If you already have an older version of surfin installed, remove it first by typing the following in R:

$ remove.packages("surfin")

Then install using:

$ install.packages(path_to_downloaded_file, repos=NULL, type="source")
$ library(surfin)

See the help file with:

$ ?surfin

Or check out the demo.
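The dependency step above can be sketched in R as follows. This is a minimal sketch, not part of the package: `missing_packages` is a hypothetical helper defined here for illustration, and the package names are the ones listed in this section.

```r
# Hypothetical helper (not part of surfin): return the packages in `pkgs`
# that cannot be loaded in the current R library.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!available])
}

required <- c("Rcpp", "RcppArmadillo", "Matrix", "knitr")
optional <- c("randomForest", "rpart")

to_install <- missing_packages(c(required, optional))

# Install only when run interactively, so that sourcing this script
# non-interactively does not trigger downloads. Drop the guard if you want
# the install to happen in batch mode as well.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Once this reports nothing left to install, the source archive can be installed with the commands shown above.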

References

Mentch L & Hooker G. Quantifying uncertainty in random forests via confidence intervals and hypothesis tests. Journal of Machine Learning Research. 2016.

Wager S, Hastie T, Efron B. Confidence intervals for random forests: the jackknife and the infinitesimal jackknife. Journal of Machine Learning Research. 2014.

Sexton J & Laake P. Standard errors for bagged and random forest estimators. Computational Statistics & Data Analysis. 2009.

Wager S & Athey S. Estimation and Inference of Heterogeneous Treatment Effects using Random Forests. arXiv. 2017.

Athey S, Tibshirani J, Wager S. Generalized Random Forests. arXiv. 2016.

Authors and Contributors

Sarah Tan @shftan, David Miller @d-miller, Giles Hooker @gileshooker, Lucas Mentch @LMentch

Maintainer

Sarah Tan. Email me or submit a request at http://github.com/shftan/surfin/issues.