Description
surfin is an R package that estimates the uncertainty (variance) of random forest predictions, built on a fast C++ implementation of random forests. This is an exciting time for research into the theoretical properties of random forests, and the package aims to collect state-of-the-art variance estimates in one place, both to expedite research in this area and to make it easier for practitioners to compare estimates.
Two variance estimates are currently provided: a U-statistics-based estimate (Mentch & Hooker, 2016) and the infinitesimal jackknife on bootstrap samples (Wager, Hastie & Efron, 2014). The latter is a wrapper around the authors' R code, randomForestCI.
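To make the second estimate more concrete, here is a minimal base-R sketch of the uncorrected infinitesimal jackknife variance from Wager, Hastie & Efron (2014), applied to a toy bagged predictor (the bootstrap mean). It is not surfin's or randomForestCI's implementation: the function name `ij_variance` and the toy setup are assumptions made for illustration, and the Monte Carlo bias correction discussed in that paper is deliberately left out.

```r
# Hypothetical helper (not part of surfin): uncorrected infinitesimal
# jackknife variance at a single test point.
#   inbag: n x B matrix; inbag[i, b] counts how often observation i
#          appears in bootstrap sample b
#   preds: length-B vector of per-replicate predictions at the test point
ij_variance <- function(inbag, preds) {
  stopifnot(ncol(inbag) == length(preds))
  B <- length(preds)
  centered_counts <- inbag - rowMeans(inbag)  # center counts per observation
  centered_preds <- preds - mean(preds)       # center predictions across replicates
  # Empirical covariance between each observation's counts and the predictions
  cov_i <- as.vector(centered_counts %*% centered_preds) / B
  sum(cov_i^2)
}

# Toy example: the "learner" is simply the mean of each bootstrap sample.
set.seed(1)
n <- 30
B <- 2000
y <- rnorm(n)
inbag <- matrix(0L, nrow = n, ncol = B)
preds <- numeric(B)
for (b in seq_len(B)) {
  idx <- sample.int(n, n, replace = TRUE)
  inbag[, b] <- tabulate(idx, nbins = n)
  preds[b] <- mean(y[idx])
}

v_ij <- ij_variance(inbag, preds)
# For the bagged mean, the large-B limit of this estimate is known in closed form
analytic <- sum((y - mean(y))^2) / n^2
```

With a real forest, `preds` would hold the per-tree predictions at one test point and `inbag` the per-tree resampling counts. For small B the uncorrected estimate is biased upward, which is why the paper proposes a correction.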
Check out a demo: How Uncertain Are Your Random Forest Predictions?
Updates
More variance estimates are coming soon: (1) bootstrap of little bags (Sexton & Laake, 2009) and (2) the infinitesimal jackknife on subsamples (Wager & Athey, 2018; Athey, Tibshirani & Wager, 2019), as a wrapper around the authors' R package grf.
This package is under active development. Feedback, bug reports, and pointers to other variance estimates are very welcome. Email me.
Installation
Download the .tar.gz file (the same file is used on Mac, Linux, and Windows). Note that it is a source package, not a binary, so it must be compiled with C++ development tools. If you are on Windows, make sure you have Rtools installed to provide those tools. If you don't already have the dependencies (Rcpp, RcppArmadillo, Matrix, knitr) and the optional dependencies (randomForest, rpart) installed, install them from CRAN first. If you already have an older version of surfin installed, remove it first by typing the following in R:
$ remove.packages("surfin")
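The dependency step described above can be sketched as follows. This is a minimal example that assumes a CRAN mirror is configured; `missing_packages` is a hypothetical helper written for illustration, not part of surfin.

```r
# Required and optional dependencies, as listed in this README
required_pkgs <- c("Rcpp", "RcppArmadillo", "Matrix", "knitr")
optional_pkgs <- c("randomForest", "rpart")

# Hypothetical helper: return the packages in `pkgs` that are not installed
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(c(required_pkgs, optional_pkgs))
message("Packages still to install: ",
        if (length(to_install) > 0) paste(to_install, collapse = ", ") else "none")

# Only install from an interactive session, so that sourcing this
# script non-interactively never triggers downloads.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Drop `optional_pkgs` from the call if you do not need randomForest or rpart.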
Then, within base R (not RStudio), install using:
$ install.packages(path_to_downloaded_file, repos=NULL, type="source")
$ library(surfin)
Although installing surfin is currently incompatible with RStudio, once it has been installed from base R it can be run from RStudio. Please email me if you encounter any installation issues.
Once installed, you can see surfin's help file by typing:
$ ?surfin
in R, or check out the demo.
References
Mentch L & Hooker G. Quantifying uncertainty in random forests via confidence intervals and hypothesis tests. Journal of Machine Learning Research. 2016.
Wager S, Hastie T, Efron B. Confidence intervals for random forests: the jackknife and the infinitesimal jackknife. Journal of Machine Learning Research. 2014.
Sexton J & Laake P. Standard errors for bagged and random forest estimators. Computational Statistics & Data Analysis. 2009.
Wager S & Athey S. Estimation and Inference of Heterogeneous Treatment Effects using Random Forests. Journal of the American Statistical Association. 2018.
Athey S, Tibshirani J, Wager S. Generalized Random Forests. The Annals of Statistics. 2019.
Authors and Contributors
Sarah Tan @shftan, David Miller @d-miller, Giles Hooker @gileshooker, Lucas Mentch @LMentch
Maintainer
Sarah Tan. Email me.