De-Biased Random Forest Variable Selection

24 Pages. Posted: 22 Dec 2011. Last revised: 18 Feb 2013.

Date Written: December 22, 2011

Abstract

This paper proposes a new way to de-bias random forest variable selection using a clean random forest algorithm. Strobl et al. (2007) have shown that random forest is biased towards variables with many levels or categories, variables on different scales, and correlated variables, which can inflate some variable importance measures. The proposed algorithm builds random forests without each variable in turn and keeps a variable when dropping it degrades overall random forest performance. The algorithm is simple and straightforward, and its complexity and speed are a function of the number of salient variables. It runs more efficiently than the permutation test algorithm and offers an alternative way to address the known biases. The paper concludes with some normative guidance on how to use random forest variable importance.
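The drop-one-variable idea described in the abstract can be illustrated with a minimal sketch. This is one single-pass reading of the abstract, not the paper's actual procedure. The function name `drop_one_selection`, the `fit_and_score` callback, the `tolerance` parameter, and the variable names are all hypothetical. The callback is assumed to train a random forest on the given columns and return a performance score where higher is better; here a toy stand-in replaces it so the example runs without any modeling library.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def drop_one_selection(
    variables: Sequence[str],
    fit_and_score: Callable[[List[str]], float],
    tolerance: float = 0.0,
) -> Tuple[List[str], Dict[str, float]]:
    """Keep each variable whose removal lowers the score by more than `tolerance`.

    `fit_and_score` is a hypothetical callback: it should fit a random forest
    on the listed columns and return a score (higher is better).
    """
    all_vars = list(variables)
    baseline = fit_and_score(all_vars)  # forest built with every variable
    degradation: Dict[str, float] = {}
    kept: List[str] = []
    for var in all_vars:
        reduced = [v for v in all_vars if v != var]  # forest without `var`
        degradation[var] = baseline - fit_and_score(reduced)
        if degradation[var] > tolerance:  # dropping `var` hurt performance
            kept.append(var)
    return kept, degradation


# Toy stand-in for "train a random forest and return its accuracy".
# The contributions below are made-up numbers, not results from the paper.
def toy_score(columns: List[str]) -> float:
    contribution = {"income": 0.25, "age": 0.125, "noise_id": 0.0}
    return 0.5 + sum(contribution.get(c, 0.0) for c in columns)


kept, degradation = drop_one_selection(["income", "age", "noise_id"], toy_score)
print(kept)  # ['income', 'age']
```

In practice the callback might wrap a real learner, for example returning the out-of-bag score of a fitted forest, and a small positive `tolerance` would guard against run-to-run noise in that score. Both choices are assumptions of this sketch rather than details taken from the paper.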

Keywords: random forest, variable importance, interaction effects, logistic regression, predictive modeling, biases

Suggested Citation

Sharma, Dhruv, De-Biased Random Forest Variable Selection (December 22, 2011). Available at SSRN: https://ssrn.com/abstract=1975801 or http://dx.doi.org/10.2139/ssrn.1975801

Dhruv Sharma (Contact Author)

Independent

2023 N. Cleveland St.
Arlington, VA 22201
United States

HOME PAGE: http://theinterdisciplinarian.com/
