I*: Optimizing Logistic Regression to Match Ensemble Performance Using Random Forest Variable Importance
75 Pages. Posted: 6 Jun 2011. Last revised: 6 Sep 2011.
Date Written: June 5, 2011
Abstract
An automated directed search procedure called interaction miner, or I*, is outlined. It allows logistic regression models to be built automatically, based on theory suggested by random forest variable importance measures of the predictive value of attributes. The fact that interaction effects can be added to regression models through an intelligently directed search shows that predictive models can be built with science rather than art. How important this is remains unclear, but ensemble methods appear to derive their power by extracting information about interaction effects in the data. Once these effects are accounted for, regression models can match or outperform random forests. Tuning regression to outperform ensemble methods is the goal of this algorithm, and it is shown to work on three credit data sets. The approach is an automated heuristic based on an observation made across various credit and behavioral data sets: out of the box, random forest outperforms logistic regression, but logistic regression tuned by adding interaction terms suggested by random forest variable importance can match or outperform random forest models.
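The general idea can be illustrated with a short Python sketch. It is not the I* procedure itself, whose details are given in the paper rather than in this abstract. It assumes scikit-learn, a greedy forward search over pairwise products of the top-ranked random forest features, and validation AUC as the acceptance criterion. The names `add_interactions`, `fit_auc`, `interaction_search` and the parameter `top_k` are hypothetical and chosen for illustration only.

```python
from itertools import combinations

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def add_interactions(X, pairs):
    """Append one product column x_i * x_j for each (i, j) in pairs."""
    if not pairs:
        return X
    products = [X[:, i] * X[:, j] for i, j in pairs]
    return np.column_stack([X] + products)


def fit_auc(X_train, y_train, X_val, y_val):
    """Fit a standardized logistic regression and return its validation AUC."""
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    return roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])


def interaction_search(X_train, y_train, X_val, y_val, top_k=5, seed=0):
    """Greedy search for interaction terms (illustrative assumption, not I*).

    Candidate pairs are drawn only from the top_k features ranked by
    random forest importance; a pair is kept if it raises validation AUC.
    """
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X_train, y_train)
    top = np.argsort(forest.feature_importances_)[::-1][:top_k]
    candidates = list(combinations(sorted(int(i) for i in top), 2))

    baseline_auc = fit_auc(X_train, y_train, X_val, y_val)
    selected, best_auc = [], baseline_auc
    for pair in candidates:
        trial = selected + [pair]
        auc = fit_auc(
            add_interactions(X_train, trial), y_train,
            add_interactions(X_val, trial), y_val,
        )
        if auc > best_auc:  # keep the term only if it helps
            selected, best_auc = trial, auc
    return selected, baseline_auc, best_auc
```

Because the validation set drives the selection in this sketch, the reported AUC is optimistic; a fair comparison against a random forest would need a separate held-out test set.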
Keywords: logistic regression, random forest, interaction mining, variable selection, automated model building, ensemble performance regression