Mining Big Data Using Parsimonious Factor and Shrinkage Methods

50 Pages. Posted: 17 Jul 2013


Hyun Hak Kim

Department of Economics, Kookmin University

Norman R. Swanson

Rutgers University - Department of Economics; Rutgers, The State University of New Jersey - Department of Economics

Date Written: July 15, 2013

Abstract

A number of recent studies in the economics literature have focused on the usefulness of factor models in the context of prediction using "big data". In this paper, our overarching question is whether such "big data" are useful for modelling low-frequency macroeconomic variables such as unemployment, inflation and GDP. In particular, we analyze the predictive benefits associated with the use of dimension-reducing independent component analysis (ICA) and sparse principal component analysis (SPCA), coupled with a variety of other factor estimation and data shrinkage methods, including bagging, boosting, and the elastic net, among others. We do so by carrying out a forecasting "horse-race", involving the estimation of 28 different baseline model types, each constructed using a variety of specification approaches, estimation approaches, and benchmark econometric models, all used in the prediction of 11 key macroeconomic variables relevant for monetary policy assessment. In many instances, we find that various benchmark specifications, including autoregressive (AR) models, AR models with exogenous variables, and (Bayesian) model averaging, do not dominate more complicated nonlinear methods, and that using a combination of factor and other shrinkage methods often yields superior predictions. For example, simple averaging methods are mean square forecast error (MSFE) "best" in only 9 of 33 key cases considered. This is rather surprising new evidence that model averaging methods do not necessarily yield MSFE-best predictions. However, in order to "beat" model averaging methods, including arithmetic mean and Bayesian averaging approaches, we have introduced into our "horse-race" numerous complex new models that involve combining complicated factor estimation methods with interesting new forms of shrinkage. For example, SPCA yields MSFE-best prediction models in many cases, particularly when coupled with shrinkage. This result provides strong new evidence of the usefulness of sophisticated factor-based forecasting, and therefore of the use of "big data" in macroeconometric forecasting.
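As a rough illustration of the recipe the abstract describes (and not the authors' code), the sketch below extracts a handful of factors from a large simulated predictor panel with SPCA and ICA (via scikit-learn's SparsePCA and FastICA), forecasts a target series with an elastic net fit on those factors, and compares out-of-sample MSFE against an AR(1) benchmark and a simple forecast average. All data, dimensions, and tuning values are placeholder assumptions; the paper itself uses recursive estimation schemes, 11 macroeconomic series, and 28 model types.

```python
# Minimal sketch of factor-plus-shrinkage forecasting with an MSFE
# "horse-race" against an AR(1) benchmark and a mean combination.
# All data and tuning parameters here are hypothetical placeholders.
import numpy as np
from sklearn.decomposition import FastICA, SparsePCA
from sklearn.linear_model import ElasticNet, LinearRegression

rng = np.random.default_rng(0)
T, N, k, h = 240, 100, 5, 1        # obs, panel width, factors, forecast horizon
split = 180                        # single estimation/evaluation split for
                                   # brevity; the paper uses recursive schemes

X = rng.standard_normal((T, N))                     # stand-in macro panel
y = X[:, :3].sum(axis=1) + rng.standard_normal(T)   # stand-in target series

def oos_forecast(model, Z):
    """Regress y_{t+h} on regressors Z_t over the estimation sample,
    then return out-of-sample predictions for t = split, ..., T-1."""
    model.fit(Z[:split - h], y[h:split])
    return model.predict(Z[split - h:T - h])

actual = y[split:]
msfe = lambda pred: np.mean((actual - pred) ** 2)

preds = {}
for name, reducer in [("SPCA", SparsePCA(n_components=k, random_state=0)),
                      ("ICA", FastICA(n_components=k, random_state=0))]:
    F = reducer.fit_transform(X)                    # estimated factors
    preds[name + "+ENet"] = oos_forecast(ElasticNet(alpha=0.1), F)

preds["AR(1)"] = oos_forecast(LinearRegression(), y[:, None])   # benchmark
preds["Mean combination"] = np.mean(list(preds.values()), axis=0)

for name, pred in preds.items():
    print(f"{name:18s} MSFE: {msfe(pred):.3f}")
```

On this synthetic panel the rankings are of course meaningless; the point is the mechanics, in which dimension reduction (SPCA/ICA) and shrinkage (the elastic net) enter as separate, combinable steps, exactly the modular structure the horse-race exploits.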

Keywords: prediction, independent component analysis, sparse principal component analysis, bagging, boosting, Bayesian model averaging, ridge regression, least angle regression, elastic net, non-negative garrote

JEL Classification: C32, C53, G17

Suggested Citation

Kim, Hyun Hak and Swanson, Norman Rasmus, Mining Big Data Using Parsimonious Factor and Shrinkage Methods (July 15, 2013). Available at SSRN: https://ssrn.com/abstract=2294110 or http://dx.doi.org/10.2139/ssrn.2294110

Hyun Hak Kim

Department of Economics, Kookmin University ( email )

Seoul
Korea, Republic of (South Korea)

HOME PAGE: http://khdouble.googlepages.com

Norman Rasmus Swanson (Contact Author)

Rutgers, The State University of New Jersey - Department of Economics ( email )

75 Hamilton Street
New Brunswick, NJ 08901
United States
848-932-7432 (Phone)

HOME PAGE: http://econweb.rutgers.edu/nswanson/

Rutgers University - Department of Economics ( email )

NJ
United States

HOME PAGE: http://econweb.rutgers.edu/nswanson/


Paper statistics

Downloads: 230
Abstract Views: 1,900
Rank: 241,240