Stata, Fast and Slow: Why Running Many Small Regressions in a Large Dataset Takes So Long; and What to Do About It

19 pages. Posted: 12 Apr 2014

Paul Geertsema

University of Auckland Business School

Date Written: April 11, 2014

Abstract

Stata is fast, often very fast. However, when performing regressions on small sub-samples within a large host dataset (more than 1 million observations), performance can deteriorate by orders of magnitude. For example, an OLS regression on a sub-sample of 100 consecutive observations takes 3.6 seconds in a host dataset with 1 billion observations, but only 3.8 milliseconds in a host dataset with 1,000 observations, a slowdown of roughly 950 times for the same regression. The difference is due to the mechanism regress uses to mark estimation samples. This deterioration has practical implications for finance research, where many variables of interest are themselves estimated via millions of individual OLS regressions within large panel datasets. I suggest an approach that circumvents the issue: a simple Mata implementation of regress, which I call fastreg. As a test, I estimate daily Fama-French three-factor betas for individual stocks in the CRSP database from 1923 to 2013 using a 250-day rolling window. In this setting fastreg is approximately 367 times faster than regress. The code for the fastreg ado-file is included in the Appendix and is open-source licensed under the GNU GPL.
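The scaling problem can be illustrated with a small sketch. It is written in Python, not Stata or Mata, and it is not the author's fastreg. The abstract attributes the slowdown to how regress marks its estimation sample. The sketch assumes a simplified model in which marking costs one operation per observation in the host dataset, so each regression costs O(N) however small the sub-sample is. Reading a known block of n consecutive rows directly costs O(n). The function names `ols_masked`, `ols_sliced` and `rolling_betas` and the synthetic data are hypothetical. OLS is computed from the normal equations in pure Python so that the example has no dependencies.

```python
"""Illustrative sketch: per-regression cost under an assumed marking model.

Assumption (simplified model, not Stata internals): "masked" estimation sets a
sample flag for EVERY observation before estimating, which is O(N) per
regression. "Sliced" estimation reads only the n consecutive rows it needs,
which is O(n).
"""
from typing import List, Sequence, Tuple

Row = Tuple[float, List[float]]  # one observation: (y, [x1, ..., xk])


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(rows: Sequence[Row]) -> List[float]:
    """OLS with a constant via the normal equations X'X b = X'y."""
    k = len(rows[0][1]) + 1  # regressors plus the constant
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for y, xs in rows:
        x = [1.0] + list(xs)
        for i in range(k):
            xty[i] += x[i] * y
            for j in range(k):
                xtx[i][j] += x[i] * x[j]
    return solve(xtx, xty)  # [constant, b1, ..., bk]


def ols_masked(data: Sequence[Row], start: int, n: int) -> Tuple[List[float], int]:
    """Flag every row of the host dataset, then estimate: O(N) rows touched."""
    touse = [start <= i < start + n for i in range(len(data))]  # scans all N rows
    sample = [row for row, flag in zip(data, touse) if flag]
    return ols(sample), len(touse)


def ols_sliced(data: Sequence[Row], start: int, n: int) -> Tuple[List[float], int]:
    """Read only the n consecutive rows of the window: O(n) rows touched."""
    sample = data[start:start + n]
    return ols(sample), len(sample)


def rolling_betas(data: Sequence[Row], window: int) -> List[List[float]]:
    """Coefficients for every rolling window of `window` consecutive rows."""
    return [ols_sliced(data, s, window)[0] for s in range(len(data) - window + 1)]


if __name__ == "__main__":
    # Synthetic host dataset with an exact relation y = 1 + 2*x1 - 0.5*x2.
    N = 100_000
    data = [(1 + 2 * (i % 7) - 0.5 * ((i * i) % 11),
             [float(i % 7), float((i * i) % 11)]) for i in range(N)]
    b_m, rows_m = ols_masked(data, start=5_000, n=100)
    b_s, rows_s = ols_sliced(data, start=5_000, n=100)
    print(rows_m, rows_s)  # 100000 100
    print([round(b, 6) for b in b_s])  # [1.0, 2.0, -0.5]
    print(len(rolling_betas(data[:500], window=250)))  # 251
```

In this toy model both functions return identical coefficients. Only the number of rows touched differs: 100,000 versus 100. This mirrors the pattern behind the abstract's timings (3.6 s in a 1-billion-observation dataset versus 3.8 ms in a 1,000-observation dataset) without reproducing them. The `window=250` call corresponds to the 250-day rolling window used for the beta estimates.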

Keywords: Stata, statistical computing, large datasets, rolling window regressions, factor beta estimation

JEL Classification: C55, C58, C80, C87

Suggested Citation

Geertsema, Paul G., Stata, Fast and Slow: Why Running Many Small Regressions in a Large Dataset Takes So Long; and What to Do About It (April 11, 2014). Available at SSRN: https://ssrn.com/abstract=2423171 or http://dx.doi.org/10.2139/ssrn.2423171

Paul G. Geertsema (Contact Author)

University of Auckland Business School

12 Grafton Rd
Private Bag 92019
Auckland, 1010
New Zealand
