A Big Data Approach to Analyzing Market Volatility

28 Pages. Posted: 7 Jun 2013

Kesheng Wu

University of California, Berkeley - Lawrence Berkeley National Laboratory (Berkeley Lab)

Wes Bethel

University of California, Berkeley - Lawrence Berkeley National Laboratory (Berkeley Lab)

Ming Gu

University of California, Berkeley - Lawrence Berkeley National Laboratory (Berkeley Lab)

David Leinweber

University of California, Berkeley - Lawrence Berkeley National Laboratory (Berkeley Lab)

Oliver Ruebel

University of California, Berkeley - Lawrence Berkeley National Laboratory (Berkeley Lab)

Date Written: June 5, 2013

Abstract

Understanding the microstructure of financial markets requires processing a vast amount of data on individual trades, and sometimes multiple levels of quotes as well. Analyzing such a large volume of data demands computing power that is not easily available to financial academics and regulators. Fortunately, publicly funded High-Performance Computing (HPC) power is widely available at the National Laboratories in the US. In this paper we demonstrate that these HPC resources, together with techniques from the data-intensive sciences, can greatly accelerate the computation of an early-warning indicator called the Volume-Synchronized Probability of Informed Trading (VPIN). The test data used in this study covers five and a half years of trading in about 100 of the most liquid futures contracts; it includes about 3 billion trades and occupies 140 GB as text files. By (1) using a more efficient file format for storing the trading records, (2) employing more effective data structures and algorithms, and (3) parallelizing the computations, we are able to explore 16,000 different ways of computing VPIN in less than 20 hours on a 32-core IBM iDataPlex machine. Our tests demonstrate that a modest computer is sufficient to monitor a vast number of trading activities in real time -- an ability that could be valuable to regulators.
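
To make the computation concrete, the following Python sketch shows one common way of computing VPIN from a trade stream: trades are grouped into equal-volume buckets, each trade's volume is split into buyer- and seller-initiated parts via bulk volume classification, and VPIN is a rolling average of the order-flow imbalance over a window of buckets. This is a minimal sketch of the general technique, not the paper's implementation: the function and parameter names (vpin_series, bucket_volume, window) are illustrative, and details such as splitting a trade across bucket boundaries are omitted.

    import numpy as np
    from scipy.stats import norm

    def vpin_series(prices, volumes, bucket_volume, window=50):
        """Minimal VPIN sketch: equal-volume buckets + bulk volume classification."""
        # Buy fraction of each trade's volume: Phi(price change / sigma).
        dP = np.diff(prices, prepend=prices[0])
        sigma = dP.std()
        buy_frac = norm.cdf(dP / sigma) if sigma > 0 else np.full_like(dP, 0.5)

        # Fill buckets of (roughly) bucket_volume units; record the
        # order-flow imbalance |buy - sell| / total for each completed bucket.
        imbalances, buy, total = [], 0.0, 0.0
        for f, v in zip(buy_frac, volumes):
            buy += f * v
            total += v
            if total >= bucket_volume:          # close the current bucket
                imbalances.append(abs(2.0 * buy - total) / total)
                buy, total = 0.0, 0.0

        # VPIN = rolling mean of the imbalance over the last `window` buckets.
        kernel = np.ones(window) / window
        return np.convolve(np.asarray(imbalances), kernel, mode="valid")

The choices of bucket size, window length, and classification scheme are exactly the kinds of free parameters whose 16,000 combinations the paper sweeps in parallel.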

Our test results also confirm that VPIN is a strong predictor of liquidity-induced volatility. With appropriate parameter choices, the false positive rate averages about 7% over all the futures contracts in the test data set. More specifically, when the VPIN value rises above a threshold (CDF > 0.99), volatility in the subsequent time window is higher than average in 93% of the cases.
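
The evaluation criterion behind these numbers can be sketched as follows. Assuming future_volatility[i] holds the realized volatility of the window immediately following VPIN observation i (the names and alignment are illustrative, and the paper's actual event definition is more detailed, e.g. in how consecutive threshold crossings are grouped), an event fires when the empirical CDF of VPIN exceeds the threshold, and it counts as a false positive when the following window's volatility is not above average.

    import numpy as np

    def false_positive_rate(vpin, future_volatility, cdf_threshold=0.99):
        """Hedged sketch of the threshold test: CDF(VPIN) > 0.99 flags an event."""
        # Empirical CDF of each VPIN value within its own sample.
        ranks = np.argsort(np.argsort(vpin))
        cdf = (ranks + 1) / len(vpin)

        events = cdf > cdf_threshold            # flagged observations
        if not events.any():
            return 0.0

        # False positive: a flagged event NOT followed by above-average volatility.
        mean_vol = future_volatility.mean()
        return np.mean(future_volatility[events] <= mean_vol)

A 7% value from this function corresponds to the abstract's claim that 93% of threshold crossings are followed by above-average volatility.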

Keywords: high-performance computing, market microstructure, probability of informed trading, VPIN, liquidity, flow toxicity, volume imbalance, flash crash

JEL Classification: C02, D52, D53, G14, G23

Suggested Citation

Wu, Kesheng and Bethel, Wes and Gu, Ming and Leinweber, David and Ruebel, Oliver, A Big Data Approach to Analyzing Market Volatility (June 5, 2013). Algorithmic Finance (2013), 2:3-4, 241-267, Available at SSRN: https://ssrn.com/abstract=2274991 or http://dx.doi.org/10.2139/ssrn.2274991

Kesheng Wu (Contact Author)

University of California, Berkeley - Lawrence Berkeley National Laboratory (Berkeley Lab)

1 Cyclotron Road
Berkeley, CA 94720
United States

Wes Bethel

University of California, Berkeley - Lawrence Berkeley National Laboratory (Berkeley Lab)

1 Cyclotron Road
Berkeley, CA 94720
United States

Ming Gu

University of California, Berkeley - Lawrence Berkeley National Laboratory (Berkeley Lab)

1 Cyclotron Road
Berkeley, CA 94720
United States

David Leinweber

University of California, Berkeley - Lawrence Berkeley National Laboratory (Berkeley Lab)

1 Cyclotron Road
Berkeley, CA 94720
United States

Oliver Ruebel

University of California, Berkeley - Lawrence Berkeley National Laboratory (Berkeley Lab)

1 Cyclotron Road
Berkeley, CA 94720
United States
