Deep Execution - Value and Policy Based Reinforcement Learning for Trading and Beating Market Benchmarks

In this article we introduce the term "Deep Execution", which uses deep reinforcement learning (DRL) for optimal execution. We demonstrate two approaches to solving the optimal execution problem: (1) the deep double Q-network (DDQN), a value-based approach, and (2) proximal policy optimization (PPO), a policy-based approach, for trading and beating market benchmarks such as the time-weighted average price (TWAP). We show, first, that DRL can reach the theoretically derived optimum by acting on the environment directly. Second, DRL agents can learn to capitalize on price trends (alpha signals) without directly observing the price. Finally, DRL can take advantage of the available information to create dynamic strategies as an informed trader and thus outperform static benchmark strategies such as TWAP.
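To make the TWAP benchmark concrete, the following is a minimal sketch of a static TWAP schedule and the average price it achieves on a given price path. It is an illustration only, not the authors' implementation: the function names `twap_schedule` and `average_execution_price`, the price path, and the front-loaded schedule are all hypothetical, and market impact and fees are ignored.

```python
def twap_schedule(total_shares, n_steps):
    """Split a parent order into n_steps near-equal child orders (static TWAP)."""
    if n_steps <= 0:
        raise ValueError("n_steps must be positive")
    base, remainder = divmod(total_shares, n_steps)
    # Spread any remainder over the first child orders so the total is exact.
    return [base + 1 if i < remainder else base for i in range(n_steps)]


def average_execution_price(schedule, prices):
    """Volume-weighted average price achieved by trading `schedule` at `prices`."""
    if len(schedule) != len(prices):
        raise ValueError("schedule and prices must have the same length")
    filled = sum(schedule)
    return sum(q * p for q, p in zip(schedule, prices)) / filled


# Hypothetical downward-trending price path over four trading intervals.
prices = [100.0, 99.0, 98.0, 97.0]

# Static benchmark: sell 1000 shares in equal slices.
twap = twap_schedule(1000, 4)
twap_price = average_execution_price(twap, prices)

# A dynamic schedule that sells earlier (assumed here, not learned).
front_loaded = [400, 300, 200, 100]
dynamic_price = average_execution_price(front_loaded, prices)

print(twap, twap_price)            # [250, 250, 250, 250] 98.5
print(front_loaded, dynamic_price) # [400, 300, 200, 100] 99.0
```

For a sell order on this falling path, the front-loaded schedule realizes a higher average price than TWAP. This is the kind of advantage a dynamic strategy can obtain over a static benchmark when it correctly anticipates a trend.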

Keywords: Algorithmic Trading, Deep Learning, Execution Algorithms, Reinforcement Learning, Optimal Execution

JEL Classification: C00

Suggested Citation:

Dabérius, Kevin and Granat, Elvin and Karlsson, Patrik, Deep Execution - Value and Policy Based Reinforcement Learning for Trading and Beating Market Benchmarks (April 21, 2019). Available at SSRN: https://ssrn.com/abstract=3374766 or http://dx.doi.org/10.2139/ssrn.3374766