Variance Reduction in Actor Critic Methods (ACM)

9 Pages
Posted: 24 Jul 2019

Eric Benhamou

Université Paris Dauphine; AI For Alpha; EB AI Advisory; Université Paris-Dauphine, PSL Research University

Date Written: June 23, 2019

Abstract

After presenting Actor Critic Methods (ACM), we show that ACMs are control variate estimators. Using the projection theorem, we prove that the Q Actor Critic (QAC) and Advantage Actor Critic (AAC) methods are optimal, in the sense of the L2 norm, among control variate estimators spanned by functions conditioned on the current state and action. This straightforward application of the Pythagorean theorem provides a theoretical justification for the strong performance of QAC and AAC (the latter most often referred to as A2C) in deep policy gradient methods. It also enables us to derive a new formulation of Advantage Actor Critic methods that has lower variance and improves on the traditional A2C method.
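For readers less familiar with the control variate view, the following standard derivation sketches the idea (it is consistent with the abstract but not excerpted from the paper; \(\hat{Q}_t\) denotes any unbiased sample estimate of the action value, e.g. the Monte Carlo return, and \(b\) is the control variate, usually called a baseline):

\[
\nabla_\theta J(\theta)
  = \mathbb{E}\big[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\, \hat{Q}_t\big]
  = \mathbb{E}\big[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\, \big(\hat{Q}_t - b(s_t)\big)\big],
\]

because \(\mathbb{E}_{a \sim \pi_\theta(\cdot \mid s)}\big[\nabla_\theta \log \pi_\theta(a \mid s)\big] = 0\), so subtracting \(b(s_t)\) leaves the estimator unbiased while potentially reducing its variance. By the projection (Pythagorean) theorem, the conditional expectation \(\mathbb{E}[\hat{Q}_t \mid s_t, a_t] = Q^{\pi}(s_t, a_t)\) is the L2-closest function of the current state and action to the sampled return, and \(V^{\pi}(s_t) = \mathbb{E}[\hat{Q}_t \mid s_t]\) plays the same role among functions of the state alone; choosing \(b(s_t) = V^{\pi}(s_t)\) in this L2 sense leaves the advantage \(A^{\pi}(s_t, a_t) = Q^{\pi}(s_t, a_t) - V^{\pi}(s_t)\) as the learning signal, which is the A2C estimator.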

Suggested Citation

Benhamou, Eric, Variance Reduction in Actor Critic Methods (ACM) (June 23, 2019). Available at SSRN: https://ssrn.com/abstract=3424668 or http://dx.doi.org/10.2139/ssrn.3424668

Eric Benhamou (Contact Author)

Université Paris Dauphine

Place du Maréchal de Lattre de Tassigny
Paris, Cedex 16 75775
France

AI For Alpha

35 boulevard d'Inkermann
Neuilly sur Seine, 92200
France

EB AI Advisory

35 Boulevard d'Inkermann
Neuilly sur Seine, 92200
France

Université Paris-Dauphine, PSL Research University

Place du Maréchal de Lattre de Tassigny
Paris, 75016
France

Paper statistics

Downloads: 4,705
Abstract Views: 123,874
Rank: 3,739