Multi-Agent Deep Reinforcement Learning for Multi-Echelon Inventory Management

This work investigates the application of Multi-Agent Deep Reinforcement Learning (MADRL) to decentralized multi-echelon inventory management problems. Specifically, we apply Heterogeneous Agent Proximal Policy Optimization (HAPPO) to decentralized multi-echelon inventory management in both a serial supply chain and a supply chain network. We formulate these problems as Partially Observable Markov Games (POMGs) and investigate the effective design of reward functions for multiple actors. We find that, when the goal is to optimize the overall performance of the system, the best objective for each actor lies between being fully self-interested and being fully system-focused. Our numerical results show that policies constructed by HAPPO achieve lower overall costs than both policies constructed by single-agent deep reinforcement learning and heuristic policies. In addition, the upfront-only information-sharing mechanism used in MADRL leads to a weaker bullwhip effect than policies constructed by single-agent deep reinforcement learning, where information is not shared among actors. Our results provide a new perspective on the benefit of information sharing in supply chains: when applying MADRL, it helps alleviate the bullwhip effect and improve overall performance. Our results also verify MADRL’s potential for solving various multi-echelon inventory management problems with complex supply chain structures and in non-stationary market environments.
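The two quantities the abstract turns on, an actor objective that sits between self-interest and system focus, and the size of the bullwhip effect, can be illustrated with a minimal sketch. The abstract does not give the paper's reward formula or its bullwhip metric, so both functions below are assumptions: `blended_reward` uses a hypothetical weight `alpha` to mix an actor's own cost with the mean cost across actors, and `bullwhip_ratio` uses the common variance-ratio measure (order variance divided by demand variance).

```python
from statistics import pvariance


def blended_reward(own_cost, all_costs, alpha):
    """Reward for one actor as a negative weighted cost.

    Assumption (not from the paper): alpha = 1.0 is fully self-interested,
    alpha = 0.0 is fully system-focused, and values in between mix the two.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    system_cost = sum(all_costs) / len(all_costs)  # mean cost over all actors
    return -(alpha * own_cost + (1.0 - alpha) * system_cost)


def bullwhip_ratio(orders, demand):
    """Variance of orders placed divided by variance of demand observed.

    A ratio above 1 means order variability is amplified relative to demand.
    """
    return pvariance(orders) / pvariance(demand)


if __name__ == "__main__":
    costs = [10.0, 20.0, 30.0]  # illustrative per-actor costs for one period
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, blended_reward(costs[0], costs, alpha))
    print(bullwhip_ratio([0, 10, 0, 10], [4, 6, 4, 6]))
```

In this sketch, sweeping `alpha` between 0 and 1 is one simple way to explore objectives between the two extremes; the paper's own reward design may differ.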

Keywords: Multi-Echelon Inventory Management, Multi-Agent Deep Reinforcement Learning, Bullwhip Effect

Suggested Citation:

Liu, Xiaotian and Hu, Ming and Peng, Yijie and Yang, Yaodong, Multi-Agent Deep Reinforcement Learning for Multi-Echelon Inventory Management (October 30, 2022). Rotman School of Management Working Paper No. 4262186, Available at SSRN: https://ssrn.com/abstract=4262186 or http://dx.doi.org/10.2139/ssrn.4262186