Learning the Minimal Representation of a Dynamic System from Transition Data

44 Pages. Posted: 18 Feb 2021. Last revised: 18 Aug 2023.

Mohammed Amine Bennouna

Massachusetts Institute of Technology (MIT) - Operations Research Center

Dessislava Pachamanova

Babson College

Georgia Perakis

Massachusetts Institute of Technology (MIT) - Sloan School of Management

Omar Skali Lami

Massachusetts Institute of Technology (MIT) - Operations Research Center

Date Written: January 10, 2021

Abstract

This paper proposes a framework for learning the most concise MDP model of a continuous state space dynamic system from observed transition data. This setting is encountered in numerous important applications, such as patient treatment, online advertising, recommender systems, and estimation of treatment effects in econometrics. Most existing methods in offline reinforcement learning construct functional approximations of the value or the transition and reward functions, requiring complex and often uninterpretable function approximators. Our approach instead relies on partitioning the system's observation space into regions constituting states of a finite MDP representing the system. We discuss the theoretically minimal MDP representation that preserves the values, and therefore the optimal policy, of the dynamic system, which is, in a sense, the optimal discretization. We formally define the problem of learning such a concise representation from transition data without exploration.
Learning such a representation allows for enhanced tractability and, importantly, provides interpretability. To solve this problem, we introduce an in-sample property on partitions of the observation space that we name coherence, and show that if the class of possible partitions is of finite VC dimension, any partition coherent with the transition data converges to the minimal representation of the system, with provable finite-sample PAC convergence guarantees. This insight motivates our Minimal Representation Learning (MRL) algorithm, which constructs from transition data an MDP representation that approximates the minimal representation of the system. We illustrate the effectiveness of the proposed framework through numerical experiments in both deterministic and stochastic environments as well as with real data.
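To make the abstract's idea concrete, the following is a toy sketch, not the paper's actual MRL algorithm: it assumes a 1-D observation space, a candidate partition given by bin boundaries, and uses the within-cell spread of observed rewards as a crude empirical stand-in for the coherence property (observations aggregated into one state should behave identically in the data). All names and the specific check are illustrative assumptions.

```python
import numpy as np
from collections import defaultdict

def cell_of(s, bins):
    # Index of the partition cell containing 1-D observation s.
    return int(np.digitize(s, bins))

def reward_spread(transitions, bins):
    # Max-minus-min spread of observed rewards within each (cell, action)
    # pair. Zero spread everywhere is a rough in-sample proxy for
    # coherence of this candidate partition with the transition data.
    buckets = defaultdict(list)
    for s, a, r in transitions:
        buckets[(cell_of(s, bins), a)].append(r)
    return {k: max(v) - min(v) for k, v in buckets.items()}

# Toy system: the reward depends only on the sign of the observation,
# so {(-inf, 0), [0, inf)} is a minimal two-state representation.
rng = np.random.default_rng(0)
data = [(s, 0, 1.0 if s >= 0 else -1.0)
        for s in rng.uniform(-1, 1, 200)]

aligned = reward_spread(data, bins=[0.0])    # boundary at the true split
misaligned = reward_spread(data, bins=[0.5])  # boundary in the wrong place
print(max(aligned.values()), max(misaligned.values()))  # → 0.0 2.0
```

The aligned partition has zero reward spread in every cell, while the misaligned one mixes observations with different rewards in a single cell, signaling an incoherent partition. The paper's actual criterion also accounts for transition dynamics and stochastic environments.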

Keywords: reinforcement learning, statistical learning, block Markov decision process, discretization, interpretability, data-driven decision making, state representation learning, MDP state aggregation

Suggested Citation

Bennouna, Mohammed Amine and Pachamanova, Dessislava and Perakis, Georgia and Skali Lami, Omar, Learning the Minimal Representation of a Dynamic System from Transition Data (January 10, 2021). Available at SSRN: https://ssrn.com/abstract=3785547 or http://dx.doi.org/10.2139/ssrn.3785547

Mohammed Amine Bennouna (Contact Author)

Massachusetts Institute of Technology (MIT) - Operations Research Center ( email )

77 Massachusetts Avenue
Bldg. E 40-149
Cambridge, MA 02139
United States

HOME PAGE: https://www.mit.edu/~amineben/

Dessislava Pachamanova

Babson College ( email )

Babson Park, MA 02157
United States
781-235-1200 (Phone)
781-239-6414 (Fax)

Georgia Perakis

Massachusetts Institute of Technology (MIT) - Sloan School of Management ( email )

100 Main Street
E62-565
Cambridge, MA 02142
United States

Omar Skali Lami

Massachusetts Institute of Technology (MIT) - Operations Research Center ( email )

77 Massachusetts Avenue
Bldg. E 40-149
Cambridge, MA 02139
United States

Paper statistics: 666 downloads; 1,995 abstract views; rank 73,295.