Learning the Minimal Representation of a Dynamic System from Transition Data

44 Pages. Posted: 18 Feb 2021. Last revised: 18 Aug 2023.

Mohammed Amine Bennouna

Massachusetts Institute of Technology (MIT) - Operations Research Center

Dessislava Pachamanova

Babson College

Georgia Perakis

Massachusetts Institute of Technology (MIT) - Sloan School of Management

Omar Skali Lami

Massachusetts Institute of Technology (MIT) - Operations Research Center

Date Written: January 10, 2021

Abstract

This paper proposes a framework for learning the most concise MDP model of a continuous state space dynamic system from observed transition data. This setting is encountered in numerous important applications, such as patient treatment, online advertising, recommender systems, and estimation of treatment effects in econometrics. Most existing methods in offline reinforcement learning construct functional approximations of the value or the transition and reward functions, requiring complex and often uninterpretable function approximators. Our approach instead relies on partitioning the system's observation space into regions constituting states of a finite MDP representing the system. We discuss the theoretically minimal MDP representation that preserves the values, and therefore the optimal policy, of the dynamic system, which is, in a sense, the optimal discretization. We formally define the problem of learning such a concise representation from transition data without exploration.
Learning such a representation allows for enhanced tractability and, importantly, provides interpretability. To solve this problem, we introduce an in-sample property on partitions of the observation space that we name coherence, and show that if the class of possible partitions is of finite VC dimension, any partition coherent with the transition data converges to the minimal representation of the system, with provable finite-sample PAC convergence guarantees. This insight motivates our Minimal Representation Learning (MRL) algorithm, which constructs from transition data an MDP representation that approximates the minimal representation of the system. We illustrate the effectiveness of the proposed framework through numerical experiments in both deterministic and stochastic environments as well as with real data.
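To make the abstract's idea concrete, the following is a toy sketch, not the paper's actual MRL algorithm: it assumes a 1-D observation space, a candidate partition given by bin boundaries, and uses the within-cell spread of observed rewards as a crude empirical stand-in for the coherence property (observations aggregated into one state should behave identically in the data). All names and the specific check are illustrative assumptions.

```python
import numpy as np
from collections import defaultdict

def cell_of(s, bins):
    # Index of the partition cell containing 1-D observation s.
    return int(np.digitize(s, bins))

def reward_spread(transitions, bins):
    # Max-minus-min spread of observed rewards within each (cell, action)
    # pair. Zero spread everywhere is a rough in-sample proxy for
    # coherence of this candidate partition with the transition data.
    buckets = defaultdict(list)
    for s, a, r in transitions:
        buckets[(cell_of(s, bins), a)].append(r)
    return {k: max(v) - min(v) for k, v in buckets.items()}

# Toy system: the reward depends only on the sign of the observation,
# so {(-inf, 0), [0, inf)} is a minimal two-state representation.
rng = np.random.default_rng(0)
data = [(s, 0, 1.0 if s >= 0 else -1.0)
        for s in rng.uniform(-1, 1, 200)]

aligned = reward_spread(data, bins=[0.0])    # boundary at the true split
misaligned = reward_spread(data, bins=[0.5])  # boundary in the wrong place
print(max(aligned.values()), max(misaligned.values()))  # → 0.0 2.0
```

The aligned partition has zero reward spread in every cell, while the misaligned one mixes observations with different rewards in a single cell, signaling an incoherent partition. The paper's actual criterion also accounts for transition dynamics and stochastic environments.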

Keywords: reinforcement learning, statistical learning, block Markov decision process, discretization, interpretability, data-driven decision making, state representation learning, MDP state aggregation

Suggested Citation

Bennouna, Mohammed Amine and Pachamanova, Dessislava and Perakis, Georgia and Skali Lami, Omar, Learning the Minimal Representation of a Dynamic System from Transition Data (January 10, 2021). Available at SSRN: https://ssrn.com/abstract=3785547 or http://dx.doi.org/10.2139/ssrn.3785547

Mohammed Amine Bennouna (Contact Author)

Massachusetts Institute of Technology (MIT) - Operations Research Center ( email )

77 Massachusetts Avenue
Bldg. E 40-149
Cambridge, MA 02139
United States

HOME PAGE: https://www.mit.edu/~amineben/

Dessislava Pachamanova

Babson College ( email )

Babson Park, MA 02157
United States
781-235-1200 (Phone)
781-239-6414 (Fax)

Georgia Perakis

Massachusetts Institute of Technology (MIT) - Sloan School of Management ( email )

100 Main Street
E62-565
Cambridge, MA 02142
United States

Omar Skali Lami

Massachusetts Institute of Technology (MIT) - Operations Research Center ( email )

77 Massachusetts Avenue
Bldg. E 40-149
Cambridge, MA 02139
United States

Paper statistics: 666 downloads; 1,995 abstract views; rank 73,295.