Learning the Minimal Representation of a Dynamic System from Transition Data
44 Pages · Posted: 18 Feb 2021 · Last revised: 18 Aug 2023
Date Written: January 10, 2021
Abstract
This paper proposes a framework for learning the most concise MDP model of a continuous-state dynamic system from observed transition data. This setting arises in numerous important applications, such as patient treatment, online advertising, recommender systems, and the estimation of treatment effects in econometrics. Most existing methods in offline reinforcement learning construct functional approximations of the value function or of the transition and reward functions, requiring complex and often uninterpretable function approximators. Our approach instead partitions the system's observation space into regions that constitute the states of a finite MDP representing the system. We discuss the theoretically minimal MDP representation that preserves the values, and therefore the optimal policy, of the dynamic system: in a sense, the optimal discretization. We formally define the problem of learning such a concise representation from transition data without exploration.
Learning such a representation enhances tractability and, importantly, provides interpretability. To solve this problem, we introduce an in-sample property of partitions of the observation space, which we name coherence, and show that if the class of candidate partitions has finite VC dimension, any partition coherent with the transition data converges to the minimal representation of the system, with provable finite-sample PAC guarantees. This insight motivates our Minimal Representation Learning (MRL) algorithm, which constructs from transition data an MDP representation that approximates the minimal representation of the system. We illustrate the effectiveness of the proposed framework through numerical experiments in both deterministic and stochastic environments, as well as with real data.
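To make the coherence idea concrete, the following is a minimal illustrative sketch for a deterministic environment: a candidate partition `phi` (a map from observations to region labels) is treated as coherent with the data if all observed transitions within the same (region, action) pair agree on the reward and on the region of the next observation. The function `is_coherent`, the toy dynamics, and this simplified check are assumptions for illustration, not the paper's formal definitions or the MRL algorithm itself.

```python
import random

def is_coherent(transitions, phi):
    """Illustrative in-sample coherence check for a deterministic system.

    transitions: iterable of (obs, action, reward, next_obs) tuples.
    phi: candidate partition, mapping an observation to a region label.
    Returns False as soon as two transitions from the same (region, action)
    pair disagree on the reward or on the next observation's region.
    """
    seen = {}  # (region, action) -> (reward, next_region)
    for obs, a, r, next_obs in transitions:
        key = (phi(obs), a)
        outcome = (round(r, 6), phi(next_obs))
        if key in seen and seen[key] != outcome:
            return False
        seen.setdefault(key, outcome)
    return True

# Toy 1-D system on [0, 1): reward and dynamics depend only on which half
# the observation lies in, so two regions form a minimal representation.
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.random()
    r = 1.0 if x < 0.5 else 0.0
    nx = x / 2 if x < 0.5 else 0.5 + (x - 0.5) / 2  # each half maps into itself
    data.append((x, 0, r, nx))

print(is_coherent(data, lambda x: int(x >= 0.5)))   # partition at the true boundary
print(is_coherent(data, lambda x: int(x >= 0.25)))  # misplaced boundary mixes outcomes
```

The second partition fails because its upper region pools observations with different rewards and different next-region behavior, which is exactly the kind of in-sample inconsistency the coherence property rules out.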
Keywords: reinforcement learning, statistical learning, block Markov decision process, discretization, interpretability, data-driven decision making, state representation learning, MDP state aggregation