Mohammed Amine Bennouna, Dessislava A. Pachamanova, Georgia Perakis, Omar Skali Lami
Source
Journal: Management Science [Institute for Operations Research and the Management Sciences] Date: 2024-09-26
Identifier
DOI: 10.1287/mnsc.2022.01652
Abstract
This paper proposes a framework for learning the most concise Markov decision process (MDP) model of a continuous state-space dynamic system from observed transition data. This setting is encountered in numerous important applications, such as patient treatment, online advertising, recommender systems, and estimation of treatment effects in econometrics. Most existing methods in offline reinforcement learning construct functional approximations of the value function or of the transition and reward functions, requiring complex and often uninterpretable function approximators. Our approach instead relies on partitioning the system's observation space into regions that constitute the states of a finite MDP representing the system. We discuss the theoretically minimal MDP representation that preserves the values and, therefore, the optimal policy of the dynamic system (in a sense, the optimal discretization). We formally define the problem of learning such a concise representation from transition data without exploration. Learning such a representation improves tractability and, importantly, provides interpretability. To solve this problem, we introduce an in-sample property of partitions of the observation space that we name coherence, and we show that if the class of possible partitions has finite Vapnik-Chervonenkis dimension, any partition that is coherent with the transition data converges to the minimal representation of the system, with provable finite-sample probably approximately correct (PAC) convergence guarantees. This insight motivates our minimal representation learning algorithm, which constructs from transition data an MDP representation that approximates the minimal representation of the system. We illustrate the effectiveness of the proposed framework through numerical experiments in both deterministic and stochastic environments, as well as with real data.

This paper was accepted by Chung Piaw Teo, optimization.

Funding: The authors are very grateful to the Health Systems Initiative at MIT Sloan for financial support for this project.

Supplemental Material: The online appendix is available at https://doi.org/10.1287/mnsc.2022.01652.
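To make the central idea concrete, the sketch below (not the paper's algorithm) shows how a fixed partition of a continuous observation space induces a finite MDP whose transition probabilities and mean rewards can be estimated from offline transition data. All names here (`estimate_mdp`, `partition_fn`, the `(obs, action, reward, next_obs)` tuple layout) are illustrative assumptions, not an API from the paper.

```python
# Illustrative sketch only: NOT the paper's minimal representation
# learning algorithm. Given a fixed partition of the observation space,
# it estimates the induced finite MDP from offline transition data.
from collections import defaultdict

def estimate_mdp(transitions, partition_fn, n_states, n_actions):
    """Estimate transition probabilities and mean rewards of the finite
    MDP induced by mapping each observation to a partition cell (state).

    transitions:  iterable of (obs, action, reward, next_obs) tuples
    partition_fn: maps a continuous observation to a state index in
                  {0, ..., n_states - 1}
    """
    counts = defaultdict(float)       # (s, a, s') -> transition count
    totals = defaultdict(float)       # (s, a)     -> visit count
    reward_sums = defaultdict(float)  # (s, a)     -> cumulative reward

    for obs, a, r, next_obs in transitions:
        s, s_next = partition_fn(obs), partition_fn(next_obs)
        counts[(s, a, s_next)] += 1.0
        totals[(s, a)] += 1.0
        reward_sums[(s, a)] += r

    # Empirical transition kernel P[s][a][s'] and mean reward R[s][a];
    # state-action pairs never visited in the data default to zero.
    P = [[[counts[(s, a, t)] / totals[(s, a)] if totals[(s, a)] else 0.0
           for t in range(n_states)]
          for a in range(n_actions)]
         for s in range(n_states)]
    R = [[reward_sums[(s, a)] / totals[(s, a)] if totals[(s, a)] else 0.0
          for a in range(n_actions)]
         for s in range(n_states)]
    return P, R

# Hypothetical usage: 1-D observations split at an assumed threshold of 0.5.
data = [(0.2, 0, 1.0, 0.7), (0.8, 1, 0.0, 0.3), (0.6, 0, 1.0, 0.9)]
P, R = estimate_mdp(data, lambda x: 0 if x < 0.5 else 1,
                    n_states=2, n_actions=2)
```

In the paper's framework, the quality of such an induced MDP hinges on the choice of partition: coherence, loosely speaking, is an in-sample requirement that observations grouped into the same cell exhibit consistent empirical transition and reward behavior, and the paper shows that coherent partitions from a finite-VC-dimension class converge to the minimal representation.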