Markov decision process
Model predictive control
Leverage (statistics)
Mathematical optimization
Theory (learning stability)
Context (archaeology)
Constraint (computer-aided design)
Reinforcement learning
Computer science
Markov process
Markov chain
Control theory (sociology)
Mathematics
Control (management)
Artificial intelligence
Machine learning
Paleontology
Statistics
Biology
Geometry
Authors
Mario Zanon, Sebastien Gros, Michele Palladino
Source
Journal: Automatica
[Elsevier]
Date: 2021-02-02
Volume: 143, Article no. 110399
Citations: 1
Identifier
DOI:10.1016/j.automatica.2022.110399
Abstract
In this paper, we consider solving discounted Markov Decision Processes (MDPs) under the constraint that the resulting policy is stabilizing. In practice, MDPs are solved based on some form of policy approximation. We leverage recent results proposing to use Model Predictive Control (MPC) as a structured policy approximator in the context of Reinforcement Learning, which makes it possible to introduce stability requirements directly inside the MPC-based policy. This restricts the solution of the MDP to stabilizing policies by construction. Because the stability theory for MPC is most mature in the undiscounted case, we first show that stable discounted MDPs can be reformulated as undiscounted ones. This observation entails that the undiscounted MPC-based policy with stability guarantees produces the optimal policy for the discounted MDP if that policy is stable, and the best stabilizing policy otherwise.
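The abstract's starting point is the standard discounted MDP, whose optimal value function satisfies the discounted Bellman optimality equation V(s) = max_a [R(s,a) + γ Σ_s' P(s'|s,a) V(s')]. As a minimal illustrative sketch only — the tiny MDP below (its states, transition kernel `P`, rewards `R`, and discount `gamma`) is invented for illustration and is not the paper's example or its MPC-based method — discounted value iteration can be written as:

```python
import numpy as np

# Hypothetical toy MDP (not from the paper): 3 states, 2 actions.
n_states, n_actions = 3, 2
gamma = 0.9  # discount factor

rng = np.random.default_rng(0)
# P[a, s, s'] : probability of moving s -> s' under action a (rows sum to 1)
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
# R[s, a] : stage reward for taking action a in state s
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

# Discounted value iteration: repeat the Bellman optimality backup
#   V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
V = np.zeros(n_states)
for _ in range(500):
    Q = R + gamma * np.einsum("asp,p->sa", P, V)  # Q[s, a]
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:  # contraction has converged
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=1)  # greedy policy w.r.t. the converged Q
print(V, policy)
```

The paper's contribution concerns what this baseline does not address: constraining the resulting policy to be stabilizing, by replacing the generic policy with an MPC-based one and reformulating the discounted problem as an undiscounted one.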