Concepts
Bellman equation, Markov decision process, Mathematical optimization, Optimal control, Reinforcement learning, Computer science, Algebraic Riccati equation, Convergence (economics), Markov chain, Function (biology), Markov process, Control theory (sociology), Mathematics, Control (management), Riccati equation, Differential equation, Evolutionary biology, Biology, Economic growth, Statistics, Machine learning, Mathematical analysis, Artificial intelligence, Economics
Authors
Peixin Zhou, Jiwei Wen, Akshya Swain, Xiaoli Luan
Identifier
DOI:10.1177/09596518221116951
Abstract
This article develops a model-free adaptive optimal control policy for discrete-time Markov jump systems. First, a two-player zero-sum game is formulated to obtain an optimal control policy that minimizes a cost function against the worst-case disturbance. Second, an action- and mode-dependent value function is set up for the zero-sum game, so that such a policy can be found with a convergence guarantee rather than by directly solving an optimization problem constrained by coupled algebraic Riccati equations. Specifically, motivated by the Bellman optimality principle, we develop an online value iteration algorithm that solves the zero-sum game while learning during control, without requiring any initial stabilizing policy. With this algorithm, we achieve disturbance attenuation for Markov jump systems without knowledge of the system matrices. Adaptivity to slowly changing uncertainties is also obtained, owing to the model-free feature and the convergence of the policy. Finally, the effectiveness and practical potential of the algorithm are demonstrated on two numerical examples and a solar boiler system.
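To make the recursion behind the abstract concrete, below is a minimal model-based sketch of value iteration on the coupled game Riccati equations that characterize the zero-sum game for a Markov jump linear system. It is an illustration only: the paper's contribution is a model-free algorithm that learns the same fixed point online from data, whereas this sketch assumes known system matrices, and all numerical values (matrices, transition probabilities, the attenuation level gamma) are hypothetical.

```python
# Minimal model-based sketch of the zero-sum-game value iteration, assuming a
# discrete-time Markov jump linear system
#   x_{k+1} = A_i x_k + B_i u_k + D_i w_k   (mode i, control u, disturbance w)
# with zero-sum cost  sum_k (x'Qx + u'Ru - gamma^2 w'w).
# The paper's algorithm reaches the same fixed point model-free; everything
# below, including all matrices, is illustrative.
import numpy as np

def value_iteration(A, B, D, Pi, Q, R, gamma, iters=500, tol=1e-9):
    """Iterate the coupled game Riccati recursion P_i <- F_i(P) to a fixed point."""
    N = len(A)                     # number of Markov modes
    n = A[0].shape[0]
    q = D[0].shape[1]
    # Value iteration can start from zero: no initial stabilizing policy needed.
    P = [np.zeros((n, n)) for _ in range(N)]
    for _ in range(iters):
        P_new = []
        for i in range(N):
            # Mode-coupled expectation E_i(P) = sum_j Pi[i, j] * P_j
            EP = sum(Pi[i, j] * P[j] for j in range(N))
            # Joint quadratic term for the two players (control and disturbance)
            G = np.block([
                [R + B[i].T @ EP @ B[i],               B[i].T @ EP @ D[i]],
                [D[i].T @ EP @ B[i], D[i].T @ EP @ D[i] - gamma**2 * np.eye(q)],
            ])
            H = np.vstack([B[i].T @ EP @ A[i], D[i].T @ EP @ A[i]])
            P_new.append(Q + A[i].T @ EP @ A[i] - H.T @ np.linalg.solve(G, H))
        if max(np.linalg.norm(Pn - Po) for Pn, Po in zip(P_new, P)) < tol:
            return P_new
        P = P_new
    return P

# Toy two-mode example (all numbers hypothetical)
A = [np.array([[0.9, 0.2], [0.0, 0.8]]), np.array([[1.0, 0.1], [0.1, 0.7]])]
B = [np.array([[0.0], [1.0]])] * 2
D = [np.array([[0.1], [0.0]])] * 2
Pi = np.array([[0.7, 0.3], [0.4, 0.6]])   # mode transition probabilities
P = value_iteration(A, B, D, Pi, Q=np.eye(2), R=np.eye(1), gamma=5.0)
print([np.round(Pi_, 3) for Pi_ in P])
```

Once the coupled matrices P_i converge, the minimizing control gain and the worst-case disturbance gain follow from the blocks of G and H; in the model-free setting of the paper, the analogous quantities are extracted from a learned action- and mode-dependent value function instead of from the system matrices.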