Linear-quadratic-Gaussian control
Reinforcement learning
Control theory
Robust control
Gaussian distribution
Quadratic equation
Computer science
Linear control system
Gaussian process
Mathematics
Control
Linear system
Artificial intelligence
Mathematical optimization
Control system
Engineering
Mathematical analysis
Physics
Electrical engineering
Quantum mechanics
Geometry
Authors
Leilei Cui, Tamer Başar, Zhong-Ping Jiang
Identifier
DOI: 10.1109/tac.2024.3397928
Abstract
This paper proposes a novel robust reinforcement learning framework for discrete-time linear systems with model mismatch that may arise from the sim-to-real gap. A key strategy is to invoke advanced techniques from control theory. Using the formulation of the classical risk-sensitive linear quadratic Gaussian control, a dual-loop policy optimization algorithm is proposed to generate a robust optimal controller. The dual-loop policy optimization algorithm is shown to be globally and uniformly convergent, and robust against disturbances during the learning process. This robustness property is called small-disturbance input-to-state stability and guarantees that the proposed policy optimization algorithm converges to a small neighborhood of the optimal controller as long as the disturbance at each learning step is relatively small. In addition, when the system dynamics are unknown, a novel model-free off-policy policy optimization algorithm is proposed. Finally, numerical examples are provided to illustrate the proposed algorithm.
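The dual-loop scheme described in the abstract builds on policy optimization for linear-quadratic problems. As a point of reference only, the sketch below shows the classical policy iteration for a discrete-time LQR problem (Hewer's algorithm), which is the kind of inner-loop evaluation/improvement update such frameworks refine; the paper's risk-sensitive, robust, and model-free elements are not reproduced here, and all matrices (A, B, Q, R) and the initial gain are illustrative assumptions rather than the authors' examples.

```python
# Minimal sketch: policy iteration for discrete-time LQR (Hewer's algorithm).
# This is NOT the paper's dual-loop risk-sensitive algorithm; it only
# illustrates the evaluation/improvement pattern on an assumed system.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov, solve_discrete_are

A = np.array([[1.0, 0.5], [0.0, 1.0]])   # open-loop dynamics (assumed)
B = np.array([[0.0], [1.0]])
Q = np.eye(2)                             # state cost weight (assumed)
R = np.array([[1.0]])                     # input cost weight (assumed)

K = np.array([[0.5, 1.0]])                # initial stabilizing gain
assert np.max(np.abs(np.linalg.eigvals(A - B @ K))) < 1.0

for _ in range(50):
    # Policy evaluation: solve the Lyapunov equation
    #   P = (A - B K)^T P (A - B K) + Q + K^T R K
    Acl = A - B @ K
    P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
    # Policy improvement: greedy gain with respect to the evaluated P
    K_next = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    if np.linalg.norm(K_next - K) < 1e-10:
        K = K_next
        break
    K = K_next

# Cross-check against the Riccati solution of the same LQR problem.
P_star = solve_discrete_are(A, B, Q, R)
K_star = np.linalg.solve(R + B.T @ P_star @ B, B.T @ P_star @ A)
print("policy-iteration gain:", K)
print("Riccati gain:         ", K_star)
```

Given an initial stabilizing gain, this iteration converges to the Riccati-optimal gain; the robustness analysis in the paper concerns how such iterations behave when each step is perturbed by learning errors.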