Keywords
Reinforcement learning; computer science; artificial neural network; control theory; spacecraft; Bellman equation; optimal control; learning stability; monotone function; mathematical optimization; artificial intelligence; machine learning; mathematics; engineering; aerospace engineering
Authors
Yuhan Liu, Guangfu Ma, Yueyong Lyu, Pengyu Wang
Identifier
DOI:10.1016/j.neucom.2021.07.099
Abstract
This paper proposes a novel reinforcement learning-based attitude tracking control strategy for combined-spacecraft takeover maneuvers with completely unknown dynamics. A major issue in combined-spacecraft attitude takeover control is that the accurate dynamic model is highly nonlinear, complex, and costly to identify online, which makes it impractical for control design. To address this issue, we take advantage of the Q-learning algorithm to acquire the control strategy directly from system input/output measurement data in a model-free manner, so that the online inertia parameter identification procedure is avoided. More specifically, the attitude tracking task is first formulated as a regulation problem by introducing an augmented system, for which the system dynamic model is still required in the control design. Then, to achieve a model-free control strategy, an online policy-iteration (PI) Q-learning procedure is derived that solves the Bellman optimality equation using the generated measurement data. In the theoretical analysis, it is proved that the iterated sequences of Q-value functions and control strategies converge to the optimal ones. Rigorous proofs of the stability and monotonicity guarantees of the proposed control strategy are also provided. Furthermore, for online implementation, an off-policy learning scheme is employed to find the optimal Q-value function approximator with a neural network structure after the data-collection phase. Numerical simulations are presented to validate the effectiveness of the proposed strategy.