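Autopilot
Fixed-wing
Reinforcement learning
Control theory (sociology)
Computer science
Controller (irrigation)
Wing
Control engineering
Artificial intelligence
Engineering
Aerospace engineering
Control (management)
Biology
Agronomy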
Authors
Lun Li,Xuebo Zhang,Chenxu Qian,Runhua Wang,Minghui Zhao
Identifier
DOI:10.1109/tetci.2024.3360322
Abstract
In this paper, we present a novel curriculum reinforcement learning method that can automatically generate a high-performance autopilot controller for a 6-degree-of-freedom (6-DOF) aircraft with an unknown dynamic model, a problem that is difficult to handle with traditional control methods. In this method, a sigmoid-like learning curve is introduced to generate goals (the desired heading, altitude, and velocity) for the autopilot, progressing from easy to hard. The shape of the learning curve can be adjusted to adapt to the training process of Proximal Policy Optimization (PPO). In addition, the conflict between multiple goals in autopilot training is resolved by designing an adaptive reward function. Furthermore, the outputs of PPO are passed through a first-order filter so that the control inputs stay smooth and avoid large oscillations. A series of simulation results shows that the proposed method not only noticeably improves the success rate and stability of training but also achieves better settling time and robustness than traditional PID control and a state-of-the-art (SOTA) method. Finally, applications of the controller, including navigation, pursuit-evasion, and dogfighting tasks, are demonstrated to show its applicability to multiple tasks.
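The abstract names two mechanisms without giving formulas: a sigmoid-like curve that schedules goal difficulty, and a first-order filter on the policy outputs. The sketch below illustrates the general idea of each. It is not the authors' implementation: the function and class names, the `steepness`, `midpoint`, and `alpha` parameters, and the choice to scale a heading-change bound are all assumptions made for illustration.

```python
import math


def curriculum_difficulty(progress, steepness=10.0, midpoint=0.5):
    """Map training progress in [0, 1] to a difficulty in (0, 1).

    A plain logistic curve is assumed here; the paper's exact curve
    and its adaptation rule are not given in the abstract.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (progress - midpoint)))


def max_heading_change(progress, limit_deg=180.0):
    """Hypothetical goal bound: harder (larger) heading changes later in training."""
    return curriculum_difficulty(progress) * limit_deg


class FirstOrderFilter:
    """Discrete first-order low-pass filter.

    y[k] = y[k-1] + alpha * (u[k] - y[k-1]), with 0 < alpha <= 1.
    Smaller alpha gives smoother (slower) control inputs.
    """

    def __init__(self, alpha, initial=0.0):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.y = initial

    def step(self, u):
        # Move the filtered value a fraction alpha toward the raw action u.
        self.y += self.alpha * (u - self.y)
        return self.y
```

In a training loop, one would typically sample each episode's goal within the bound returned by `max_heading_change(progress)` and pass every raw policy action through `FirstOrderFilter.step` before applying it to the aircraft model.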