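Markov decision process
Reinforcement learning
Computer science
Process (computing)
Reentry
Artificial neural network
Markov process
Artificial intelligence
Simulation
Control theory (sociology)
Control (management)
Mathematics
Medicine
Statistics
Operating system
Cardiology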
Authors
Qingji Jiang,Xiaogang Wang,Yu Li
Source
Journal: Mechanisms and Machine Science
Date: 2023-12-05
Pages: 291-313
Identifier
DOI: 10.1007/978-3-031-42515-8_20
Abstract
To avoid multiple dynamic no-fly zones while satisfying path and terminal constraints during the reentry of hypersonic glide vehicles, an intelligent reentry guidance method based on deep reinforcement learning is developed. First, the guidance is decoupled into longitudinal and lateral guidance. The lateral guidance provides the sign of the bank angle to adjust the heading, while the longitudinal guidance outputs the magnitude of the bank angle through the artificial intelligence interface. Then, the reentry guidance simulation is mapped to a Markov decision process (MDP), in which the essential elements, including state, action, and reward, are defined or designed adaptively. Finally, the policy neural network is trained with the twin delayed deep deterministic policy gradient (TD3) algorithm. With properly selected hyperparameters and network architecture, the policy neural network converges. Simulations indicate that, under the influence of dynamic no-fly zones, initial state errors, and various online dispersions, the proposed guidance avoids all no-fly zones and reaches the target accurately while satisfying all path constraints.
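For readers unfamiliar with TD3, the following is a minimal sketch of the critic target that the standard algorithm computes for each transition. It is not taken from the paper: the function name `td3_target`, the callable arguments, and all default values (discount factor, noise scale, noise clip, action bounds) are illustrative assumptions. The action is treated as a scalar, in line with the abstract's description of the policy outputting a bank-angle magnitude.

```python
import random


def td3_target(reward, done, next_state,
               actor_target, critic1_target, critic2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               action_low=-1.0, action_high=1.0, rng=random):
    """Compute the standard TD3 critic target for one transition.

    All names and default values are hypothetical; the paper's actual
    hyperparameters are not given in the abstract.
    """
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    noise = rng.gauss(0.0, noise_std)
    noise = max(-noise_clip, min(noise_clip, noise))
    next_action = actor_target(next_state) + noise
    next_action = max(action_low, min(action_high, next_action))

    # Clipped double Q-learning: use the smaller of the two target critics.
    q_next = min(critic1_target(next_state, next_action),
                 critic2_target(next_state, next_action))

    # Bootstrapped target; terminal transitions do not bootstrap.
    return reward + gamma * (1.0 - float(done)) * q_next
```

In a full training loop, both critics would regress toward this target, and the policy network would be updated less frequently than the critics (the "delayed" part of TD3). How the paper configures these details is not stated in the abstract.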