Keywords: reinforcement learning; missile guidance; proportional navigation; guidance system; artificial neural network; gradient descent; control theory; Bellman equation; mathematical optimization; acceleration; convergence rate; robustness; aerospace engineering
Authors
Zhe Min Hu,Liang Xiao,Jiancheng Guan,Wei-Jian Yi,Hongqiao Yin
Abstract
In this paper, a novel guidance law based on a reinforcement learning (RL) algorithm is presented to address the maneuvering-target interception problem using a deep deterministic policy gradient neural network. We take the missile’s line-of-sight (LOS) rate as the observation of the RL algorithm and propose a novel reward function, constructed from the miss distance and the LOS rate, to train the neural network offline. During the guidance process, the trained network maps the missile’s LOS rate directly to the missile’s normal acceleration, generating guidance commands in real time. Under the actor-critic (AC) framework, we adopt the twin-delayed deep deterministic policy gradient (TD3) algorithm, which takes the minimum value between a pair of critics to reduce overestimation. Simulation results show that the proposed TD3-based RL guidance law outperforms existing RL guidance laws: it copes better with continuous action and state spaces and achieves faster convergence and higher reward. Furthermore, the proposed RL guidance law achieves better accuracy and robustness when intercepting a maneuvering target, and the LOS rate converges.
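The two central mechanisms described in the abstract can be sketched compactly: the clipped double-Q target of TD3 (bootstrapping from the minimum of a pair of critics to reduce overestimation) and a reward shaped from the miss distance and the LOS rate. The following is a minimal illustrative sketch, not the authors' implementation; the function names, the reward weights `w_d` and `w_q`, and the specific reward form are assumptions for illustration only.

```python
import numpy as np

def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q learning target used by TD3:
    y = r + gamma * (1 - done) * min(Q1'(s', a'), Q2'(s', a')).
    Taking the minimum of the two target critics reduces
    overestimation bias in the bootstrapped value."""
    return reward + gamma * (1.0 - done) * np.minimum(q1_next, q2_next)

def guidance_reward(miss_distance, los_rate, w_d=1.0, w_q=0.1):
    """Hypothetical reward shaping combining miss distance and LOS rate:
    both are penalized so the agent drives the LOS rate toward zero
    while minimizing the final miss distance. Weights are illustrative."""
    return -(w_d * miss_distance + w_q * abs(los_rate))

# The pessimistic critic (0.4 here) bounds the bootstrap value:
y = td3_target(reward=1.0, done=0.0, q1_next=0.5, q2_next=0.4)
print(y)  # 1.0 + 0.99 * 0.4 = 1.396
```

Both critics are trained against the same target `y`, while the actor is updated (with a delay) to maximize only the first critic; this pairing is what the abstract refers to as reducing overestimation under the AC framework.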