Reinforcement learning
Computer science
Artificial intelligence
Adaptive learning
Machine learning
Pattern recognition (psychology)
Authors
Zhenni Li, Jianhao Tang, Haoli Zhao, Ci Chen, Shengli Xie
Identifier
DOI:10.1109/taes.2023.3342794
Abstract
Deep reinforcement learning (DRL) has been applied to satellite navigation and positioning applications. Its performance relies heavily on the function-approximation capability of deep neural networks. However, existing DRL models suffer from catastrophic interference, resulting in inaccurate function approximation. Sparse-coding-based DRL is an effective approach for mitigating this interference, but existing methods face two challenging issues: first, the value function estimation network suffers from instability problems with gradient backpropagation, including gradient explosion and gradient vanishing; second, existing methods are limited to hand-crafted sparse regularizers that produce only static sparsity, which may be difficult to apply across varied and dynamic reinforcement learning (RL) environments. In this article, we propose a novel dictionary learning (DL)-structured RL model with an adaptive-sparsity regularizer (ASR) that alleviates catastrophic interference and enables accurate value function approximation, thereby improving RL performance. To alleviate the interference and avoid instability problems in RL, a feedforward DL-structured RL model is constructed to predict the value function without the need for gradient backpropagation. To learn data-driven sparse representations with adaptive sparsity, we propose using the learnable sparse regularizer ASR in the model, whose key hyperparameters can be trained to adapt to variable RL environments. To optimize the model efficiently, the model parameters are first pretrained in a pretraining stage, and only the value weights used for value function approximation need to be fine-tuned for actual RL applications in the control-training stage. Our comparative experiments in benchmark environments demonstrate that the proposed method outperforms existing state-of-the-art sparse-coding-based RL algorithms. In terms of accumulated rewards (used to measure the quality of the learned policy), the improvement was over 63% in the Cart Pole environment and nearly 10% in Puddle World. Furthermore, the proposed algorithm maintains relatively high performance in the presence of noise up to 20 dB.
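As a rough illustration of the general idea described in the abstract (a feedforward sparse-coding front end with a linear value readout, where only the value weights are updated during control training), the sketch below uses plain ISTA with a soft threshold. It is not the authors' DL-structured model or their ASR; the dictionary size, number of ISTA steps, and threshold `theta` are illustrative assumptions, and a learned, per-environment `theta` would stand in for the adaptive-sparsity behavior.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features = 8   # dimension of the state feature vector (assumed)
n_atoms = 32     # number of dictionary atoms, overcomplete (assumed)

D = rng.normal(size=(n_features, n_atoms))      # dictionary, e.g. fixed after pretraining
D /= np.linalg.norm(D, axis=0, keepdims=True)   # unit-norm atoms
w = np.zeros(n_atoms)                           # value weights, fine-tuned during control
theta = 0.1                                     # sparsity threshold; learnable in the ASR idea

def sparse_code(x, D, theta, steps=20, lr=0.1):
    """Encode features x into a sparse code z with ISTA (feedforward, no backprop)."""
    z = np.zeros(D.shape[1])
    for _ in range(steps):
        grad = D.T @ (D @ z - x)                 # gradient of the reconstruction error
        z = z - lr * grad
        z = np.sign(z) * np.maximum(np.abs(z) - lr * theta, 0.0)  # soft threshold
    return z

def value(x):
    """Predict V(s) as a linear readout of the sparse code."""
    return w @ sparse_code(x, D, theta)

# TD(0)-style update of the value weights only, mimicking a control-training stage.
gamma, alpha = 0.99, 0.01
x, x_next, reward = rng.normal(size=n_features), rng.normal(size=n_features), 1.0
z = sparse_code(x, D, theta)
td_error = reward + gamma * value(x_next) - w @ z
w += alpha * td_error * z
```

Because the encoder is a fixed feedforward mapping, the TD update only touches the linear weights `w`, which is one way to sidestep the gradient-propagation instabilities the abstract mentions.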