Reinforcement learning
Theory (learning stability)
Computer science
Process (computing)
Artificial intelligence
Machine learning
Operating system
Authors
Jiao Wu, Rui Wang, Ruiying Li, Hui Zhang, Xiaohui Hu
Identifiers
DOI:10.1109/smc.2018.00039
Abstract
The Deep Deterministic Policy Gradient (DDPG) reinforcement learning method commonly consists of actor learning and critic learning. Actor learning relies heavily on critic learning, which makes the performance of DDPG rather sensitive to the critic and leads to stability issues. To further improve the stability and performance of DDPG, the multi-critic DDPG method (MCDDPG) is proposed for more reliable critic learning. The average value of multiple critics replaces the single critic in DDPG, providing better resistance when any one critic performs badly, and the multiple independent critics can learn knowledge from the environment more broadly. In addition, an extension of the experience replay mechanism is presented to accelerate the training process. All methods are tested on simulated environments from the OpenAI Gym platform, and convincing experimental results are obtained in support of the proposed methods.
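To make the multi-critic averaging idea described in the abstract concrete, below is a minimal sketch, assuming PyTorch; the network architecture, layer sizes, and function names are hypothetical illustrations, not details taken from the paper.

```python
# Sketch of averaging multiple independent critics (the core idea of MCDDPG
# as described in the abstract). Network sizes and names are assumptions.
import torch
import torch.nn as nn


class Critic(nn.Module):
    """Simple Q(s, a) network: concatenates state and action, outputs a scalar."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))


def averaged_q(critics, state, action):
    """Average the Q estimates of several independent critics.

    The abstract replaces the single DDPG critic with this mean so that a
    single poorly trained critic has less influence on the actor update.
    """
    qs = torch.stack([c(state, action) for c in critics], dim=0)
    return qs.mean(dim=0)


if __name__ == "__main__":
    state_dim, action_dim, n_critics = 8, 2, 3  # hypothetical sizes
    critics = [Critic(state_dim, action_dim) for _ in range(n_critics)]
    s = torch.randn(32, state_dim)   # batch of states
    a = torch.randn(32, action_dim)  # batch of actions
    q_avg = averaged_q(critics, s, a)
    print(q_avg.shape)  # torch.Size([32, 1])
```

In a full training loop this averaged Q value would stand in for the single critic's output both in the actor's policy-gradient objective and in the bootstrapped target, while each critic keeps its own parameters and is updated independently.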