Computer science
Generalization
Stability (learning theory)
Multi-agent system
Interference (communication)
Artificial intelligence
Reinforcement learning
Machine learning
Telecommunications
Mathematics
Channel (broadcasting)
Mathematical analysis
Authors
Wei Pan, Nanding Wang, Chenxi Xu, Kao-Shing Hwang
Source
Journal: IEEE Transactions on Cognitive and Developmental Systems
Publisher: Institute of Electrical and Electronics Engineers
Date: 2021-09-22
Volume/Issue: 14 (4): 1486-1495
Citations: 8
Identifiers
DOI: 10.1109/tcds.2021.3110959
Abstract
Multiagent reinforcement learning (RL) is widely used and can successfully solve many real-world problems. In a multiagent RL system, a global critic network guides each agent's policy updates so that the agents learn the strategy most beneficial to the collective. However, the global critic also couples each agent's learning to the other agents' strategies, which leads to unstable learning. To solve this problem, we propose dynamic decomposed multiagent deep deterministic policy gradient (DD-MADDPG): a new network that considers both global and local evaluations and adaptively adjusts the agent's attention to the two evaluations. In addition, the experience replay buffer used by multiagent deep deterministic policy gradient (MADDPG) retains outdated experience, and the outdated strategies of other agents further disturb the current agent's learning. To reduce the influence of other agents' outdated experience, we propose TD-Error and Time-based experience sampling (T2-PER) on top of DD-MADDPG. We evaluate the proposed algorithm in terms of learning stability and the average return obtained by the agents, with experiments conducted in the multiagent particle environment (MPE). The results show that the proposed method achieves better stability and higher learning efficiency than MADDPG and exhibits a degree of generalization ability.
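The abstract describes two mechanisms without giving their formulas: a critic evaluation that blends global and local values under an adaptive attention weight, and a replay-sampling rule driven jointly by TD-error magnitude and experience age. The sketch below illustrates both ideas in minimal form; the convex blend, the exponential recency decay, and all parameter names (`attention`, `alpha`, `decay`) are illustrative assumptions, not the paper's actual definitions.

```python
import numpy as np

def mixed_evaluation(q_global: float, q_local: float, attention: float) -> float:
    """Blend global and local critic values (assumed convex combination).
    `attention` in [0, 1] is the adaptively adjusted weight the agent
    places on the global evaluation versus its own local one."""
    return attention * q_global + (1.0 - attention) * q_local

def t2_per_probabilities(td_errors: np.ndarray,
                         timestamps: np.ndarray,
                         now: int,
                         alpha: float = 0.6,    # TD-error exponent (assumed)
                         decay: float = 1e-3,   # recency decay rate (assumed)
                         eps: float = 1e-6) -> np.ndarray:
    """Sampling distribution over the replay buffer: transitions with large
    |TD-error| and recent timestamps are drawn more often, so experience
    generated under other agents' outdated policies is sampled less."""
    priority = (np.abs(td_errors) + eps) ** alpha
    recency = np.exp(-decay * (now - timestamps))  # older -> smaller weight
    p = priority * recency
    return p / p.sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    td = rng.normal(size=8)          # per-transition TD errors
    ts = np.arange(8)                # step at which each transition was stored
    probs = t2_per_probabilities(td, ts, now=8)
    batch = rng.choice(8, size=4, p=probs)  # minibatch indices for the update
    print(probs.round(3), batch)
```

In a full implementation the sampled minibatch would feed the critic update, and prioritized sampling would normally be paired with importance-sampling weights to correct its bias; both are omitted here for brevity.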