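Reinforcement learning
Convergence (economics)
Computer science
Symbol
Function (biology)
Factorization
Residual
State (computer science)
Stability (learning theory)
Range (aeronautics)
Value (mathematics)
Artificial intelligence
Bellman equation
Artificial neural network
Mathematical optimization
Mathematics
Machine learning
Algorithm
Arithmetic
Engineering
Aerospace engineering
Economics
Biology
Evolutionary biology
Economic growth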
Authors
Rafael Pina, De Silva, Joosep Hook, A.M. Kondoz
Identifier
DOI:10.1109/tnnls.2022.3183865
Abstract
Multiagent reinforcement learning (MARL) is useful in many problems that require the cooperation and coordination of multiple agents. Learning optimal policies with reinforcement learning in a multiagent setting becomes increasingly difficult as the number of agents grows. Recent solutions such as value decomposition networks (VDNs), QMIX, QTRAN, and QPLEX adhere to the centralized training and decentralized execution (CTDE) scheme and factorize the joint action-value function. However, these methods still struggle as environmental complexity increases, and at times they fail to converge stably. We propose a novel concept of residual Q-networks (RQNs) for MARL, which learn to transform the individual Q-value trajectories in a way that preserves the individual-global-max (IGM) criterion but is more robust in factorizing action-value functions. The RQN acts as an auxiliary network that accelerates convergence and becomes obsolete as the agents reach the training objectives. The performance of the proposed method is compared against several state-of-the-art techniques, including QPLEX, QMIX, QTRAN, and VDN, in a range of multiagent cooperative tasks. The results illustrate that the proposed method generally converges faster and more stably, and shows robust performance in a wider family of environments. The improvements are most prominent in environments with severe punishments for noncooperative behaviors, especially when complete state information is unavailable during training.
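To make the ideas of joint action-value factorization and the IGM criterion concrete, the following is a minimal, hypothetical Python sketch on tabular Q-values. It shows a VDN-style additive factorization, a toy QMIX-style monotonic mixer with positive weights, and a brute-force IGM check. It does not implement the RQN proposed in this paper; all function names are illustrative assumptions, and real methods compute Q-values with neural networks conditioned on agent observations (and, for QMIX, a mixer conditioned on the global state).

```python
from itertools import product


def vdn_joint_q(individual_qs, joint_action):
    """VDN-style factorization: Q_tot(a) = sum_i Q_i(a_i)."""
    return sum(q[a] for q, a in zip(individual_qs, joint_action))


def monotonic_joint_q(individual_qs, joint_action, weights, bias=0.0):
    """Toy QMIX-like mixer (hypothetical): a weighted sum with strictly positive
    weights, so Q_tot is strictly increasing in every Q_i."""
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be strictly positive in this sketch")
    return bias + sum(w * q[a] for w, q, a in zip(weights, individual_qs, joint_action))


def greedy_individual(individual_qs):
    """Decentralized execution: each agent takes argmax over its own Q-values."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in individual_qs)


def greedy_joint(individual_qs, joint_q):
    """Centralized view: brute-force argmax of Q_tot over all joint actions."""
    joint_actions = product(*(range(len(q)) for q in individual_qs))
    return max(joint_actions, key=joint_q)


def satisfies_igm(individual_qs, joint_q):
    """IGM holds if the joint greedy action equals the tuple of individual greedy actions."""
    return greedy_joint(individual_qs, joint_q) == greedy_individual(individual_qs)


if __name__ == "__main__":
    # Two agents with three actions each (made-up values for illustration).
    qs = [[1.0, 3.0, 2.0], [0.5, -1.0, 4.0]]
    print(greedy_individual(qs))  # each agent's own greedy action
    print(satisfies_igm(qs, lambda a: vdn_joint_q(qs, a)))
    print(satisfies_igm(qs, lambda a: monotonic_joint_q(qs, a, [0.5, 2.0])))
    # An arbitrary joint value table that rewards (0, 0) is not consistent
    # with these individual Q-values, so IGM fails.
    print(satisfies_igm(qs, lambda a: 10.0 if a == (0, 0) else 0.0))
```

Both the additive and the monotonic mixer preserve IGM by construction, which is why agents can act greedily on their own Q-values at execution time. The last case illustrates the limitation the abstract alludes to: restricted factorizations cannot represent every joint action-value function, and the brute-force `greedy_joint` used here only for checking grows exponentially with the number of agents.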