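Computer science
Reinforcement learning
Weighting
Action (physics)
Artificial intelligence
Set (abstract data type)
Multi-agent system
Nonlinear system
Distributed computing
Programming language
Medicine
Physics
Quantum mechanics
Radiology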
Authors
Jiajun Chai,Weifan Li,Yuanheng Zhu,Dongbin Zhao,Zhe Ma,Kewu Sun,Jishiyu Ding
Identifier
DOI:10.1109/tnnls.2021.3105869
Abstract
Multiagent reinforcement learning (MARL) methods such as VDN, QMIX, and QTRAN, which adopt the centralized training with decentralized execution (CTDE) framework, have shown promising results in cooperation and competition. However, in some multiagent scenarios, the number of agents and the size of the action set vary over time. We call these unshaped scenarios, and the methods mentioned above fail to perform satisfactorily in them. In this article, we propose a new method, called Unshaped Networks for Multiagent Systems (UNMAS), which adapts to changes in the number of agents and the size of the action set. We propose the self-weighting mixing network to factorize the joint action-value. Its adaptation to changes in the number of agents is attributed to a nonlinear mapping from each agent's Q value to the joint action-value with individual weights. In addition, to handle changes in the action set, each agent constructs an individual action-value network composed of two streams, which evaluate the constant environment-oriented subset and the varying unit-oriented subset of actions. We evaluate UNMAS on various StarCraft II micromanagement scenarios and compare the results with several state-of-the-art MARL algorithms. The superiority of UNMAS is demonstrated by its highest winning rates, especially on the most difficult scenario, 3s5z_vs_3s6z. The agents learn to perform effective cooperative behaviors, while other MARL algorithms fail. Animated demonstrations and source code are provided at https://sites.google.com/view/unmas.
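The two ideas in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (mix_joint_q, two_stream_q), the softmax weighting, and the linear scoring are all assumptions made for illustration. In particular, the sketch uses a plain weighted sum in place of the learned nonlinear mapping the abstract describes. It only shows why such a design can tolerate a changing number of agents and a changing number of unit-oriented actions.

```python
import math


def softmax(logits):
    """Normalize a list of scores into positive weights that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_joint_q(agent_qs, weight_logits):
    """Toy mixer: combine per-agent Q values for any number of agents.

    ASSUMPTION: the softmax-weighted sum is an illustrative stand-in,
    not the paper's self-weighting mixing network. In a learned model
    the per-agent weight logits would come from a network.
    """
    if not agent_qs or len(agent_qs) != len(weight_logits):
        raise ValueError("need one weight logit per agent, at least one agent")
    weights = softmax(weight_logits)
    return sum(w * q for w, q in zip(weights, agent_qs))


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def two_stream_q(obs, unit_feats, env_weights, unit_weights):
    """Toy two-stream action-value function (hypothetical linear scoring).

    env stream:  one Q value per fixed environment-oriented action.
    unit stream: the same scorer applied to each unit currently present,
                 so the number of outputs follows the number of units.
    """
    env_q = [dot(w, obs) for w in env_weights]
    unit_q = [dot(unit_weights, obs + u) for u in unit_feats]
    return env_q + unit_q


if __name__ == "__main__":
    # The same mixer handles two agents or three agents.
    print(mix_joint_q([1.0, 3.0], [0.0, 0.0]))
    print(mix_joint_q([1.0, 2.0, 3.0], [0.2, -0.1, 0.4]))

    # Two fixed actions plus one action per visible unit.
    obs = [1.0, 2.0]
    env_w = [[1.0, 0.0], [0.0, 1.0]]
    unit_w = [0.5, 0.5, 1.0]
    print(two_stream_q(obs, [[1.0], [2.0], [3.0]], env_w, unit_w))
    print(two_stream_q(obs, [[1.0]], env_w, unit_w))
```

Because the weights are computed per agent and the unit scorer is shared across units, neither function fixes the number of agents or unit-oriented actions in advance, which is the property the abstract attributes to UNMAS.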