Multi actor hierarchical attention critic with RNN-based feature extraction


Authors

Dianxi Shi, Chenran Zhao, Yajie Wang, Huanhuan Yang, Gongju Wang, Hao Jiang, Chao Xue, Shaowu Yang, Yongjun Zhang

Source

Journal: Neurocomputing [Elsevier BV]
Date: 2021-11-02; Volume: 471, pp. 79-93; Citations: 9

Links

sciencedirect.com, doi.org

Identifier

DOI: 10.1016/j.neucom.2021.10.093

Abstract

Deep reinforcement learning has made significant progress in multi-agent tasks in recent years. However, most previous studies focus on fully cooperative tasks, and their methods do not perform well in mixed tasks. In mixed tasks, each agent must jointly consider the information provided by its friends and its enemies to learn its strategy, and that strategy is sensitive to the information it receives. Additionally, in the actor-critic framework the input space of the critic network grows rapidly with the number of agents. It is therefore necessary to learn information representations efficiently in order to extract the important features. To this end, we present an approach that performs information representation with an attention mechanism. Our approach adopts the framework of centralized training and decentralized execution. We apply a multi-head hierarchical attention mechanism to the centrally computed critics, so that the critics can process the received information more accurately and help the actors choose better actions. The hierarchical attention critic adopts a bi-level attention structure composed of an agent level and a group level, which assign different weights to friends' and enemies' information and then summarize it at each timestep. This design achieves high efficiency and scalability in mixed tasks. Furthermore, we use RNN-based feature extraction to encode the state-action sequence of each agent. Experimental results show that our approach is not only applicable to cooperative environments but also performs better in mixed environments; in the predator-prey task in particular, the reward obtained by our method is twice that of the baselines.
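The bi-level attention critic and RNN feature extractor described in the abstract can be illustrated with a small pure-Python sketch. This is not the authors' implementation. The abstract does not give the RNN cell type, the attention form, or the parameter layout, so the sketch assumes a plain tanh RNN, scaled dot-product attention, separate agent-level weights for friends and enemies, and random (untrained) weights. The helper names (`rnn_encode`, `attend`, `hierarchical_context`) and all dimensions are hypothetical. For one agent, each head first attends over friends and over enemies separately (agent level), then attends over the two group summaries (group level). The heads' outputs are concatenated and fed, together with the agent's own embedding, to a linear Q-value head.

```python
import math
import random

def matvec(W, x):
    # Multiply matrix W (list of rows) by vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng):
    # Small random weights; a real critic would learn these by gradient descent.
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rnn_encode(sequence, W_in, W_h):
    # Assumed stand-in for the paper's RNN feature extractor:
    # h_t = tanh(W_in x_t + W_h h_{t-1}); the final hidden state
    # summarizes one agent's state-action history.
    h = [0.0] * len(W_h)
    for x in sequence:
        a, b = matvec(W_in, x), matvec(W_h, h)
        h = [math.tanh(ai + bi) for ai, bi in zip(a, b)]
    return h

def attend(query, items, W_q, W_k, W_v):
    # Single-head scaled dot-product attention of one query over a list of items.
    q = matvec(W_q, query)
    keys = [matvec(W_k, e) for e in items]
    vals = [matvec(W_v, e) for e in items]
    scale = math.sqrt(len(q))
    weights = softmax([sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys])
    summary = [sum(w * v[d] for w, v in zip(weights, vals)) for d in range(len(vals[0]))]
    return summary, weights

def hierarchical_context(own, friends, enemies, heads):
    # Hypothetical bi-level attention: agent level inside each group,
    # then group level over the two group summaries, per head.
    out = []
    for p in heads:
        friend_sum, _ = attend(own, friends, *p["agent_friend"])
        enemy_sum, _ = attend(own, enemies, *p["agent_enemy"])
        group_ctx, _ = attend(own, [friend_sum, enemy_sum], *p["group"])
        out.extend(group_ctx)  # concatenate the heads
    return out

rng = random.Random(0)
OBS_ACT, H, D, N_HEADS = 4, 8, 4, 2  # input, hidden, per-head, head count
W_in, W_h = rand_matrix(H, OBS_ACT, rng), rand_matrix(H, H, rng)

def make_head():
    return {
        "agent_friend": [rand_matrix(D, H, rng) for _ in range(3)],
        "agent_enemy": [rand_matrix(D, H, rng) for _ in range(3)],
        "group": [rand_matrix(D, H, rng), rand_matrix(D, D, rng), rand_matrix(D, D, rng)],
    }

heads = [make_head() for _ in range(N_HEADS)]

def random_history(T=5):
    # Random state-action vectors standing in for an agent's trajectory.
    return [[rng.uniform(-1, 1) for _ in range(OBS_ACT)] for _ in range(T)]

embeds = [rnn_encode(random_history(), W_in, W_h) for _ in range(5)]
own, friends, enemies = embeds[0], embeds[1:3], embeds[3:5]
ctx = hierarchical_context(own, friends, enemies, heads)
w_out = rand_matrix(1, H + N_HEADS * D, rng)
q_value = matvec(w_out, own + ctx)[0]  # scalar Q estimate for this agent
print(len(ctx), q_value)
```

In the centralized-training, decentralized-execution setting the abstract describes, only the critic would consume `friends` and `enemies`; each actor acts from its own observations. Because attention pools a variable number of embeddings into a fixed-size vector, the critic's input size in this sketch does not grow with the number of agents, which is the scalability argument the abstract makes.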
