A leader-following paradigm based deep reinforcement learning method for multi-agent cooperation games

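Reinforcement learning, Computer science, Action selection, Action (physics), Artificial intelligence, Function (biology), Selection (genetic algorithm), Entropy (arrow of time), Psychology, Perception, Quantum mechanics, Evolutionary biology, Biology, Physics, Neuroscience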
Authors
Feiye Zhang, Qingyu Yang, Dou An
Source
Journal: Neural Networks [Elsevier]
Volume/Issue: 156: 1-12 Citations: 1
Identifier
DOI:10.1016/j.neunet.2022.09.012
Abstract

Multi-agent deep reinforcement learning algorithms that follow the centralized training with decentralized execution (CTDE) paradigm have attracted growing attention in both industry and the research community. However, existing CTDE methods follow an action selection paradigm in which all agents choose actions at the same time, which ignores the heterogeneous roles of different agents. Motivated by human wisdom in cooperative behaviors, we present a novel leader-following paradigm based deep multi-agent cooperation method (LFMCO) for multi-agent cooperative games. Specifically, we define a leader as an agent that broadcasts a message representing its selected action to all subordinates. The followers then choose their individual actions based on the message received from the leader. To measure the influence of the leader's action on the followers, we introduce the concept of information gain, i.e., the change in the entropy of the followers' value function, which is positively correlated with the influence of the leader's action. We evaluate LFMCO on several cooperation scenarios of StarCraft2. Simulation results confirm significant performance improvements of LFMCO over four state-of-the-art benchmarks in this challenging cooperative environment.
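The leader-then-follower ordering and the entropy-based information gain described in the abstract can be illustrated with a toy sketch. This is not the LFMCO implementation: the abstract does not say how the entropy of a value function is computed, so the sketch assumes a softmax over Q-values and defines the gain as the entropy reduction once the leader's message is received. All function names and the Q-value tables below are hypothetical.

```python
import math


def softmax(values, temperature=1.0):
    """Turn Q-values into a probability distribution (assumed, not from the paper)."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


def leader_step(leader_q):
    """The leader picks its greedy action; the action index is the broadcast message."""
    return argmax(leader_q)


def follower_step(q_given_message, message):
    """A follower picks its greedy action conditioned on the leader's message."""
    return argmax(q_given_message[message])


def information_gain(q_without_message, q_given_message, message):
    """Entropy of the follower's value distribution before minus after the message.

    The sign convention (reduction counted as positive) is an assumption.
    """
    h_before = entropy(softmax(q_without_message))
    h_after = entropy(softmax(q_given_message[message]))
    return h_before - h_after


# Hypothetical toy values: one leader, one follower, three actions each.
leader_q = [0.2, 1.5, 0.7]
follower_q_prior = [1.0, 1.0, 1.0]   # follower is undecided without a message
follower_q_cond = [
    [1.0, 1.0, 1.0],                 # message 0 tells the follower nothing
    [0.0, 4.0, 0.0],                 # message 1 strongly favors action 1
    [2.0, 0.0, 0.0],                 # message 2 favors action 0
]

message = leader_step(leader_q)
follower_action = follower_step(follower_q_cond, message)
gain = information_gain(follower_q_prior, follower_q_cond, message)
print(message, follower_action, round(gain, 3))
```

In this toy setting, a message that sharpens the follower's preferences yields a positive gain, while an uninformative message (row 0 of the conditional table) yields a gain of zero, which matches the abstract's claim that the gain tracks the leader's influence.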
