Cooperative Multiagent Learning and Exploration With Min–Max Intrinsic Motivation

Keywords: Intrinsic motivation, computer science, knowledge management, psychology, social psychology
Authors
Yaqing Hou, Jiarui Kang, Haiyin Piao, Yifeng Zeng, Yew-Soon Ong, Yaochu Jin, Qiang Zhang
Source
Journal: IEEE Transactions on Cybernetics (Institute of Electrical and Electronics Engineers)
Volume (issue): 55 (6), pp. 2852-2864; Citations: 3
Identifier
DOI: 10.1109/tcyb.2025.3557694
Abstract

In the field of multiagent reinforcement learning (MARL), the ability to effectively explore unknown environments and to collect the information and experiences most beneficial for policy learning is a critical research problem. However, existing methods often struggle with the uncertainty caused by state changes and with the inconsistency between agents' local observations and global information, both of which make coordinated exploration among multiple agents difficult. To address this issue, this article proposes a novel MARL exploration method with Min-Max intrinsic motivation (E2M), which promotes the learning of agents' joint policies by combining surprise minimization with social influence maximization. Because agents are subject to unstable state changes in the environment, we introduce surprise minimization, computed via state entropy, to encourage agents to handle more stable and familiar situations. Surprise is estimated from low-dimensional state representations obtained from random encoders. Furthermore, to prevent surprise minimization from leading to conservative policies, we introduce the mutual information between agents' behaviors as a measure of social influence. Maximizing social influence encourages agents to interact, which facilitates the emergence of cooperative behavior. The performance of E2M is evaluated on a range of popular StarCraft II and Multiagent MuJoCo tasks, and comprehensive results demonstrate its effectiveness in enhancing the cooperative capability of multiple agents.
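The abstract does not give E2M's formulas, so the following is only a rough illustration of the two ingredients it names, not the authors' implementation. It assumes (1) a frozen, randomly initialised linear encoder, (2) a common particle-based k-nearest-neighbour proxy for state entropy, negated so that states in sparse, unfamiliar regions receive lower reward (surprise minimization), and (3) a plug-in estimate of mutual information between two agents' discrete actions as a stand-in for social influence. All function names and the weighting `beta` are hypothetical.

```python
import math
import random
from collections import Counter


def make_random_encoder(in_dim, out_dim, seed=0):
    """Return a fixed, never-trained linear projection (hypothetical random encoder)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]

    def encode(state):
        # Project a raw state vector to out_dim features; the weights stay frozen.
        return [sum(w * s for w, s in zip(row, state)) for row in weights]

    return encode


def knn_entropy_terms(embeddings, k=3):
    """Per-sample k-NN state-entropy proxy: log(1 + distance to k-th nearest neighbour)."""
    if not 1 <= k < len(embeddings):
        raise ValueError("k must satisfy 1 <= k < number of samples")
    terms = []
    for i, x in enumerate(embeddings):
        dists = sorted(math.dist(x, y) for j, y in enumerate(embeddings) if j != i)
        terms.append(math.log(1.0 + dists[k - 1]))
    return terms


def surprise_min_rewards(embeddings, k=3):
    """Negated entropy terms: states in sparse (unfamiliar) regions are penalised."""
    return [-t for t in knn_entropy_terms(embeddings, k)]


def action_mutual_information(actions_a, actions_b):
    """Plug-in estimate of I(A; B) in nats from paired discrete actions of two agents."""
    if len(actions_a) != len(actions_b) or not actions_a:
        raise ValueError("need two non-empty action sequences of equal length")
    n = len(actions_a)
    joint = Counter(zip(actions_a, actions_b))
    count_a = Counter(actions_a)
    count_b = Counter(actions_b)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi


def min_max_intrinsic_reward(surprise_reward, social_influence, beta=0.1):
    """Hypothetical combination: minimise surprise, maximise social influence."""
    return surprise_reward + beta * social_influence


if __name__ == "__main__":
    encode = make_random_encoder(in_dim=4, out_dim=2, seed=42)
    states = [
        [0.0, 0.0, 0.0, 0.0],
        [0.1, 0.0, 0.0, 0.0],
        [0.0, 0.1, 0.0, 0.0],
        [5.0, 5.0, 5.0, 5.0],
    ]
    z = [encode(s) for s in states]
    surprise = surprise_min_rewards(z, k=1)
    influence = action_mutual_information([0, 1, 0, 1], [0, 1, 0, 1])
    print([round(min_max_intrinsic_reward(r, influence), 3) for r in surprise])
```

Keeping the encoder frozen means the embedding space does not drift during training, so distances between states stay comparable over time. Perfectly correlated actions such as `[0, 1, 0, 1]` for both agents give a mutual information of log 2 nats, while independent actions give 0; the paper may condition its social-influence term on states or compute it differently.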