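Reinforcement learning
Computer science
Leverage (statistics)
Benchmark (surveying)
Proxy (statistics)
Artificial intelligence
Multi-agent system
Mathematical optimization
Machine learning
Mathematics
Geodesy
Geography
Authors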
Boli Fang,Zhenghao Peng,Hao Sun,Qin Zhang
Identifier
DOI:10.1109/ijcnn55064.2022.9892004
Abstract
In this paper, we propose Multi-Agent Proxy Proximal Policy Optimization (MA3PO), a novel multi-agent deep reinforcement learning algorithm that tackles the challenge of cooperative continuous multi-agent control. Our method is driven by the observation that most existing multi-agent reinforcement learning algorithms focus mainly on discrete state/action spaces and are thus computationally infeasible when extended to environments with continuous state/action spaces. To address the issue of computational complexity and to better model intra-agent collaboration, we make use of the recently successful Proximal Policy Optimization algorithm, which effectively explores continuous action spaces, and incorporate the notion of intrinsic motivation via meta-gradient methods so as to stimulate the behavior of individual agents in cooperative multi-agent settings. Towards these ends, we design proxy rewards to quantify the effect of individual agent-level intrinsic motivation on the team-level reward, and apply meta-gradient methods to leverage such an addition so that our algorithm can learn the team-level cumulative reward effectively. Experiments on various multi-agent reinforcement learning benchmark environments with continuous action spaces demonstrate that our algorithm is not only comparable with the existing state-of-the-art benchmarks, but also significantly reduces training time complexity.
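The abstract does not give the exact form of the proxy reward or of the meta-gradient update, so the following is only a minimal sketch of the two ingredients it names. `ppo_clip_objective` is the standard clipped surrogate objective of Proximal Policy Optimization; `proxy_reward` and its `weight` parameter are hypothetical stand-ins for "team-level reward plus an agent-level intrinsic term" and are not taken from the paper.

```python
import math


def proxy_reward(team_reward, intrinsic_reward, weight=0.1):
    """Hypothetical proxy reward: team reward plus a weighted intrinsic term.

    The additive form and `weight` are assumptions for illustration only.
    """
    return team_reward + weight * intrinsic_reward


def ppo_clip_objective(new_logps, old_logps, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate objective, averaged over samples.

    new_logps / old_logps: log-probabilities of the taken actions under the
    current and the data-collecting policy. advantages: advantage estimates.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        ratio = math.exp(new_lp - old_lp)  # probability ratio pi_new / pi_old
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

In this sketch, an agent's advantages would be estimated from `proxy_reward` values instead of the raw team reward; how the paper tunes the intrinsic term through meta-gradients is not specified in the abstract and is left out here.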