Reinforcement Learning
Computer Science
Consensus
Multi-Agent Systems
Artificial Intelligence
Authors
Yifan Hu, Junjie Fu, Guanghui Wen, Yuezu Lv, Wei Ren
Source
Journal: Automatica
[Elsevier BV]
Date: 2024-04-04
Volume/Issue: 164, Article 111652
Citations: 3
Identifier
DOI:10.1016/j.automatica.2024.111652
Abstract
Sample efficiency is a limiting factor for existing distributed multi-agent reinforcement learning (MARL) algorithms over networked multi-agent systems. In this paper, the sample efficiency problem is tackled by formally incorporating entropy regularization into the distributed MARL algorithm design. First, a new entropy-regularized MARL problem is formulated under the model of networked multi-agent Markov decision processes with observation-based policies and homogeneous agents, where policy parameter sharing among the agents provably preserves optimality. Second, an on-policy distributed actor–critic algorithm is proposed in which each agent shares the parameters of both its critic and its actor for consensus updates. A convergence analysis of the proposed algorithm is then provided, based on stochastic approximation theory under the assumption of a linear function approximation for the critic. Furthermore, a practical off-policy version of the algorithm is developed that offers scalability, data efficiency, and learning stability. Finally, the proposed distributed algorithm is compared against strong baselines, including two classic centralized-training algorithms, in the multi-agent particle environment, and its learning performance is empirically demonstrated through extensive simulation experiments.