Reinforcement learning
Computer science
Modular design
Software deployment
Convergence
Artificial intelligence
Theory (learning stability)
Incentive
Distributed computing
Resource allocation
Resource
Decentralized system
Shared resource
Adaptive system
Multi-agent system
Machine learning
Training
Resource management (computing)
Protocol (science)
Peer learning
Identifier
DOI:10.1142/s021800142551022x
Abstract
This paper proposes a modular multi-agent reinforcement learning framework that integrates Centralized Training with Decentralized Execution (CTDE), attention-based communication, and adaptive reward shaping. Built upon an extended Soft Actor–Critic algorithm, the system enables decentralized agents to learn robust policies under partial observability. A shared critic computes value estimates from the full global state during training, while decentralized actors condition their policies on peer messages selectively aggregated by a task-driven attention mechanism. Adaptive reward shaping dynamically aligns agent incentives with global objectives, accelerating convergence. The system is evaluated on three benchmarks: Multi-Agent Particle Environment (MPE), StarCraft II Micromanagement Challenge (SMAC), and a custom Resource Allocation Simulator (RAS). Compared to MAPPO, MADDPG, and ISAC baselines, the proposed method improves average episodic reward by 15–25%, reduces the number of steps to convergence by up to 40%, and significantly improves coordination scores. Results also show superior stability across random seeds and reduced wall-clock training time, highlighting the method's effectiveness for real-world deployment in dynamic multi-agent settings.
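The abstract does not include implementation details, so the following is only a minimal PyTorch sketch of how the described CTDE structure could be organized: each decentralized actor aggregates peer messages with scaled dot-product attention before choosing an action, and a shared critic scores the global state together with the joint action during training. All class names, layer sizes, and the module layout are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionCommActor(nn.Module):
    """Decentralized actor: aggregates peer messages via attention (illustrative sketch)."""

    def __init__(self, obs_dim, msg_dim, act_dim, hidden_dim=64):
        super().__init__()
        self.query = nn.Linear(obs_dim, hidden_dim)   # query from the agent's own observation
        self.key = nn.Linear(msg_dim, hidden_dim)     # keys from incoming peer messages
        self.value = nn.Linear(msg_dim, hidden_dim)   # values from incoming peer messages
        self.policy = nn.Sequential(
            nn.Linear(obs_dim + hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, act_dim),           # logits (Gaussian params for continuous SAC)
        )

    def forward(self, obs, peer_msgs):
        # obs: (batch, obs_dim); peer_msgs: (batch, n_peers, msg_dim)
        q = self.query(obs).unsqueeze(1)                          # (batch, 1, hidden)
        k = self.key(peer_msgs)                                   # (batch, n_peers, hidden)
        v = self.value(peer_msgs)                                 # (batch, n_peers, hidden)
        scores = (q @ k.transpose(1, 2)) / k.shape[-1] ** 0.5     # scaled dot-product scores
        attn = F.softmax(scores, dim=-1)                          # attention over peers
        context = (attn @ v).squeeze(1)                           # aggregated peer information
        logits = self.policy(torch.cat([obs, context], dim=-1))
        return logits, attn


class CentralizedCritic(nn.Module):
    """Shared critic: sees the full global state and joint action, used only during training."""

    def __init__(self, state_dim, joint_act_dim, hidden_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + joint_act_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, global_state, joint_action):
        return self.net(torch.cat([global_state, joint_action], dim=-1))


if __name__ == "__main__":
    # Toy usage: one agent's actor with messages from three peers, plus the shared critic.
    actor = AttentionCommActor(obs_dim=10, msg_dim=8, act_dim=5)
    critic = CentralizedCritic(state_dim=20, joint_act_dim=10)
    logits, attn_weights = actor(torch.randn(4, 10), torch.randn(4, 3, 8))
    q_value = critic(torch.randn(4, 20), torch.randn(4, 10))
```

In this sketch only the actor is needed at execution time (local observation plus received messages), while the centralized critic is discarded after training, which is the defining property of the CTDE setup described in the abstract.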