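Computer science
Reinforcement learning
Scheduling (production processes)
Scalability
Distributed computing
Leverage (statistics)
Redundancy (engineering)
Artificial intelligence
Mathematical optimization
Operating system
Database
Mathematics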
Authors
Xuanhan Zhou,Jun Xiong,Haitao Zhao,Chao Yan,Jibo Wei
Identifier
DOI:10.1109/tmc.2024.3437679
Abstract
Unmanned aerial vehicles (UAVs) serving as mobile base stations are recognized as an effective means of emergency communications. The performance of such systems depends on the movement of the UAVs and the scheduling of ground users (GUs). However, devising an efficient algorithm to jointly optimize UAV trajectories and user scheduling remains challenging, especially in real-time scenarios that lack central controllers. Multi-agent deep reinforcement learning (MADRL) provides a promising solution to this problem. Nevertheless, as the numbers of UAVs and GUs increase, existing MADRL algorithms encounter scalability and sample efficiency issues. In this paper, we develop a novel symmetry-augmented MADRL approach for learning scalable UAV trajectory design and user scheduling policies. The core idea is to utilize symmetries to reduce the multi-agent state-action space and enhance sample efficiency. Specifically, we design a family of neural networks to learn individual policies, namely entity permutation equivariant policy networks (EP2Nets). EP2Nets leverage permutation symmetry to reduce redundancy in the state-action space. Additionally, we achieve data augmentation by exploiting rotational and reflection symmetries, further boosting sample efficiency. Finally, a Symmetric QMIX (SymmQMIX) algorithm is proposed by integrating the EP2Net and the data augmentation method into the QMIX algorithm. Simulation results indicate that SymmQMIX significantly outperforms QMIX and other symmetry-enhanced algorithms, achieving a 4.5-fold increase in converged performance and a 100-fold improvement in sample efficiency.
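The rotation and reflection augmentation mentioned in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: positions are 2D coordinates relative to the center of a square service area, the UAV action is a 2D displacement, the reward is unchanged by the symmetry, and only 90-degree rotations combined with one reflection are used (8 symmetric copies per transition). The names `rotate90`, `apply_symmetry`, `augment`, and the dictionary keys are hypothetical.

```python
def rotate90(v, k):
    """Rotate a 2D vector counter-clockwise by k * 90 degrees about the origin."""
    x, y = v
    for _ in range(k % 4):
        x, y = -y, x
    return (x, y)


def apply_symmetry(v, k, flip):
    """Optionally reflect across the x-axis, then rotate by k * 90 degrees."""
    x, y = v
    if flip:
        y = -y
    return rotate90((x, y), k)


def augment(transition):
    """Return 8 symmetric copies of one transition (assumed symmetry group).

    Assumption: the same transform is applied to every position and to the
    movement action, while the reward is left unchanged.
    """
    copies = []
    for flip in (False, True):
        for k in range(4):
            copies.append({
                "uav_pos": apply_symmetry(transition["uav_pos"], k, flip),
                "gu_pos": [apply_symmetry(p, k, flip)
                           for p in transition["gu_pos"]],
                "move": apply_symmetry(transition["move"], k, flip),
                "reward": transition["reward"],
                "next_uav_pos": apply_symmetry(
                    transition["next_uav_pos"], k, flip),
            })
    return copies


# Hypothetical transition: one UAV, two ground users.
sample = {
    "uav_pos": (1, 0),
    "gu_pos": [(2, 1), (-1, 3)],
    "move": (0, 1),
    "reward": 0.5,
    "next_uav_pos": (1, 1),
}
copies = augment(sample)
print(len(copies))           # number of symmetric copies
print(copies[1]["uav_pos"])  # UAV position after a 90-degree rotation
```

Under these assumptions each collected transition yields eight training samples, which is one plausible way such symmetries could improve sample efficiency. The first copy (no reflection, zero rotation) is the original transition itself.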