Computer Science
Maximization
Computer Networks
Distributed Computing
Mathematical Optimization
Mathematics
Authors
Saugat Tripathi, Ran Zhang, Miao Wang
Identifier
DOI:10.1109/globecom54140.2023.10436937
Abstract
Multi-agent reinforcement learning has been applied to Unmanned Aerial Vehicle (UAV) based communication networks (UCNs) to effectively solve the problem of time-coupled sequential decision making while achieving scalability. Nevertheless, a transverse comparison of the impact of different levels of inter-agent information exchange on learning convergence has not been well studied. In this work, we study a distributed user connectivity maximization problem in a UCN, aiming to obtain a trajectory design that optimally guides UAVs' movements over a time horizon to maximize the accumulated number of connected users. Specifically, the problem is first formulated as a time-coupled mixed-integer non-convex optimization problem. A two-stage user association policy is proposed to determine the UAV-user connectivity. A multi-agent deep Q learning algorithm is then designed to solve the optimization, featuring four different levels of information exchange and reward function design. Simulations are conducted to compare the convergence speed and total number of connected users per episode between the different levels. The results show that exchanging state information with a deliberately designed task-specific reward function yields the best convergence performance in both cases of stationary and dynamic user distributions.
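To make the abstract's setup concrete, the following is a minimal, illustrative sketch (not the paper's actual algorithm or environment) of multi-agent Q learning with full state exchange: two UAV agents on a toy 1-D grid of user cells, each learning its own Q table over the joint (exchanged) state, with a shared reward equal to the number of connected users. The grid, user counts, horizon, and hyperparameters are all assumptions for illustration; the paper uses deep Q networks and a two-stage user association policy instead of this tabular toy.

```python
import random
from collections import defaultdict

# Illustrative toy data: users per grid cell (assumed, not from the paper).
USERS = [3, 0, 5, 1, 2]
ACTIONS = [-1, 0, 1]          # move left, stay, move right
GRID = len(USERS)

def step(pos, action):
    """Move a UAV along the grid, clamped to the boundaries."""
    return min(max(pos + action, 0), GRID - 1)

def reward(p0, p1):
    """Shared team reward: users covered, each cell counted once."""
    return USERS[p0] + (USERS[p1] if p1 != p0 else 0)

def train(episodes=300, horizon=10, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Independent Q-learners whose state includes BOTH UAV positions,
    i.e. the 'state information exchange' level described in the abstract."""
    rng = random.Random(seed)
    Q = [defaultdict(float), defaultdict(float)]   # one Q table per agent
    for _ in range(episodes):
        pos = [0, GRID - 1]                        # fixed start positions
        for _ in range(horizon):
            state = (pos[0], pos[1])               # exchanged joint state
            acts = []
            for i in range(2):                     # epsilon-greedy action
                if rng.random() < eps:
                    acts.append(rng.choice(ACTIONS))
                else:
                    acts.append(max(ACTIONS, key=lambda a: Q[i][(state, a)]))
            nxt = [step(pos[i], acts[i]) for i in range(2)]
            r = reward(nxt[0], nxt[1])
            ns = (nxt[0], nxt[1])
            for i in range(2):                     # independent Q updates
                best = max(Q[i][(ns, a)] for a in ACTIONS)
                Q[i][(state, acts[i])] += alpha * (
                    r + gamma * best - Q[i][(state, acts[i])])
            pos = nxt
    return Q

def greedy_rollout(Q, horizon=10):
    """Accumulated connected users under the learned greedy policy."""
    pos = [0, GRID - 1]
    total = 0
    for _ in range(horizon):
        state = (pos[0], pos[1])
        acts = [max(ACTIONS, key=lambda a: Q[i][(state, a)])
                for i in range(2)]
        pos = [step(pos[i], acts[i]) for i in range(2)]
        total += reward(pos[0], pos[1])
    return total
```

Swapping the joint state `(pos[0], pos[1])` for each agent's own position alone would correspond to a lower information-exchange level, which is the kind of comparison the paper studies.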