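Computer science
Task (project management)
Trajectory
Real-time computing
Computer network
Systems engineering
Engineering
Physics
Astronomy
Authors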
Zhen Gao,Jiaming Fu,Zongming Jing,Yu Dai,Lei Yang
Identifier
DOI:10.1109/jiot.2024.3362988
Abstract
Existing joint trajectory planning and task offloading (JTPTO) methods provide ultra-low-latency services for mobile devices (MDs) in unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC). However, UAVs typically serve MDs under partial observation, and the resulting information loss makes optimal service performance hard to achieve. Moreover, the JTPTO problem typically involves multi-objective optimization, which is challenging because the objectives may conflict with each other. In this paper, we present a decentralized JTPTO method based on Multi-Objective and Independently Predicted Communication Multi-Agent Actor-Critic (MOIPC-MAAC). First, an IPC network is designed to help UAV agents learn a prior for communication between UAVs. UAV agents learn this prior through causal reasoning; it represents the mapping from a UAV's observation to the level of confidence in choosing communication partners. The effect of one UAV on another is predicted through the critic network in multi-agent reinforcement learning (MARL) and measured to indicate the necessity of UAV-UAV communication. Further, we regularize JTPTO policies to utilize exchanged messages more effectively. Second, a generalized variant of the Bellman optimality operator with multiple objectives is applied to address the JTPTO problem. We use it to learn a single parameterized expression that encompasses all the best JTPTO policies across the space of preferences. Experiments show that, compared to existing solutions, MOIPC-MAAC reduces system costs by 14.23% to 19.56% and the communication cost to approximately 11.23%. Moreover, compared to training from scratch, MOIPC-MAAC accelerates adaptation to new JTPTO tasks with unknown preferences by 13.12%.
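The abstract does not give the exact form of the multi-objective Bellman optimality operator. As an illustration only, the tabular sketch below shows one well-known way such an operator can be generalized: values are kept as vectors (one entry per objective), and the backup for a given preference picks, for each next state, the stored vector that scores highest under that preference, searching over all actions and all preference slices. The function names, the nested-list layout, and the assumption of linear preference weights are hypothetical choices made for this sketch, not the authors' implementation.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mo_backup(Q, R, P, prefs, gamma=0.9):
    """One sweep of a multi-objective Bellman optimality backup (sketch).

    Q[w][s][a] : list of K floats, vector value under preference index w
    R[s][a]    : list of K floats, vector reward (K objectives)
    P[s][a][t] : probability of moving from state s to state t under action a
    prefs[w]   : list of K weights (assumed linear scalarization)
    """
    n_states = len(R)
    new_Q = []
    for pref in prefs:
        # For each next state, choose the stored vector that scores highest
        # under `pref`, searching over every action and every preference slice.
        best = []
        for t in range(n_states):
            candidates = [
                Q[v][t][a]
                for v in range(len(prefs))
                for a in range(len(R[t]))
            ]
            best.append(max(candidates, key=lambda q: dot(pref, q)))
        # Vector-valued backup: reward plus discounted expected best vector.
        new_Q.append([
            [
                [
                    R[s][a][k]
                    + gamma * sum(P[s][a][t] * best[t][k] for t in range(n_states))
                    for k in range(len(pref))
                ]
                for a in range(len(R[s]))
            ]
            for s in range(n_states)
        ])
    return new_Q
```

Because one table indexed by preference holds the values for every preference at once, a policy for a new preference can be read off by scalarizing the stored vectors rather than by training again, which is the intuition behind "a single parameterized expression" covering the space of preferences.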