Keywords: Reinforcement learning; Computer science; Motion planning; Decomposition; Route planning; Path (computing); Artificial intelligence; Mathematical optimization; Distributed computing; Machine learning; Robotics; Ecology; Mathematics; Biology; Programming languages
Identifier
DOI:10.1109/icmsp58539.2023.10170972
Abstract
In multi-agent environments, efficient collaborative search among multiple unmanned aerial vehicles (UAVs) is crucial for area search and path planning, so collaborative learning among UAVs must be considered when designing cooperation strategies. This paper analyzes a problem with existing value-decomposition-based multi-agent reinforcement learning algorithms in multi-UAV area search and path planning: the underestimation of optimal joint actions, which leads to suboptimal policies. A new non-monotonic value-decomposition algorithm is then proposed that adds a masked highway-connection strategy. The algorithm suppresses the optimization of certain action pairs and focuses on the optimal joint actions, thereby better optimizing the objective and recovering the value of the optimal joint actions. Multiple simulation experiments demonstrate that the proposed algorithm improves performance in UAV-swarm environments and enhances the collaborative effectiveness of UAV swarms in search tasks.
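The abstract's core idea — monotonic value decomposition (in the style of QMIX-like mixers) augmented with a masked skip ("highway") path that can restore the value of the optimal joint action — can be illustrated with a minimal numeric sketch. All function and parameter names here are illustrative assumptions, not the paper's actual architecture: the real method learns the mixing weights and the mask from state.

```python
import numpy as np

def mixed_q(agent_qs, weights, bias, mask):
    """Hypothetical sketch of value decomposition with a masked
    highway connection (names and shapes are assumptions).

    agent_qs: (n_agents,) chosen-action Q-value per UAV/agent
    weights:  (n_agents,) mixing weights; made non-negative to
              enforce the usual monotonicity constraint
    bias:     scalar state-dependent bias of the mixer
    mask:     (n_agents,) 1 where the highway path is open
              (e.g. for the greedy joint action), else 0
    """
    w = np.abs(weights)                 # monotonic mixing weights
    mixed = w @ agent_qs + bias         # standard monotonic mixer
    highway = (mask * agent_qs).sum()   # masked skip connection
    # The highway term lets the joint value exceed what the
    # monotonic mixer alone can represent, countering the
    # underestimation of optimal joint actions.
    return mixed + highway

# With the mask open, the joint value is boosted by the skip path:
qs = np.array([1.0, 2.0])
w = np.array([0.5, 0.5])
print(mixed_q(qs, w, 0.1, np.array([1.0, 1.0])))  # mixer 1.6 + highway 3.0
print(mixed_q(qs, w, 0.1, np.array([0.0, 0.0])))  # mixer only, 1.6
```

When the mask is closed, the function reduces to a plain monotonic mixer; opening it only for the greedy joint action is one simple way to "suppress the optimization of certain action pairs" while focusing capacity on the optimal one, as the abstract describes.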