Computer science
Collision
Artificial intelligence
Distributed computing
Real-time computing
Computer security
Identifier
DOI:10.21203/rs.3.rs-6490892/v1
Abstract
Coordinating multiple unmanned aerial vehicles (UAVs) for inspection, delivery, and search-and-rescue missions demands routes that are globally efficient yet locally safe. Flat optimisation and single-level reinforcement-learning agents scale poorly as map size, obstacle density, or fleet size increases, because a single policy must juggle long-horizon objectives and split-second collision avoidance. We reformulate multi-UAV path planning as a hierarchical reinforcement-learning problem and introduce a two-tier controller for discrete grids under partial observability. A high-level manager selects coarse waypoints toward mission goals, while a shared recurrent worker, trained with proximal policy optimisation (PPO) and an LSTM backbone, executes short, collision-aware motion sequences. We prove that, given an expressive waypoint dictionary, every subgame-perfect equilibrium of the induced Markov game is collision-free and that enlarging the dictionary monotonically improves team return. To keep training practical, we propose manager–worker curriculum optimisation: the worker is pre-trained on small grids and frozen, then the manager is trained on progressively larger maps. Experiments on three benchmarks, ranging from two to six UAVs with 20 %–40 % obstacle coverage, show that the hierarchy maintains ≥ 90 % mission success and reduces collisions by up to 74 % relative to plain PPO (62 % relative to PPO + LSTM), while lengthening routes by no more than three primitive steps (at most two compared with PPO + LSTM). Performance degrades only marginally as fleet size and obstacle density grow, confirming that a modest waypoint vocabulary combined with recurrent memory can turn simple reactive primitives into safe, scalable multi-UAV behaviour.
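The two-tier control loop described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: hand-coded greedy heuristics stand in for the learned manager and the PPO + LSTM worker, and the names `WAYPOINT_OFFSETS`, `manager_select`, `worker_step`, `rollout`, and the re-planning interval `k` are hypothetical. The sketch only shows how a manager that picks coarse waypoints can be layered over a worker that takes one-cell moves and refuses to enter obstacles or cells held by other UAVs.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]

# Hypothetical waypoint dictionary: coarse offsets the manager may choose from.
WAYPOINT_OFFSETS: List[Cell] = [(3, 0), (-3, 0), (0, 3), (0, -3), (0, 0)]
# Primitive worker moves: stay in place or step to one of four neighbours.
MOVES: List[Cell] = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]


def manhattan(a: Cell, b: Cell) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def manager_select(pos: Cell, goal: Cell, size: int) -> Cell:
    """Stand-in for the learned manager: choose the candidate waypoint nearest the goal."""
    candidates = [
        (min(max(pos[0] + dx, 0), size - 1), min(max(pos[1] + dy, 0), size - 1))
        for dx, dy in WAYPOINT_OFFSETS
    ]
    if manhattan(pos, goal) <= 3:  # the goal itself is a waypoint once it is close
        candidates.append(goal)
    return min(candidates, key=lambda wp: manhattan(wp, goal))


def worker_step(pos: Cell, waypoint: Cell, obstacles: Set[Cell],
                occupied: Set[Cell], size: int) -> Cell:
    """Stand-in for the learned worker: one greedy, collision-aware primitive move."""
    best = pos  # staying put is always a legal fallback
    for dx, dy in MOVES:
        nxt = (pos[0] + dx, pos[1] + dy)
        if not (0 <= nxt[0] < size and 0 <= nxt[1] < size):
            continue
        if nxt in obstacles or nxt in occupied:
            continue
        if manhattan(nxt, waypoint) < manhattan(best, waypoint):
            best = nxt
    return best


def rollout(starts: List[Cell], goals: List[Cell], obstacles: Set[Cell],
            size: int, horizon: int = 40, k: int = 3) -> List[Tuple[Cell, ...]]:
    """Run the hierarchy: the manager re-plans every k steps, the worker moves every step."""
    positions = list(starts)
    waypoints = list(starts)
    trajectory = [tuple(positions)]
    for t in range(horizon):
        if t % k == 0:
            waypoints = [manager_select(p, g, size) for p, g in zip(positions, goals)]
        for i in range(len(positions)):
            # Cells held by every other UAV (already moved or not) are off limits.
            occupied = {p for j, p in enumerate(positions) if j != i}
            positions[i] = worker_step(positions[i], waypoints[i], obstacles, occupied, size)
        trajectory.append(tuple(positions))
    return trajectory
```

Calling `rollout([(0, 0), (5, 5)], [(5, 5), (0, 0)], {(2, 2), (3, 3)}, size=6)` returns the joint positions at each step. Because UAVs move one at a time and never enter a cell held by another UAV, this toy version is collision-free by construction; it makes no claim about reaching goals, which in the paper depends on the trained policies.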