A deep multi-agent reinforcement learning framework for autonomous aerial navigation to grasping points on loads

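Reinforcement learning, Computer science, Artificial intelligence, Leverage (statistics), Trajectory, Robot, Planner, Scalability, Point (geometry), Deep learning, Simulation, Mathematics, Astronomy, Physics, Database, Geometry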

Authors

Jingyu Chen, Ruidong Ma, John Oyekan

Source

Journal: Robotics and Autonomous Systems [Elsevier]
Date: 2023-07-10; Volume: 167, Article 104489; Cited by: 14

Identifier

DOI: 10.1016/j.robot.2023.104489

Abstract

Deep reinforcement learning, by taking advantage of neural networks, has made great strides in the continuous control of robots. However, in scenarios where multiple robots must collaborate to accomplish a task, it is still challenging to build an efficient and scalable multi-agent control system because of the increasing complexity. In this paper, we regard each unmanned aerial vehicle (UAV) with its manipulator as one agent, and leverage the multi-agent deep deterministic policy gradient (MADDPG) algorithm for the cooperative navigation and manipulation of a load. We propose solutions to the navigation-to-grasping-point problem in targeted and flexible scenarios, and mainly focus on how to develop model-free policies for the UAVs without relying on a trajectory planner. To overcome the challenges of learning in scenarios with an increasing number of grasping points, we incorporate demonstrations from the Optimal Reciprocal Collision Avoidance (ORCA) algorithm into our framework to guide policy training, and integrate two novel techniques into the MADDPG architecture. Furthermore, curriculum learning with an attention mechanism reuses knowledge learned on loads with fewer grasping points to facilitate training on loads with more points. Our approach was validated in the CoppeliaSim simulator with loads having three, four and six grasping points, respectively, and then transferred to the real world with Crazyflie quadrotors. Our results show that the average tracking deviation between the desired grasping point and the final position of the UAV can be less than 10 cm in some real-world experiments. Compared with state-of-the-art model-free reinforcement learning and swarm optimisation algorithms, our proposed methods outperform the other baselines with a reasonable success rate, especially in scenarios with more grasping points.
Furthermore, the learned optimal policies enable UAVs to reach and hover over all the grasping points before manipulation without any collision. We conducted a comprehensive analysis of both targeted and flexible navigation, highlighting their respective advantages and disadvantages.
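The abstract reports accuracy as an average tracking deviation between each desired grasping point and the UAV's final position. A minimal Python sketch of such a metric is given below, assuming it is the mean 3D Euclidean distance over matched grasping-point/UAV pairs; the abstract does not state the exact definition (for example, whether a hover offset above the point is subtracted), and the function name `average_tracking_deviation` and the sample coordinates are hypothetical illustrations, not the authors' code or data.

```python
import math
from typing import Sequence, Tuple

# A 3D position (x, y, z); units are whatever the caller uses (metres here).
Point = Tuple[float, float, float]


def average_tracking_deviation(
    grasp_points: Sequence[Point], final_positions: Sequence[Point]
) -> float:
    """Hypothetical metric: mean Euclidean distance between each desired
    grasping point and the final position of the UAV assigned to it.

    Assumes final_positions[i] is the UAV that navigated to grasp_points[i].
    """
    if not grasp_points or len(grasp_points) != len(final_positions):
        raise ValueError("need exactly one final UAV position per grasping point")
    total = sum(math.dist(g, p) for g, p in zip(grasp_points, final_positions))
    return total / len(grasp_points)


# Illustrative (made-up) three-point load, coordinates in metres.
grasps = [(0.0, 0.0, 0.5), (1.0, 0.0, 0.5), (0.5, 1.0, 0.5)]
finals = [(0.03, 0.0, 0.5), (1.0, 0.04, 0.5), (0.5, 1.0, 0.55)]
deviation_m = average_tracking_deviation(grasps, finals)
print(f"average deviation: {deviation_m * 100:.1f} cm")
```

With these invented numbers the per-UAV errors are 3, 4 and 5 cm, so the average is about 4 cm, which would fall under the 10 cm level quoted in the abstract. Pairing each UAV with a fixed grasping point matches the targeted scenario; in a flexible scenario the assignment would first have to be recovered (for example, by nearest-point matching), which this sketch does not do.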
