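Reinforcement learning
Computer science
Artificial intelligence
Leverage (statistics)
Trajectory
Robot
Planner
Scalability
Point (geometry)
Deep learning
Simulation
Mathematics
Astronomy
Physics
Database
Geometry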
Authors
Jingyu Chen, Ruidong Ma, John Oyekan
Identifier
DOI:10.1016/j.robot.2023.104489
Abstract
Deep reinforcement learning, by taking advantage of neural networks, has made great strides in the continuous control of robots. However, in scenarios where multiple robots must collaborate to accomplish a task, building an efficient and scalable multi-agent control system remains challenging because of the increasing complexity. In this paper, we regard each unmanned aerial vehicle (UAV) with its manipulator as one agent, and leverage multi-agent deep deterministic policy gradient (MADDPG) for the cooperative navigation and manipulation of a load. We propose solutions to the problem of navigating to grasping points in targeted and flexible scenarios, focusing mainly on how to develop model-free policies for the UAVs without relying on a trajectory planner. To overcome the challenges of learning in scenarios with an increasing number of grasping points, we incorporate demonstrations from an Optimal Reciprocal Collision Avoidance (ORCA) algorithm into our framework to guide policy training, and adapt two novel techniques into the MADDPG architecture. Furthermore, curriculum learning with an attention mechanism reuses knowledge from loads with fewer grasping points to facilitate training on loads with more points. We validated our approach on loads with three, four and six grasping points in the CoppeliaSim simulator and then transferred it to the real world with Crazyflie quadrotors. Our results show that the average tracking deviation between the desired grasping point and the final position of the UAV can be less than 10 cm in some real-world experiments. Compared with state-of-the-art model-free reinforcement learning and swarm optimisation algorithms, our proposed methods outperform the baselines with a reasonable success rate, especially in scenarios with more grasping points. Furthermore, the learned optimal policies enable the UAVs to reach and hover over all the grasping points before manipulation without any collision. We conducted a comprehensive analysis of both targeted and flexible navigation, highlighting their respective advantages and disadvantages.
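As an illustration of the tracking-deviation figure quoted in the abstract, the following is a minimal sketch of how such a metric could be computed, assuming it is the mean Euclidean distance between each desired grasping point and the final position of the UAV assigned to it. The function name and all coordinates are hypothetical and are not taken from the paper.

```python
import math


def mean_tracking_deviation(grasp_points, final_positions):
    """Mean Euclidean distance (metres) between paired points.

    Assumption: grasp_points[i] is the target of the UAV whose final
    position is final_positions[i]; the paper may pair or average
    differently.
    """
    if len(grasp_points) != len(final_positions):
        raise ValueError("each grasping point needs exactly one UAV position")
    if not grasp_points:
        raise ValueError("at least one grasping point is required")
    distances = [math.dist(g, p) for g, p in zip(grasp_points, final_positions)]
    return sum(distances) / len(distances)


# Hypothetical (x, y, z) coordinates in metres for a three-point load.
grasp_points = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
final_positions = [(0.03, 0.04, 1.0), (1.0, 0.06, 1.0), (0.0, 1.0, 1.08)]

deviation = mean_tracking_deviation(grasp_points, final_positions)
print(f"mean deviation: {deviation * 100:.1f} cm")
print("below 10 cm:", deviation < 0.10)
```

With these made-up positions the per-UAV deviations are 5 cm, 6 cm and 8 cm, so the mean falls under the 10 cm level mentioned for some of the real-world experiments.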