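Flocking (texture)
Reinforcement learning
Computer science
Markov decision process
Kinematics
Partially observable Markov decision process
Collision
Swarm behavior
Artificial intelligence
Distributed computing
Markov process
Markov chain
Machine learning
Markov model
Mathematics
Statistics
Materials science
Physics
Computer security
Classical mechanics
Composite material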
Authors
Mahsoo Salimi,Philippe Pasquier
Identifier
DOI:10.1109/icrae53653.2021.9657767
Abstract
Flocking formation of unmanned aerial vehicles (UAVs) remains an open challenge because of kinematic complexity and uncertainty in complex environments. In this paper, the UAV flocking control problem is formulated as a partially observable Markov decision process (POMDP) and solved with deep reinforcement learning. In particular, we consider a leader-follower configuration in which consensus among all UAVs is used to train a shared control policy, and each UAV acts on the local information it collects. In addition, to avoid collisions among UAVs and to guarantee flocking and navigation, the reward function combines a global flocking-maintenance term, a mutual reward, and a collision penalty. We adapt deep deterministic policy gradient (DDPG) with centralized training and decentralized execution to obtain the flocking control policy, using actor-critic networks and a global state-space matrix. The simulation results demonstrate that the trained optimal policy converges to a flocking formation without parameter tuning and generalizes well to different UAVs.
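The abstract names three reward components (global flocking maintenance, mutual reward, collision penalty) but does not give their form. The following is a minimal illustrative sketch of how such a composite reward could be assembled, not the authors' formulation. The function name `flocking_reward`, the distance-based terms, the weights, and the radii are all assumptions introduced here for illustration.

```python
import math


def flocking_reward(positions, leader_index=0, desired_distance=2.0,
                    collision_radius=0.5, w_flock=1.0, w_mutual=0.5,
                    w_collision=10.0):
    """Return one scalar reward per UAV (hypothetical formulation).

    positions: list of (x, y) tuples, one per UAV.
    All weights and radii are illustrative assumptions, not values
    taken from the paper.
    """
    n = len(positions)
    leader = positions[leader_index]
    followers = [i for i in range(n) if i != leader_index]

    # Global flocking-maintenance term (assumed form): mean deviation of
    # follower-to-leader distance from the desired spacing, shared by all UAVs.
    flock_error = sum(
        abs(math.dist(positions[i], leader) - desired_distance)
        for i in followers
    ) / max(len(followers), 1)

    rewards = []
    for i in range(n):
        others = [math.dist(positions[i], positions[j])
                  for j in range(n) if j != i]

        # Mutual term (assumed form): stay near the desired spacing
        # relative to the nearest neighbour.
        nearest = min(others) if others else desired_distance
        mutual = -abs(nearest - desired_distance)

        # Collision penalty: count neighbours inside the safety radius.
        collisions = sum(1 for d in others if d < collision_radius)

        rewards.append(-w_flock * flock_error
                       + w_mutual * mutual
                       - w_collision * collisions)
    return rewards


# Well-spaced formation versus one with two UAVs almost touching.
spaced = flocking_reward([(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)])
crowded = flocking_reward([(0.0, 0.0), (0.1, 0.0), (2.0, 0.0)])
```

In a centralized-training, decentralized-execution setup such as the one the abstract describes, a reward of this kind would be computed from the global state during training, while each UAV's actor would still act only on its local observation.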