Flocking (behavior)
Reinforcement learning
Oracle (computing)
Markov decision process
Computer science
Scalability
Artificial intelligence
Swarm behavior
Flexibility (engineering)
Distributed computing
Machine learning
Markov process
Software engineering
Database
Statistics
Composite material
Materials science
Mathematics
Authors
Wen Wang, Liang Wang, Junfeng Wu, Xianping Tao, Haijun Wu
Source
Journal: IEEE Transactions on Vehicular Technology
Publisher: Institute of Electrical and Electronics Engineers
Date: 2022-06-20
Volume/Issue: 71 (10): 10280-10292
Citations: 23
Identifiers
DOI: 10.1109/TVT.2022.3184043
Abstract
The flocking and navigation control of large-scale Unmanned Aerial Vehicle (UAV) swarms has attracted considerable research interest due to the wide applications of UAVs in many fields. Compared to traditional non-learning-based flocking and navigation control methods, reinforcement learning-based methods have the advantages of being model-free, flexible, and adaptable. In this paper, we formulate the flocking and navigation control of the UAV swarm as a Markov Decision Process (MDP) and use multi-agent reinforcement learning methods to solve the problem. Reinforcement learning introduces two significant challenges: the scalability issue and the partial observations of each UAV. To tackle the scalability issue, we adopt an independent learning and parameter sharing scheme, which extends single-agent reinforcement learning algorithms to the multi-agent scenario. For the partial observations, we propose an oracle-guided two-stage training and execution scheme, which utilizes the flock center during the training phase but avoids any dependence on it during the execution phase. We design oracle-guided observations and rewards and build a highly efficient simulation environment to conduct experiments. Simulation results show that the policy trained with our method performs well with up to thirty-two UAVs and outperforms a policy trained with only local observations.
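The abstract describes two mechanisms: independent learning with parameter sharing (a single policy applied separately to every UAV's partial observation) and an oracle-guided scheme that exploits the flock center during training but not during execution. The sketch below is a simplified illustration of those ideas, not the authors' implementation: it assumes a 2-D point-mass swarm, and the oracle (flock center) appears only in a training-time reward, whereas the paper also designs oracle-guided observations and a two-stage procedure. Names such as SENSE_RADIUS, MAX_NEIGHBORS, oracle_reward, and shared_policy are hypothetical.

    import numpy as np

    SENSE_RADIUS = 5.0    # assumed sensing range of each UAV
    MAX_NEIGHBORS = 6     # assumed cap so the observation vector has a fixed size

    def local_observation(i, pos, vel):
        """Partial observation of UAV i: relative positions and velocities of its
        nearest neighbors within sensing range; no global information."""
        rel = pos - pos[i]
        dist = np.linalg.norm(rel, axis=1)
        idx = [j for j in np.argsort(dist) if 0.0 < dist[j] < SENSE_RADIUS][:MAX_NEIGHBORS]
        obs = np.zeros((MAX_NEIGHBORS, 4))
        for k, j in enumerate(idx):
            obs[k] = np.concatenate([rel[j], vel[j] - vel[i]])
        return obs.ravel()

    def oracle_reward(i, pos, goal):
        """Training-time reward that uses the flock center (the 'oracle'):
        stay cohesive while the flock as a whole navigates toward a goal point."""
        center = pos.mean(axis=0)                    # global quantity, used in training only
        cohesion = -np.linalg.norm(pos[i] - center)  # penalize straying from the flock
        progress = -np.linalg.norm(center - goal)    # penalize flock distance to the goal
        return cohesion + progress

    def step_policy(shared_policy, pos, vel):
        """Independent execution with parameter sharing: the same policy maps each
        UAV's own local observation to its own action; no flock center is needed."""
        return np.stack([shared_policy(local_observation(i, pos, vel))
                         for i in range(len(pos))])

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        pos = rng.normal(size=(32, 2))   # 32 UAVs, the largest swarm size reported
        vel = rng.normal(size=(32, 2))

        def random_policy(obs):          # stand-in for a trained shared network
            return rng.normal(size=2)

        actions = step_policy(random_policy, pos, vel)
        rewards = [oracle_reward(i, pos, goal=np.array([10.0, 0.0])) for i in range(len(pos))]
        print(actions.shape, len(rewards))

Because the oracle term enters only the reward in this sketch, the policy's input space is identical at training and execution time, which mirrors the paper's goal of removing the dependence on the flock center when the trained policy is deployed.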