Flocking (texture)
Reinforcement learning
Computer science
Multi-agent system
Sample (material)
Reinforcement
Artificial intelligence
Distributed computing
Engineering
Composite material
Materials science
Chemistry
Structural engineering
Chromatography
Authors
Yunbo Qiu,Yue Jin,Lebin Yu,Jian Wang,Yu Wang,Xudong Zhang
Identifiers
DOI:10.1109/jiot.2023.3240671
Abstract
Control algorithms for multiagent systems (MASs) have been applied to many Internet of Things devices, such as unmanned aerial vehicles and autonomous underwater vehicles. Flocking control is a crucial problem in MASs for enhancing the safety and cooperativity of agents: it requires the agents to maintain the flock while navigating to a target position and avoiding collisions. Compared with traditional algorithms, methods based on multiagent reinforcement learning (MARL) can solve the flocking control problem more flexibly and adapt to more complex environments. However, MARL-based methods demand a huge number of interactions between agents and the environment, resulting in sample inefficiency. In this article, we propose nonexpert policy-aided MARL (NPA-MARL) to improve sample efficiency; it combines a fundamental MARL algorithm with a prior policy whose performance may be nonexpert. Before online MARL training, NPA-MARL generates demonstrations with the nonexpert policy to pretrain agents, while preventing overfitting to the demonstrations. During online training, NPA-MARL instructs agents to imitate the nonexpert policy whenever the agents judge the nonexpert policy to be better. We apply NPA-MARL to the flocking control problem. Experimental results show that NPA-MARL improves sample efficiency and policy performance in flocking control. Moreover, NPA-MARL scales to larger numbers of agents and allows a flexible choice of both the nonexpert policy and the fundamental MARL algorithm.
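The two-phase scheme the abstract describes (pretrain on demonstrations from a nonexpert policy, then imitate that policy online only when the agent's own value estimate rates the nonexpert's action as better) can be sketched on a toy task. The sketch below is an illustrative single-agent, tabular stand-in under assumed simplifications; the task, the `nonexpert` heuristic, and all hyperparameters are hypothetical and are not taken from the paper, which works with multiagent flocking and deep MARL.

```python
import random

random.seed(0)

# Toy 1-D "navigate to a target" chain standing in for flocking control:
# states 0..N with the target at N, actions -1 / +1.
N = 10
ACTIONS = (-1, 1)

def step(s, a):
    s2 = max(0, min(N, s + a))
    reward = 1.0 if s2 == N else -0.01  # small step cost, bonus at target
    return s2, reward, s2 == N

def nonexpert(s):
    # A weak prior policy: usually moves toward the target, sometimes away.
    return 1 if random.random() < 0.8 else -1

Q = {(s, a): 0.0 for s in range(N + 1) for a in ACTIONS}

# Phase 1: pretrain on demonstrations generated by the nonexpert policy.
# The small learning rate nudges Q toward demonstrated actions without
# fitting them exactly, loosely mirroring "preventing overfitting".
for _ in range(200):
    s = random.randint(0, N - 1)
    a = nonexpert(s)
    Q[(s, a)] += 0.1 * (1.0 - Q[(s, a)])

# Phase 2: online Q-learning; follow the nonexpert's suggestion only when
# the agent's own value estimate rates it at least as highly as its
# greedy action (the "better in the agent's recognition" test).
alpha, gamma, eps = 0.5, 0.95, 0.1
for _ in range(300):
    s, done = 0, False
    while not done:
        greedy = max(ACTIONS, key=lambda a: Q[(s, a)])
        suggested = nonexpert(s)
        a = suggested if Q[(s, suggested)] >= Q[(s, greedy)] else greedy
        if random.random() < eps:  # epsilon-greedy exploration
            a = random.choice(ACTIONS)
        s2, r, done = step(s, a)
        target = r + (0.0 if done else gamma * max(Q[(s2, b)] for b in ACTIONS))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2

policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N)}
print(policy)
```

On this chain the learned greedy policy moves toward the target from every state; the pretraining phase means early episodes already head roughly the right way, which is the sample-efficiency effect the method targets.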