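Reinforcement learning
Computer science
Convergence (economics)
Task (project management)
Generalization
Artificial intelligence
Algorithm
Key (lock)
Field (mathematics)
Optimization algorithm
Function (biology)
Machine learning
Distributed computing
Mathematical optimization
Engineering
Mathematics
Mathematical analysis
Computer security
Systems engineering
Evolutionary biology
Pure mathematics
Economy
Biology
Economic growth
Authors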
Guang Zhan, Xinmiao Zhang, Zhongchao Li, Lin Xu, Deyun Zhou, Zhen Yang
Source
Journal: Drones
[Multidisciplinary Digital Publishing Institute]
Date: 2022-07-04
Volume (issue): 6 (7): 166
Citations: 34
Identifier
DOI:10.3390/drones6070166
Abstract
Distributed multi-agent collaborative decision-making technology is the key to general artificial intelligence. This paper uses a self-developed Unity3D collaborative combat environment as the test scenario and sets a task that requires heterogeneous unmanned aerial vehicles (UAVs) to make distributed decisions and complete a cooperative task. To address the poor performance of the traditional proximal policy optimization (PPO) algorithm in complex multi-agent collaboration scenarios, the Critic network in PPO is modified to learn a centralized value function, and the resulting multi-agent proximal policy optimization (MAPPO) algorithm is proposed and trained on the distributed training framework Ray. In addition, an inheritance training method based on curriculum learning is adopted to improve the generalization performance of the algorithm. In the experiments, MAPPO obtains the highest average accumulated reward among the compared algorithms and completes the task goal in the fewest steps after convergence, which demonstrates that the MAPPO algorithm outperforms the state-of-the-art.
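The two ingredients named in the abstract, the PPO clipped objective and a Critic that learns a centralized value function, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `clipped_surrogate` and `centralized_critic_input`, the clip range of 0.2, and the choice to build the Critic input by concatenating per-agent observations are all assumptions made for illustration only.

```python
import math


def clipped_surrogate(new_logp, old_logp, advantage, clip_eps=0.2):
    """Standard PPO clipped objective for a single sample (to be maximized).

    clip_eps=0.2 is an assumed default, not a value taken from the paper.
    """
    # Probability ratio between the updated policy and the behavior policy.
    ratio = math.exp(new_logp - old_logp)
    # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
    clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    # Take the pessimistic (smaller) of the unclipped and clipped terms.
    return min(ratio * advantage, clipped * advantage)


def centralized_critic_input(observations):
    """Build one joint input vector for a shared Critic.

    Assumption: the centralized value function sees every agent's
    observation, concatenated in a fixed agent order. Each actor would
    still act on its own local observation only.
    """
    joint = []
    for obs in observations:
        joint.extend(obs)
    return joint


if __name__ == "__main__":
    # Two hypothetical heterogeneous UAVs with different observation sizes.
    uav_obs = [[0.1, 0.2, 0.3], [0.4, 0.5]]
    print(centralized_critic_input(uav_obs))
    print(clipped_surrogate(math.log(2.0), 0.0, 1.0))
```

With a positive advantage, a ratio above `1 + clip_eps` earns no extra objective, which is what keeps each policy update close to the previous policy. The centralized input is needed only during training; at execution time each UAV can act from its local observation.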