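Reinforcement learning
Computer science
Curse of dimensionality
Sample (material)
Locality
Artificial intelligence
Set (abstract data type)
Sampling (signal processing)
Transfer of learning
Machine learning
Detector
Telecommunications
Programming language
Chromatography
Chemistry
Linguistics
Philosophy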
Authors
Zhenkun Gao,Xiaoyan Dai,Meibao Yao,Xueming Xiao
Identifier
DOI:10.1109/icps58381.2023.10128078
Abstract
Cooperative hunting is a typical and significant scenario for studying multi-agent behavior, one that conventional control strategies struggle to handle because of the high dimensionality of the state space and the locality of communication. Reinforcement learning offers a framework and a set of tools for this problem through trial-and-error interaction with the environment. Though promising, it often requires a large amount of empirical sample data to learn effective hunting strategies, which results in low sample efficiency, measured here by the number of training episodes the agent needs to learn an effective behavior strategy. To improve sample efficiency, we propose a data enhancement strategy integrated into the centralized training with decentralized execution (CTDE) framework to train the multi-agent system. The strategy uses a state transfer dynamics model, which we call the dynamic prediction model, to generate additional predicted data; this data is combined with the empirical data collected by interacting with the environment to achieve higher sample efficiency. Simulation results on the Webots platform show that our method outperforms some state-of-the-art methods, such as MAPPO, while achieving high sample efficiency.
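The idea of mixing model-predicted transitions with real ones can be illustrated with a minimal sketch. This is not the authors' dynamic prediction model or their CTDE training loop: the 1-D toy environment, the one-parameter linear model, and the names true_step, fit_gain, and augment are all assumptions made purely for illustration.

```python
import random

def true_step(state, action):
    # Hypothetical toy environment: a 1-D position shifted by the action.
    return state + 0.1 * action

def collect_real(n, rng):
    # Gather n empirical transitions (s, a, s') by interacting with the toy environment.
    data = []
    for _ in range(n):
        s = rng.uniform(-1.0, 1.0)
        a = rng.uniform(-1.0, 1.0)
        data.append((s, a, true_step(s, a)))
    return data

def fit_gain(transitions):
    # Least-squares fit of k in the assumed model (s' - s) = k * a.
    num = sum(a * (s2 - s) for s, a, s2 in transitions)
    den = sum(a * a for _, a, _ in transitions)
    return num / den

def augment(transitions, gain, n_extra, rng):
    # Generate predicted transitions from real states and freshly sampled actions,
    # then combine them with the empirical data.
    synthetic = []
    for _ in range(n_extra):
        s, _, _ = rng.choice(transitions)
        a = rng.uniform(-1.0, 1.0)
        synthetic.append((s, a, s + gain * a))
    return transitions + synthetic

rng = random.Random(0)
real = collect_real(50, rng)            # empirical data
gain = fit_gain(real)                   # learned dynamics parameter
buffer = augment(real, gain, 100, rng)  # real + predicted data for training
```

In this sketch, 50 real transitions yield a training buffer of 150 samples. How much predicted data to add, and how to limit model error in a high-dimensional multi-agent setting, are design choices this toy example does not address.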