Prioritized Experience Replay based on Multi-armed Bandit

Authors
Ximing Liu,Tianqing Zhu,Cuiqing Jiang,Dayong Ye,Fuqing Zhao
Source
Journal: Expert Systems With Applications [Elsevier BV]
Volume 189, article 116023; cited by 32
Identifier
DOI:10.1016/j.eswa.2021.116023
Abstract

Experience replay has been widely used in deep reinforcement learning. It allows online reinforcement learning agents to remember and reuse past experiences. To further improve the sampling efficiency of experience replay, the most useful experiences should be sampled more frequently. Existing methods usually design their sampling strategy around a few criteria, but they tend to combine these criteria in a linear or fixed manner, so the resulting strategy is static and independent of the learning agent. This ignores the dynamic nature of the environment and can therefore lead only to suboptimal performance. In this work, we propose a dynamic experience replay strategy, driven by the interaction between the agent and the environment, called Prioritized Experience Replay based on Multi-armed Bandit (PERMAB). PERMAB adaptively combines multiple priority criteria to measure the importance of each experience. In particular, the weight of each criterion is adjusted from episode to episode according to its contribution to the agent's performance, ensuring that the criteria most useful in the agent's current state are weighted more heavily. The proposed replay strategy takes both sample informativeness and diversity into account, which can significantly boost the learning ability and speed of the game agent. Experimental results on seven benchmark environments of varying difficulty show that PERMAB accelerates network learning and achieves better performance than baseline algorithms.
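The abstract does not state PERMAB's exact update rules. As a hedged illustration of the general idea of treating replay-priority criteria as bandit arms, the Python sketch below swaps in the standard EXP3 algorithm, which is not necessarily the authors' choice. At the start of each episode the bandit selects one criterion to drive proportional prioritized sampling, and an episode-level reward in [0, 1] updates that criterion's weight. Averaged over the bandit's choices, the replay distribution becomes a mixture of the per-criterion distributions whose weights drift toward criteria that improve returns. The class and function names, the criterion names (`td_error`, `novelty`), the sigmoid reward mapping, and the parameters `gamma` and `alpha` are all hypothetical.

```python
import math
import random


class Exp3CriterionSelector:
    """EXP3 bandit whose arms are replay-priority criteria (hypothetical sketch)."""

    def __init__(self, n_criteria, gamma=0.1, seed=None):
        self.k = n_criteria
        self.gamma = gamma                # exploration rate in (0, 1]
        self.log_w = [0.0] * n_criteria   # log-weights, kept in log space for stability
        self.rng = random.Random(seed)

    def probabilities(self):
        # Mix the exponential-weights distribution with uniform exploration.
        m = max(self.log_w)
        w = [math.exp(lw - m) for lw in self.log_w]
        total = sum(w)
        return [(1 - self.gamma) * wi / total + self.gamma / self.k for wi in w]

    def select(self):
        # Draw the criterion that will drive replay sampling for this episode.
        probs = self.probabilities()
        arm = self.rng.choices(range(self.k), weights=probs, k=1)[0]
        return arm, probs

    def update(self, arm, reward, probs):
        # reward must lie in [0, 1]; importance-weight it by the arm's probability,
        # then apply the EXP3 rule w_arm <- w_arm * exp(gamma * x_hat / K).
        x_hat = reward / probs[arm]
        self.log_w[arm] += self.gamma * x_hat / self.k


def sample_indices(scores, batch_size, alpha=0.6, rng=random):
    """Proportional prioritized sampling: P(i) is proportional to score_i ** alpha."""
    weights = [max(s, 1e-6) ** alpha for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=batch_size)


if __name__ == "__main__":
    # Hypothetical per-experience scores under two assumed criteria.
    criteria_scores = {
        "td_error": [0.9, 0.1, 0.4, 0.05],
        "novelty": [0.2, 0.8, 0.3, 0.7],
    }
    names = list(criteria_scores)
    bandit = Exp3CriterionSelector(len(names), gamma=0.2, seed=0)
    rng = random.Random(0)

    prev_return = 0.0
    for episode in range(3):
        arm, probs = bandit.select()
        batch = sample_indices(criteria_scores[names[arm]], batch_size=8, rng=rng)
        # ... train on `batch` and run the episode; the return is simulated here ...
        episode_return = prev_return + rng.uniform(-1.0, 1.0)
        # Assumed reward mapping: squash the return improvement into [0, 1].
        reward = 1.0 / (1.0 + math.exp(-(episode_return - prev_return)))
        bandit.update(arm, reward, probs)
        prev_return = episode_return
        print(episode, names[arm], [round(p, 3) for p in bandit.probabilities()])
```

Selecting one criterion per episode keeps the bandit feedback well defined (one arm played, one reward observed), which is what EXP3's importance-weighted update assumes; a method that blends all criteria within every batch would need a different credit-assignment rule.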