Balancing exploration and exploitation in episodic reinforcement learning

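Reinforcement learning, Computer science, Incentive, Artificial intelligence, Redistribution (election), Risk analysis (engineering), Machine learning, Business, Microeconomics, Politics, Political science, Law, Economics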
Authors
Qihang Chen, Qiwei Zhang, Yunlong Liu
Source
Journal: Expert Systems With Applications [Elsevier BV]
Volume: 231, Article 120801   Cited by: 1
Identifiers
DOI:10.1016/j.eswa.2023.120801
Abstract

One of the major challenges in reinforcement learning (RL) is its application to episodic tasks such as chess, molecular structure design, and healthcare. In these tasks, rewards are usually sparse and available only at the end of an episode. Such episodic tasks place stringent demands on the agent's exploration and credit-assignment capabilities. The current literature offers many techniques for each of these two issues. Various exploration methods have been proposed to strengthen the agent's exploration so that it collects diverse experience samples. For the delayed-reward problem, reward redistribution methods use the episodic feedback to reshape sparse, delayed environmental rewards into dense, task-oriented guidance. Despite some successes, existing techniques usually cannot quickly assign credit to the key transitions that exploration discovers. Alternatively, they are easily misled by behavior policies trapped in local optima, which slows learning. To alleviate the inefficient learning caused by sparse and delayed rewards, we propose a guided reward approach, Exploratory Intrinsic with Mission Guidance Reward (EMR). EMR combines the intrinsic rewards of exploration mechanisms with reward redistribution to balance exploration and exploitation in such tasks. It uses entropy-based intrinsic incentives together with a simple uniform reward redistribution method. As a result, EMR gives an agent both strong exploration and strong exploitation capabilities, allowing it to solve challenging tasks with sparse and delayed rewards efficiently.
We evaluated and analyzed EMR on several tasks from the DeepMind Control Suite benchmark. Experimental results show that the EMR-equipped agent learns faster than agents that use either the exploration bonus or the reward redistribution method alone, and even outperforms them.
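The abstract names EMR's two ingredients, an entropy-based intrinsic bonus and a uniform redistribution of the episodic return, but gives no formulas. The following is a minimal sketch under stated assumptions and is not the authors' implementation.

- The episodic return is split equally over the T steps of the episode.
- The entropy bonus uses a k-nearest-neighbour particle estimate, log(1 + distance to the k-th nearest state in the episode). This estimator is a technique swapped in from the exploration literature.
- The two signals are summed with a weight `beta`.

All function names and `beta` are hypothetical.

```python
import math
from typing import List, Sequence


def uniform_redistribution(episode_return: float, length: int) -> List[float]:
    """Spread the single end-of-episode return evenly over every step (dense proxy reward)."""
    if length <= 0:
        raise ValueError("episode length must be positive")
    return [episode_return / length] * length


def knn_entropy_bonus(states: Sequence[Sequence[float]], k: int = 3) -> List[float]:
    """Assumed entropy proxy: log(1 + distance to the k-th nearest other state).

    States far from the rest of the episode receive larger bonuses, which
    rewards spreading the state distribution out (higher entropy).
    """
    n = len(states)
    if n <= k:
        raise ValueError("need more than k states to find a k-th neighbour")
    bonuses = []
    for i, s in enumerate(states):
        # Euclidean distances from state i to every other state, nearest first.
        dists = sorted(math.dist(s, other) for j, other in enumerate(states) if j != i)
        bonuses.append(math.log(1.0 + dists[k - 1]))
    return bonuses


def guided_rewards(states: Sequence[Sequence[float]], episode_return: float,
                   beta: float = 0.1, k: int = 3) -> List[float]:
    """Hypothetical combination: redistributed extrinsic reward + beta * intrinsic bonus."""
    extrinsic = uniform_redistribution(episode_return, len(states))
    intrinsic = knn_entropy_bonus(states, k)
    return [e + beta * b for e, b in zip(extrinsic, intrinsic)]


if __name__ == "__main__":
    episode = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
    print(guided_rewards(episode, episode_return=5.0, beta=0.1, k=1))
```

In this toy episode, every step receives the same redistributed reward of 1.0. The isolated state at 10.0 receives the largest intrinsic bonus, so the sum favours transitions that both contribute to the task outcome and reach rarely visited states. The weight `beta` controls the exploration/exploitation trade-off, and the paper may balance these signals differently.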
