Human-in-the-Loop Reinforcement Learning: A Survey and Position on Requirements, Challenges, and Opportunities

Authors
Carl Orge Retzlaff, Srijita Das, Christabel Wayllace, Payam Mousavi, Mohammad Afshari, Tianpei Yang, Anna Saranti, Alessa Angerschmid, Matthew E. Taylor, Andreas Holzinger
Source
Journal: Journal of Artificial Intelligence Research [AI Access Foundation]
Volume 79, pp. 359-415; cited by 49
Identifier
DOI: 10.1613/jair.1.15348
Abstract

Artificial intelligence (AI) and especially reinforcement learning (RL) have the potential to enable agents to learn and perform tasks autonomously with superhuman performance. However, we consider RL to be fundamentally a Human-in-the-Loop (HITL) paradigm, even when an agent eventually performs its task autonomously. In cases where the reward function is challenging or impossible to define, HITL approaches are considered particularly advantageous. The application of Reinforcement Learning from Human Feedback (RLHF) in systems such as ChatGPT demonstrates the effectiveness of optimizing for user experience and integrating user feedback into the training loop. In HITL RL, human input is integrated during the agent’s learning process, allowing iterative updates and fine-tuning based on human feedback, thus enhancing the agent’s performance. Since the human is an essential part of this process, we argue that human-centric approaches are the key to successful RL, a fact that has not been adequately considered in the existing literature. This paper aims to inform readers about current explainability methods in HITL RL. It also shows how the application of explainable AI (xAI) and specific improvements to existing explainability approaches can enable better human-agent interaction in HITL RL for all types of users, whether lay people, domain experts, or machine learning specialists. Accounting for the workflow in HITL RL and based on software and machine learning methodologies, this article identifies four phases of human involvement in creating HITL RL systems: (1) Agent Development, (2) Agent Learning, (3) Agent Evaluation, and (4) Agent Deployment. We highlight human involvement, explanation requirements, new challenges, and goals for each phase. We furthermore identify low-risk, high-return opportunities for explainability research in HITL RL and present long-term research goals to advance the field. 
Finally, we propose a vision of human-robot collaboration that allows both parties to reach their full potential and cooperate effectively.