
Human-level control through deep reinforcement learning

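Reinforcement learning, Artificial intelligence, Computer science, Variety (cybernetics), Control (management), Perception, Human-computer interaction, Deep learning, Machine learning, Biology, Neuroscience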
Authors
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis
Source
Journal: Nature [Springer Nature]
Volume (issue): 518 (7540): 529-533 Cited by: 20397
Identifier
DOI: 10.1038/nature14236
Abstract

The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
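
The abstract refers to temporal difference reinforcement learning without stating an update rule. As a rough illustration only, the sketch below shows the standard one-step Q-learning temporal-difference update on a small table of action values. It is not the paper's deep Q-network, which replaces the table with a neural network trained on pixels. The function names, the table layout, and the values of `alpha` and `gamma` are assumptions chosen for this example.

```python
def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a').

    For a terminal transition there is no next state to bootstrap from,
    so the target is just the reward.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def q_update(q_table, state, action, reward, next_state, done,
             alpha=0.1, gamma=0.99):
    """Move Q(state, action) a fraction alpha toward the TD target.

    q_table is a list of lists indexed as q_table[state][action]
    (a hypothetical layout for this sketch). Returns the TD error.
    """
    target = td_target(reward, q_table[next_state], done, gamma)
    td_error = target - q_table[state][action]
    q_table[state][action] += alpha * td_error
    return td_error


# Toy usage: 2 states, 2 actions, all action values start at zero.
q = [[0.0, 0.0], [0.0, 0.0]]
err = q_update(q, state=0, action=1, reward=1.0, next_state=1, done=False)
```

In this toy call the next state's values are all zero, so the target equals the reward and the updated entry moves one tenth of the way toward it.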