RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning

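Reinforcement learning; Computer science; Markov decision process; Artificial intelligence; Recurrent neural network; Task (project management); Machine learning; Scale (ratio); State (computer science); Markov process; Artificial neural network; Algorithm; Mathematics; Statistics; Quantum mechanics; Physics; Economics; Management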

Authors

Yan Duan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, Pieter Abbeel

Source

Journal: Cornell University - arXiv; Date: 2016-01-01; Citations: 246

Links

arxiv.org, datacite.org, doi.org

Identifier

DOI: 10.48550/arxiv.1611.02779

Abstract

Deep reinforcement learning (deep RL) has been successful in learning sophisticated behaviors automatically; however, the learning process requires a huge number of trials. In contrast, animals can learn new tasks in just a few trials, benefiting from their prior knowledge about the world. This paper seeks to bridge this gap. Rather than designing a "fast" reinforcement learning algorithm, we propose to represent it as a recurrent neural network (RNN) and learn it from data. In our proposed method, RL$^2$, the algorithm is encoded in the weights of the RNN, which are learned slowly through a general-purpose ("slow") RL algorithm. The RNN receives all information a typical RL algorithm would receive, including observations, actions, rewards, and termination flags; and it retains its state across episodes in a given Markov Decision Process (MDP). The activations of the RNN store the state of the "fast" RL algorithm on the current (previously unseen) MDP. We evaluate RL$^2$ experimentally on both small-scale and large-scale problems. On the small-scale side, we train it to solve randomly generated multi-arm bandit problems and finite MDPs. After RL$^2$ is trained, its performance on new MDPs is close to human-designed algorithms with optimality guarantees. On the large-scale side, we test RL$^2$ on a vision-based navigation task and show that it scales up to high-dimensional problems.
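
The abstract says the RNN receives observations, actions, rewards, and termination flags, and keeps its state across episodes within one MDP. The sketch below illustrates only that data flow on a two-armed bandit, treated here as a sequence of one-step episodes. It is not the authors' implementation: the vanilla tanh RNN, the class `TinyRNNPolicy`, the helpers `make_input` and `run_trial`, and all sizes and arm probabilities are hypothetical choices made for illustration. The weights are random and untrained, and the outer ("slow") RL algorithm that would learn them is omitted.

```python
import math
import random


def make_input(obs, prev_action, prev_reward, prev_done, n_actions):
    """Build one RNN input: observation, one-hot previous action,
    previous reward, and previous termination flag."""
    one_hot = [0.0] * n_actions
    if prev_action is not None:
        one_hot[prev_action] = 1.0
    return list(obs) + one_hot + [float(prev_reward), float(prev_done)]


def matvec(W, v):
    """Matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class TinyRNNPolicy:
    """Untrained vanilla RNN (an assumption made for brevity).
    In RL^2 the weights would be learned by the outer RL algorithm."""

    def __init__(self, input_dim, hidden_dim, n_actions, rng):
        self.hidden_dim = hidden_dim
        self.W_in = random_matrix(hidden_dim, input_dim, rng)
        self.W_h = random_matrix(hidden_dim, hidden_dim, rng)
        self.W_out = random_matrix(n_actions, hidden_dim, rng)

    def initial_state(self):
        return [0.0] * self.hidden_dim

    def step(self, x, h):
        """Return (action probabilities, new hidden state)."""
        pre = [a + b for a, b in zip(matvec(self.W_in, x), matvec(self.W_h, h))]
        h_new = [math.tanh(p) for p in pre]
        logits = matvec(self.W_out, h_new)
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        return [e / total for e in exps], h_new


def run_trial(policy, arm_probs, n_episodes, rng):
    """Run several one-step episodes on the same bandit.
    The hidden state is reset once per trial, not once per episode."""
    n_actions = len(arm_probs)
    obs = [0.0]  # a bandit has no informative observation
    h = policy.initial_state()
    prev_action, prev_reward, prev_done = None, 0.0, False
    rewards = []
    for _ in range(n_episodes):
        x = make_input(obs, prev_action, prev_reward, prev_done, n_actions)
        probs, h = policy.step(x, h)  # h carries over to the next episode
        action = rng.choices(range(n_actions), weights=probs)[0]
        reward = 1.0 if rng.random() < arm_probs[action] else 0.0
        prev_action, prev_reward, prev_done = action, reward, True
        rewards.append(reward)
    return rewards, h


rng = random.Random(0)
# input_dim = observation (1) + one-hot action (2) + reward (1) + done flag (1)
policy = TinyRNNPolicy(input_dim=1 + 2 + 1 + 1, hidden_dim=8, n_actions=2, rng=rng)
rewards, h_final = run_trial(policy, arm_probs=[0.2, 0.8], n_episodes=5, rng=rng)
```

The point of the design is that `h` is never reset inside `run_trial`, so whatever the network has inferred about the current bandit from earlier rewards stays available in later episodes. With trained weights, that hidden state would play the role the abstract assigns to the activations: the state of the "fast" RL algorithm.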
