Keywords
Reinforcement learning
Computer science
Hindsight
Artificial intelligence
Task (project management)
Curse of dimensionality
Robotics
Set (abstract data type)
Speedup
Robot
Function (biology)
Machine learning
Simplicity (philosophy)
Engineering
Operating system
Philosophy
Epistemology
Biology
Cognitive psychology
Programming language
Systems engineering
Evolutionary biology
Psychology
Authors
Ashvin Nair, Bob McGrew, Marcin Andrychowicz, Wojciech Zaremba, Pieter Abbeel
Identifier
DOI: 10.1109/icra.2018.8463162
Abstract
Exploration in environments with sparse rewards has been a persistent problem in reinforcement learning (RL). Many tasks are natural to specify with a sparse reward, and manually shaping a reward function can result in suboptimal performance. However, finding a non-zero reward is exponentially more difficult with increasing task horizon or action dimensionality. This puts many real-world tasks out of practical reach of RL methods. In this work, we use demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control such as stacking blocks with a robot arm. Our method, which builds on top of Deep Deterministic Policy Gradients and Hindsight Experience Replay, provides an order of magnitude of speedup over RL on simulated robotics tasks. It is simple to implement and makes only the additional assumption that we can collect a small set of demonstrations. Furthermore, our method is able to solve tasks not solvable by either RL or behavior cloning alone, and often ends up outperforming the demonstrator policy.
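The abstract describes a method that builds on Deep Deterministic Policy Gradients (DDPG) and Hindsight Experience Replay (HER) and additionally assumes access to a small set of demonstrations. As a rough, hedged illustration of how demonstrations are commonly folded into a DDPG-style update, the sketch below adds an auxiliary behavior-cloning term on demonstration data to the usual actor objective. It assumes PyTorch; the module names, the `bc_weight` coefficient, and the random placeholder data are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: DDPG-style actor loss plus a behavior-cloning term on demonstrations.
# Assumes PyTorch. All names and hyperparameters here are illustrative placeholders.
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Deterministic policy: maps a state to a continuous action in [-1, 1]."""
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),
        )

    def forward(self, state):
        return self.net(state)


class Critic(nn.Module):
    """Q-function: maps a (state, action) pair to a scalar value estimate."""
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


def actor_loss(actor, critic, rl_states, demo_states, demo_actions, bc_weight=1.0):
    """Combine the DDPG policy-gradient term with a behavior-cloning term."""
    # Standard DDPG objective: push the policy's actions toward higher Q-values.
    pg_loss = -critic(rl_states, actor(rl_states)).mean()
    # Auxiliary behavior-cloning term: stay close to demonstrated actions on demo states.
    bc_loss = ((actor(demo_states) - demo_actions) ** 2).mean()
    return pg_loss + bc_weight * bc_loss


if __name__ == "__main__":
    state_dim, action_dim = 10, 4
    actor, critic = Actor(state_dim, action_dim), Critic(state_dim, action_dim)
    rl_states = torch.randn(32, state_dim)            # e.g. states from an HER replay buffer
    demo_states = torch.randn(8, state_dim)           # states from the small demonstration set
    demo_actions = torch.rand(8, action_dim) * 2 - 1  # corresponding demonstrated actions
    loss = actor_loss(actor, critic, rl_states, demo_states, demo_actions)
    loss.backward()
    print(float(loss))
```

In this kind of setup, the behavior-cloning term biases exploration toward demonstrated behavior early on, while the policy-gradient term allows the learned policy to eventually improve beyond the demonstrator, consistent with the abstract's claim that the method can outperform the demonstrations it starts from.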