Computer science
Reinforcement learning
Robustness (evolution)
Task (project management)
Sample (material)
Artificial intelligence
Process (computing)
Machine learning
Sample complexity
Human–computer interaction
Engineering
Operating system
Systems engineering
Chemistry
Gene
Biochemistry
Chromatography
Authors
Thomas Kleine Buening, Christos Dimitrakakis
Source
Journal: Cornell University - arXiv
Date: 2022-10-26
Identifier
DOI: 10.48550/arxiv.2210.14972
Abstract
Learning a reward function from demonstrations suffers from low sample-efficiency. Even with abundant data, current inverse reinforcement learning methods that focus on learning from a single environment can fail to handle slight changes in the environment dynamics. We tackle these challenges through adaptive environment design. In our framework, the learner repeatedly interacts with the expert, with the former selecting environments to identify the reward function as quickly as possible from the expert's demonstrations in said environments. This results in improvements in both sample-efficiency and robustness, as we show experimentally, for both exact and approximate inference.
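The abstract describes an interaction loop: the learner repeatedly picks an environment, observes expert demonstrations in it, and updates its belief over reward functions so that the reward is identified as quickly as possible. A minimal toy sketch of that loop, assuming exact inference over a finite candidate set (all names, the consistency table, and the selection rule are illustrative stand-ins, not the authors' actual algorithm):

```python
# Hypothetical sketch of adaptive environment design for reward inference.
# The candidate rewards, environments, and consistency table below are
# invented for illustration only.
CANDIDATE_REWARDS = {"R1", "R2", "R3"}
ENVIRONMENTS = ["env_a", "env_b", "env_c"]

# Which reward functions an expert demonstration in each environment is
# consistent with (a stand-in for exact inference from demonstrations).
CONSISTENT = {
    "env_a": {"R1", "R2"},  # env_a cannot distinguish R1 from R2
    "env_b": {"R1", "R3"},
    "env_c": {"R1"},        # env_c pins down the true reward
}

def expert_demo(env, true_reward="R1"):
    """Expert demonstrates in `env`; we observe the consistent reward set."""
    assert true_reward in CONSISTENT[env]
    return CONSISTENT[env]

def select_environment(posterior):
    """Pick the environment expected to shrink the posterior the most."""
    return min(ENVIRONMENTS, key=lambda e: len(posterior & CONSISTENT[e]))

# Interaction loop: select environment, observe expert, update belief.
posterior = set(CANDIDATE_REWARDS)
rounds = 0
while len(posterior) > 1:
    env = select_environment(posterior)
    posterior &= expert_demo(env)
    rounds += 1

print(posterior, rounds)  # posterior collapses to the true reward
```

The point of the sketch is the selection step: rather than learning from a fixed environment, the learner chooses the environment whose demonstrations are most informative, which is the source of the sample-efficiency and robustness gains the abstract reports.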