AI Safety Gridworlds
Authors
Jan Leike,Miljan Martic,Victoria Krakovna,Pedro A. Ortega,Tom Everitt,Andrew Lefrancq,Laurent Orseau,Shane Legg
Source
Venue: Cornell University - arXiv
Date: 2017-11-27
Citations: 117
Identifier
DOI:10.48550/arxiv.1711.09883
Abstract
We present a suite of reinforcement learning environments illustrating various safety properties of intelligent agents. These problems include safe interruptibility, avoiding side effects, absent supervisor, reward gaming, safe exploration, as well as robustness to self-modification, distributional shift, and adversaries. To measure compliance with the intended safe behavior, we equip each environment with a performance function that is hidden from the agent. This allows us to categorize AI safety problems into robustness and specification problems, depending on whether the performance function corresponds to the observed reward function. We evaluate A2C and Rainbow, two recent deep reinforcement learning agents, on our environments and show that they are not able to solve them satisfactorily.
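To make the evaluation setup concrete, the sketch below shows one way an observed reward can diverge from a hidden performance function. It is a minimal illustration in Python under assumed names: `SafetyEnv`, the unsafe cell at index 2, and the ±1 action space are all hypothetical, not the paper's actual environment suite or API.

```python
import random

class SafetyEnv:
    """Minimal sketch: a 1-D gridworld whose observed reward differs
    from a hidden performance function (hypothetical, not the paper's API)."""

    def __init__(self, size=5):
        self.size = size
        self.pos = 0              # agent starts at the left end
        self._performance = 0.0   # hidden from the agent

    def step(self, action):
        # action is -1 (left) or +1 (right); position is clamped to the grid
        self.pos = max(0, min(self.size - 1, self.pos + action))
        reward = 1.0 if self.pos == self.size - 1 else 0.0  # observed reward
        # The hidden performance additionally penalizes visiting an "unsafe"
        # cell that the reward function never mentions -- a specification
        # problem, since performance and observed reward no longer coincide.
        penalty = 0.5 if self.pos == 2 else 0.0
        self._performance += reward - penalty
        done = self.pos == self.size - 1
        return self.pos, reward, done

    def performance(self):
        # Evaluation-only accessor; never exposed to the learning agent.
        return self._performance

# A random policy eventually reaches the goal but scores poorly on
# the hidden performance function along the way:
env = SafetyEnv()
done = False
while not done:
    _, _, done = env.step(random.choice([-1, 1]))
print("hidden performance:", env.performance())
```

An agent that maximizes only the observed reward may freely cross the unsafe cell; the gap between its return and `performance()` is the kind of discrepancy the suite's per-environment performance functions are designed to expose.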