Computer science
Residual
Reinforcement learning
Pruning
Action (physics)
Abstraction
Diversity (politics)
Artificial intelligence
Machine learning
Space (punctuation)
Entropy (arrow of time)
Algorithm
Epistemology
Sociology
Philosophy
Physics
Quantum mechanics
Anthropology
Agronomy
Biology
Operating system
Authors
Anjie Zhu, Feiyu Chen, Hui Xu, Deqiang Ouyang, Jie Shao
Identifier
DOI: 10.1109/tnnls.2021.3128666
Abstract
Extracting temporal abstractions (options), which enrich the action space, is a crucial challenge in hierarchical reinforcement learning. Given a well-structured action space, decision-making agents can search more deeply or plan efficiently by pruning irrelevant action candidates. However, automatically capturing well-performing temporal abstractions is nontrivial because of insufficient exploration and inadequate functionality of the learned options. We alleviate this challenge from two perspectives: diversity and individuality. For diversity, we propose a maximum entropy model based on ensembled options to encourage exploration. For individuality, we propose to distinguish each option accurately using mutual information minimization, so that each option can better express itself and fulfill its own function. We name our framework ensemble with soft option (ESO) critics. Furthermore, the residual algorithm (RA) with a bidirectional target network is introduced to stabilize bootstrapping, yielding a residual version of ESO. We provide a detailed analysis of extensive experiments, showing that our method boosts performance on commonly used continuous control tasks.
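The abstract combines two generic ingredients that can be illustrated independently of the paper: an entropy-regularized (soft) bootstrap target computed over an ensemble of critics, and a residual-style Bellman update in which gradients flow through the bootstrapped target instead of a frozen copy. The sketch below shows only these two standard ideas; it is not the authors' ESO implementation. The function name `soft_residual_loss`, the min-over-ensemble target, and the hyperparameters `alpha` and `gamma` are assumptions for illustration, and the paper's mutual-information term and bidirectional target network are not reproduced here.

```python
# Minimal illustrative sketch (not the paper's ESO method): a soft, entropy-regularized
# TD target over an ensemble of critics, optimized as a residual Bellman error so that
# gradients flow through both Q(s, a) and the bootstrapped target.
import torch


def soft_residual_loss(q_ensemble, obs, act, rew, next_obs, next_act,
                       next_logp, alpha=0.2, gamma=0.99):
    """Mean squared soft Bellman residual, differentiated through both sides.

    q_ensemble: list of callables, each mapping (obs, act) -> Q-value tensor.
    next_logp:  log-probability of next_act under the current policy, used for
                the entropy bonus of maximum-entropy RL.
    """
    # Soft target: pessimistic minimum over the ensemble plus the entropy bonus.
    # No detach / target freeze, so this is a residual-style update.
    next_q = torch.min(
        torch.stack([q(next_obs, next_act) for q in q_ensemble]), dim=0
    ).values
    target = rew + gamma * (next_q - alpha * next_logp)

    # Average squared Bellman residual across the ensemble.
    loss = 0.0
    for q in q_ensemble:
        loss = loss + ((q(obs, act) - target) ** 2).mean()
    return loss / len(q_ensemble)


if __name__ == "__main__":
    # Tiny hypothetical usage with linear critics on random data.
    obs_dim, act_dim, batch = 3, 1, 8
    nets = [torch.nn.Linear(obs_dim + act_dim, 1) for _ in range(2)]
    critics = [lambda o, a, n=n: n(torch.cat([o, a], dim=-1)).squeeze(-1) for n in nets]
    obs, act = torch.randn(batch, obs_dim), torch.randn(batch, act_dim)
    next_obs, next_act = torch.randn(batch, obs_dim), torch.randn(batch, act_dim)
    rew, next_logp = torch.randn(batch), torch.randn(batch)
    loss = soft_residual_loss(critics, obs, act, rew, next_obs, next_act, next_logp)
    loss.backward()  # gradients reach the critics via both Q(s, a) and the target
```

A semi-gradient variant would instead detach the target (or compute it from a periodically updated target network); the residual form above trades some bias in the fixed point for more stable bootstrapping, which is the general motivation the abstract cites for its residual version of ESO.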