Computer science
Reinforcement learning
Artificial intelligence
Boosting (machine learning)
AdaBoost
Entropy (arrow of time)
Classifier (UML)
Machine learning
Algorithm
Quantum mechanics
Physics
Authors
Tao Zhang, Ying Liu, Maxwell Hwang, Kao-Shing Hwang, C. Ma, Jing Cheng
Identifier
DOI:10.1016/j.ins.2020.01.023
Abstract
Inverse reinforcement learning (IRL) involves imitating expert behaviors by recovering reward functions from demonstrations. This study proposes a model-free IRL algorithm to solve the dilemma of predicting the unknown reward function. The proposed end-to-end model comprises a dual structure of autoencoders in parallel. The model uses a state encoding method to reduce the computational complexity for high-dimensional environments and utilizes an Adaboost classifier to determine the difference between the predicted and demonstrated reward functions. Relative entropy is used as a metric to measure the difference between the demonstrated and the imitated behavior. The simulation experiments demonstrate the effectiveness of the proposed method in terms of the number of iterations that are required for the estimation.
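The abstract names relative entropy (KL divergence) as the metric comparing demonstrated and imitated behavior. A minimal sketch of that comparison, assuming behaviors are summarized as discrete state-visitation distributions (the distributions and function name here are illustrative, not from the paper):

```python
import math

def relative_entropy(p, q, eps=1e-12):
    """KL divergence D(p || q) between two discrete distributions.

    A small epsilon guards against log(0) when a state is never visited.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Hypothetical state-visitation frequencies: one from expert demonstrations,
# one from the policy trained under the predicted reward function.
expert = [0.5, 0.3, 0.2]
imitated = [0.4, 0.4, 0.2]

# A lower divergence indicates the imitated behavior is closer to the expert's;
# the IRL loop would drive this value toward zero across iterations.
print(relative_entropy(expert, imitated))
```

Note that KL divergence is asymmetric: `relative_entropy(p, q)` generally differs from `relative_entropy(q, p)`, so which distribution is taken as the reference matters when using it as a training signal.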