Authors
Bosen Lian, Wenqian Xue, Frank L. Lewis, Tianyou Chai
Identifier
DOI: 10.1109/tcyb.2021.3100749
Abstract
This article proposes robust inverse Q-learning algorithms for a learner to mimic an expert's states and control inputs in the imitation learning problem. The two agents are subject to different adversarial disturbances. To perform the imitation, the learner must reconstruct the unknown expert cost function; it observes only the expert's control inputs and uses inverse Q-learning algorithms to reconstruct that cost function. The inverse Q-learning algorithms are robust in that they are independent of the system model and allow for different cost-function parameters and disturbances between the two agents. We first propose an offline inverse Q-learning algorithm consisting of two iterative learning loops: 1) an inner Q-learning iteration loop and 2) an outer iteration loop based on inverse optimal control. Based on this offline algorithm, we then develop an online inverse Q-learning algorithm with which the learner mimics the expert's behavior online from real-time observation of the expert's control inputs. This online computational method uses four function approximators: a critic approximator, two actor approximators, and a state-reward neural network (NN). It simultaneously approximates the Q-function parameters and the learner's state reward online. Convergence and stability are rigorously proven to guarantee the algorithm's performance.
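The two-loop structure described in the abstract (an inner Q-learning loop nested inside an outer inverse-optimal-control correction) can be illustrated schematically. What follows is a minimal Python sketch on a toy tabular MDP with hypothetical random dynamics P, a hidden reward true_r used only to generate expert demonstrations, and hand-chosen step sizes; it is not the paper's continuous-time, disturbance-robust algorithm, only an illustration of alternating an inner Q-learning loop with an outer reward correction until the learner's greedy policy reproduces the observed expert actions.

# Hypothetical two-loop inverse Q-learning sketch on a toy tabular MDP.
# It only illustrates the structure named in the abstract: an inner
# Q-learning loop under the current reward estimate, and an outer loop
# that corrects the reward so the greedy policy matches the expert.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 3, 0.9

# Toy random dynamics: P[s, a] is a next-state distribution (assumption).
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))

# Hidden "expert" reward, used only to generate demonstrations.
true_r = rng.normal(size=(n_states, n_actions))

def q_learning(r, n_iters=2000, alpha=0.1, eps=0.1):
    """Inner loop: tabular Q-learning under a fixed reward estimate r."""
    Q = np.zeros((n_states, n_actions))
    s = 0
    for _ in range(n_iters):
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s2 = rng.choice(n_states, p=P[s, a])
        Q[s, a] += alpha * (r[s, a] + gamma * Q[s2].max() - Q[s, a])
        s = s2
    return Q

# Expert policy from the hidden reward; the learner sees only (state, action) pairs.
expert_pi = q_learning(true_r).argmax(axis=1)
demos = [(s, int(expert_pi[s])) for s in range(n_states)]

# Outer loop: nudge the reward estimate so the learner's greedy policy
# matches the expert actions (a crude inverse-optimal-control correction).
r_hat = np.zeros((n_states, n_actions))
for outer in range(30):
    Q = q_learning(r_hat)                      # inner Q-learning loop
    mismatches = 0
    for s, a_exp in demos:
        a_learner = int(Q[s].argmax())
        if a_learner != a_exp:
            mismatches += 1
            r_hat[s, a_exp] += 0.5             # reward the expert's choice
            r_hat[s, a_learner] -= 0.5         # penalize the learner's choice
    if mismatches == 0:
        break

print("outer iterations:", outer + 1, "remaining mismatches:", mismatches)

In the paper's setting the inner loop is model-free Q-learning for a continuous-time linear system and the outer correction comes from inverse optimal control over the cost-function parameters; the online variant replaces the tables above with the four function approximators the abstract lists.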