Authors
Bosen Lian, Wenqian Xue, Frank L. Lewis, Tianyou Chai
Identifier
DOI:10.1109/tnnls.2021.3114612
Abstract
This article proposes new inverse reinforcement learning (RL) algorithms to solve our defined Adversarial Apprentice Games for nonlinear learner and expert systems. The games are solved by having a learner extract the unknown cost function of an expert from the expert's demonstrated behaviors. We first develop a model-based inverse RL algorithm that consists of two learning stages: an optimal control learning stage and a second stage based on inverse optimal control. This algorithm also clarifies the relationship between inverse RL and inverse optimal control. Then, we propose a new model-free integral inverse RL algorithm to reconstruct the unknown expert cost function. The model-free algorithm needs only online demonstrated trajectory data of the expert and the learner, without knowledge of the system dynamics of either agent. Both algorithms are further implemented using neural networks (NNs). In Adversarial Apprentice Games, the learner and the expert are allowed to suffer different adversarial attacks during learning. A two-player zero-sum game is formulated for each of the two agents and is solved as a subproblem for the learner in inverse RL. Furthermore, it is shown that the cost functions the learner learns in order to mimic the expert's behavior are stabilizing but not unique. Finally, simulations and comparisons show the effectiveness and superiority of the proposed algorithms.
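To make the two-stage structure described in the abstract concrete (an inner optimal-control solve nested inside an outer cost-reconstruction loop), here is a minimal sketch on a hypothetical scalar linear-quadratic system. All specifics below — the dynamics (a, b), the weights (q, r), the step size, and the gradient-style update — are illustrative assumptions, not the paper's method; the article itself treats nonlinear systems under adversarial attacks, with neural-network implementations.

```python
def lqr_gain(q, a=0.9, b=1.0, r=1.0):
    """Inner stage: fixed-point solve of the scalar discrete-time Riccati
    equation p = q + a^2 p - (a b p)^2 / (r + b^2 p), then return the
    optimal feedback gain k in u = -k x."""
    p = q
    for _ in range(500):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)

# The expert acts optimally for a state weight the learner cannot see.
q_expert = 5.0                # hidden cost parameter to be recovered
k_expert = lqr_gain(q_expert) # expert's demonstrated feedback gain

# Outer stage (inverse-optimal-control surrogate): the learner adjusts its
# own state weight q until its optimal gain matches the expert's gain.
q, eta = 1.0, 1.0             # initial guess and step size (assumed values)
for _ in range(500):
    q += eta * (k_expert - lqr_gain(q))
k_learner = lqr_gain(q)
# As the abstract notes, recovered costs are not unique: here any positive
# scaling of the pair (q, r) would produce the same optimal policy.
```

With r fixed, the loop drives the learner's gain to the expert's, recovering one representative of the equivalence class of costs that explain the demonstration.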