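Computer science
Reinforcement learning
Inverse
Adversarial system
Artificial intelligence
Process (computing)
Trajectory
Function (biology)
Mathematical optimization
Apprenticeship
Artificial neural network
Machine learning
Algorithm
Mathematics
Biology
Philosophy
Physics
Operating system
Astronomy
Evolutionary biology
Linguistics
Geometry
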
Authors

Bosen Lian, Wenqian Xue, Frank L. Lewis, Tianyou Chai

Identifier

DOI: 10.1109/tnnls.2021.3114612

Abstract

This article proposes new inverse reinforcement learning (RL) algorithms to solve Adversarial Apprentice Games, which we define for nonlinear learner and expert systems. The games are solved by having a learner reconstruct an expert's unknown cost function from the expert's demonstrated behavior. We first develop a model-based inverse RL algorithm with two learning stages: optimal control learning, followed by a second learning stage based on inverse optimal control. This algorithm also clarifies the relationship between inverse RL and inverse optimal control. We then propose a new model-free integral inverse RL algorithm to reconstruct the unknown expert cost function. The model-free algorithm requires only online demonstrations of the expert's and the learner's trajectory data, without knowledge of the system dynamics of either the learner or the expert. Both algorithms are further implemented using neural networks (NNs). In Adversarial Apprentice Games, the learner and the expert may be subject to different adversarial attacks during the learning process. A two-player zero-sum game is formulated for each of the two agents and is solved as a subproblem for the learner in inverse RL. Furthermore, it is shown that the cost functions the learner learns in order to mimic the expert's behavior are stabilizing and not unique. Finally, simulations and comparisons demonstrate the effectiveness and superiority of the proposed algorithms.
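
The two-stage structure of the model-based algorithm (optimal control learning, then an inverse optimal control update of the cost) can be illustrated, under strong simplifying assumptions, in the linear-quadratic special case. The following is a minimal NumPy sketch, assuming a known linear system dx/dt = A x + B u, a known control weight R, an expert that applies a linear feedback u = -K_expert x, and no adversarial inputs, so the zero-sum game subproblem, the NN approximation, and the model-free integral variant are all omitted. Stage 1 computes the optimal gain for the current state weight Q by Kleinman policy iteration. Stage 2 applies the hypothetical update Q <- -(A^T P + P A) + K_expert^T R K_expert, which picks Q so that the current P would satisfy the Riccati equation if the learner's gain equaled the expert's. This update rule and all function names (`solve_lyapunov`, `lqr_policy_iteration`, `inverse_lq_two_stage`) are illustrative assumptions, not the authors' published method.

```python
import numpy as np


def solve_lyapunov(Ac, S):
    """Solve Ac^T P + P Ac + S = 0 for symmetric P (Kronecker form; small n only)."""
    n = Ac.shape[0]
    I = np.eye(n)
    # Row-major vec: vec(Ac^T P) = kron(Ac^T, I) vec(P), vec(P Ac) = kron(I, Ac^T) vec(P)
    M = np.kron(Ac.T, I) + np.kron(I, Ac.T)
    P = np.linalg.solve(M, -S.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)  # symmetrize against round-off


def lqr_policy_iteration(A, B, Q, R, K, iters=50, tol=1e-12):
    """Stage 1 (optimal control learning): Kleinman policy iteration for the
    continuous-time ARE, starting from a stabilizing gain K (A - B K Hurwitz)."""
    R_inv = np.linalg.inv(R)
    for _ in range(iters):
        # Policy evaluation: (A - BK)^T P + P (A - BK) + Q + K^T R K = 0
        P = solve_lyapunov(A - B @ K, Q + K.T @ R @ K)
        K_new = R_inv @ B.T @ P  # policy improvement
        if np.linalg.norm(K_new - K) < tol:
            return P, K_new
        K = K_new
    return P, K


def inverse_lq_two_stage(A, B, R, K_expert, Q0, K0, outer_iters=500, tol=1e-9):
    """Hypothetical two-stage inverse LQ loop: alternate optimal control for the
    current Q with an inverse-optimal-control style update of Q."""
    Q, K = Q0.copy(), K0.copy()
    for _ in range(outer_iters):
        # Stage 1: optimal gain and value matrix for the current state weight Q
        P, K = lqr_policy_iteration(A, B, Q, R, K)
        if np.linalg.norm(K - K_expert) < tol:
            break
        # Stage 2 (assumed update): equivalent to Q + K_e^T R K_e - K^T R K
        Q = -(A.T @ P + P @ A) + K_expert.T @ R @ K_expert
        Q = 0.5 * (Q + Q.T)
    return Q, P, K


if __name__ == "__main__":
    A = np.diag([1.0, -2.0])
    B = np.eye(2)
    R = np.eye(2)
    Q_expert = np.diag([4.0, 2.0])  # hidden from the learner; only used to synthesize K_expert
    K0 = 2.0 * np.eye(2)            # stabilizing start: A - B K0 = diag(-1, -4)
    _, K_expert = lqr_policy_iteration(A, B, Q_expert, R, K0)
    Q_learned, _, K_learned = inverse_lq_two_stage(A, B, R, K_expert, np.eye(2), K0)
    print(np.round(K_learned, 4))
    print(np.round(Q_learned, 4))
```

In this decoupled example the loop settles on a Q that reproduces the expert gain, and because each channel is scalar, that Q also coincides with Q_expert. For coupled systems a different Q yielding the same gain may be found, consistent with the non-uniqueness noted in the abstract; the sketch makes no claim of convergence, or of Q staying positive semidefinite, outside simple cases like this one.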