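Convergence (economics)
Artificial neural network
Function (biology)
Gradient descent
Process (computing)
Adaptive learning
Computer science
Simplicity (philosophy)
Adaptive algorithm
Algorithm
Scheme (mathematics)
Function approximation
Mathematical optimization
Mathematics
Artificial intelligence
Biology
Evolutionary biology
Epistemology
Operating system
Mathematical analysis
Philosophy
Economic growth
Economics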
Authors
Biao Luo,Yin Yang,Derong Liu
Identifier
DOI:10.1109/tcyb.2018.2821369
Abstract
In this paper, the data-based optimal output regulation problem of discrete-time systems is investigated. An off-policy adaptive Q-learning (QL) method is developed that uses real system data and requires neither knowledge of the system dynamics nor a mathematical model of the utility function. By introducing the Q-function, an off-policy adaptive QL algorithm is developed to learn the optimal Q-function. An adaptive parameter in the policy evaluation is used to achieve a tradeoff between the current and future Q-functions. The convergence of the adaptive QL algorithm is proved and the influence of the adaptive parameter is analyzed. To realize the adaptive QL algorithm with real system data, the actor-critic neural network (NN) structure is developed. The least-squares scheme and the batch gradient descent method are developed to update the critic and actor NN weights, respectively. The experience replay technique is employed in the learning process, which leads to a simple and convenient implementation of the adaptive QL method. Finally, the effectiveness of the developed adaptive QL method is verified through numerical simulations.
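The abstract does not state the update equations, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a hypothetical scalar linear system, a quadratic critic fitted by least squares over a fixed replay buffer of off-policy samples, and a blended target of the form (1 - alpha) * Q_i + alpha * (cost + Q_i at the next state) as one plausible reading of the "tradeoff between the current and future Q-functions". The actor here is a linear gain computed in closed form instead of an NN trained by batch gradient descent. All names (`features`, `greedy_gain`, `adaptive_ql`, `alpha`, and the system constants) are illustrative assumptions.

```python
import numpy as np

def features(x, u):
    """Quadratic basis so that Q(x, u) = w . [x^2, x*u, u^2] (assumed critic form)."""
    return np.stack([x * x, x * u, u * u], axis=-1)

def greedy_gain(w):
    """Gain k of the policy u = -k*x that minimises the quadratic critic over u."""
    return 0.0 if w[2] <= 1e-12 else w[1] / (2.0 * w[2])

def adaptive_ql(x, u, cost, x_next, alpha=0.5, iters=200):
    """Least-squares critic updates on a replay buffer (x, u, cost, x_next).

    alpha blends the current critic with the one-step lookahead target;
    this blending rule is an assumption, not taken from the paper.
    """
    w = np.zeros(3)
    phi = features(x, u)
    for _ in range(iters):
        k = greedy_gain(w)
        # Critic evaluated at the next state under the current greedy policy.
        q_next = features(x_next, -k * x_next) @ w
        target = (1.0 - alpha) * (phi @ w) + alpha * (cost + q_next)
        # Batch least-squares fit of the critic weights to the blended target.
        w, *_ = np.linalg.lstsq(phi, target, rcond=None)
    return w, greedy_gain(w)

# Hypothetical plant, used only to generate data; the learner never sees a, b, q, r_u.
a, b, q, r_u = 1.1, 1.0, 1.0, 1.0
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 200)       # sampled states
u = rng.uniform(-1.0, 1.0, 200)       # exploratory (off-policy) inputs
x_next = a * x + b * u                # observed next states
cost = q * x**2 + r_u * u**2          # observed stage costs

w, k = adaptive_ql(x, u, cost, x_next)
print("critic weights:", w, "learned gain:", k)
```

For this toy problem the learned gain can be checked against the gain obtained from the scalar discrete-time Riccati equation, since the quadratic basis represents the target exactly. Smaller values of `alpha` weight the current critic more heavily and slow the iteration; `alpha = 1` reduces the update to plain Q-value iteration.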