Keywords: Bellman equation, reinforcement learning, optimal control, heuristic dynamic programming, convergence, nonlinear systems, discrete-time systems, artificial neural networks, mathematical optimization
Authors
Luyang Yu, Weibo Liu, Yurong Liu, Fawaz E. Alsaadi
Abstract
This article investigates the optimal control problem via reinforcement learning for a class of nonlinear discrete-time systems. The nonlinear system under consideration is assumed to be partially unknown. A new learning-based algorithm, T-step heuristic dynamic programming with eligibility traces (T-sHDP(λ)), is proposed to tackle the optimal control problem for such a partially unknown system. First, the optimal control problem is converted into an equivalent problem: solving a Bellman equation. Then, T-sHDP(λ) is used to obtain an approximate solution of the Bellman equation, and a rigorous convergence analysis is conducted. Instead of the commonly used single-step update, T-sHDP(λ) stores a finite number of past returns by introducing a trace parameter, and uses this knowledge to update the value function (VF) at multiple moments synchronously, thereby achieving a higher convergence speed. For implementation, a neural-network-based actor-critic architecture is applied to approximate the VF and the optimal control scheme. Finally, the feasibility of the algorithm is demonstrated by two illustrative simulation examples.
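The abstract does not give the update equations, but the core idea it describes, using eligibility traces so that one observed cost updates the value estimates of several past moments at once, can be illustrated with a minimal tabular TD(λ)-style sketch. This is a hypothetical illustration of the eligibility-trace mechanism, not the authors' neural-network-based T-sHDP(λ) implementation; the function name, trajectory format, and default parameters are all assumptions.

```python
import numpy as np

def td_lambda_update(V, trajectory, alpha=0.1, gamma=0.95, lam=0.8):
    """One pass of eligibility-trace value updates over a trajectory.

    V          : 1-D array, tabular value estimate per state.
    trajectory : list of (state, cost, next_state) tuples.

    The trace vector e lets each one-step Bellman error update the
    values of multiple recently visited states synchronously -- the
    multi-moment update the abstract attributes to T-sHDP(lambda).
    """
    e = np.zeros_like(V)                           # eligibility trace per state
    for s, cost, s_next in trajectory:
        delta = cost + gamma * V[s_next] - V[s]    # one-step Bellman error
        e[s] += 1.0                                # mark current state eligible
        V += alpha * delta * e                     # update all traced states at once
        e *= gamma * lam                           # decay traces toward older states
    return V
```

With lam = 0, this degenerates to the single-step update the abstract contrasts against; larger lam spreads each error over more past moments, which is the mechanism behind the claimed faster convergence.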