Reinforcement learning
Hamilton-Jacobi-Bellman equation
Computer science
Temporal difference learning
Bellman equation
Stability (learning theory)
Function (biology)
Dynamic programming
Lyapunov function
Process (computing)
Artificial intelligence
Inverted pendulum
Nonlinear system
Mathematical optimization
Optimal control
Artificial neural network
Machine learning
Mathematics
Algorithm
Physics
Operating system
Biology
Evolutionary biology
Quantum mechanics
Authors
Shixuan Yao, Xiaochen Liu, Yinghui Zhang, Ze Cui
Abstract
In recent years, dynamic programming and reinforcement learning theory have been widely applied to nonlinear control systems (NCS). Much progress has been made on network model construction and system stability analysis, but little research addresses control strategies built from the detailed requirements of the control process. Motivated by this gap, this paper proposes a detail-reward mechanism (DRM) that constructs the reward function from individual detail evaluation functions, replacing the utility function in the Hamilton-Jacobi-Bellman (HJB) equation. The method is then introduced into a wider range of deep reinforcement learning algorithms to solve optimization problems in NCS. After a mathematical description of the relevant characteristics of NCS, the stability of the iterative control law is proved with a Lyapunov function. Taking the inverted pendulum system as the experimental object, a dynamic environment is designed and the reward function is established using the DRM. Finally, three deep reinforcement learning models, based on Deep Q-Networks, policy gradient, and actor-critic, are built in this environment, and the effects of different reward functions on experimental accuracy are compared. The results show that, in NCS, replacing the utility function in the HJB equation with the DRM better matches the designer's detailed requirements for the whole control process. By observing the characteristics of the system, designing the reward function accordingly, and selecting an appropriate deep reinforcement learning model, the optimization problem of an NCS can be solved.
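To make the DRM idea concrete, the sketch below shows one possible composite reward for the inverted pendulum. It assumes the standard discrete-time HJB optimality form V*(x_k) = min_u { U(x_k, u_k) + V*(x_{k+1}) }, with the utility U replaced by a weighted sum of detail evaluation functions. The state layout (cart position, cart velocity, pole angle, pole angular velocity), the individual detail functions, and the weights here are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

# A minimal sketch of a detail-reward mechanism (DRM) for an inverted
# pendulum: the reward is composed of individual detail evaluation
# functions, standing in for the utility U(x, u) in the HJB equation.
# All details and weights below are illustrative assumptions.

def angle_detail(theta):
    # Penalize deviation of the pole angle from upright.
    return -theta ** 2

def velocity_detail(theta_dot):
    # Penalize large angular velocity to favor smooth control.
    return -0.1 * theta_dot ** 2

def position_detail(x):
    # Penalize cart drift away from the track center.
    return -0.05 * x ** 2

def drm_reward(state, weights=(1.0, 1.0, 1.0)):
    """Composite DRM reward: a weighted sum of detail evaluations."""
    x, _, theta, theta_dot = state  # (cart pos, cart vel, angle, angular vel)
    details = (angle_detail(theta),
               velocity_detail(theta_dot),
               position_detail(x))
    return sum(w * d for w, d in zip(weights, details))

# Example: reward for a slightly tilted, slowly rotating pendulum.
print(drm_reward(np.array([0.1, 0.0, 0.05, 0.2])))
```

Because each detail function scores one requirement of the control process in isolation, a designer could tune the weights to emphasize, say, angle accuracy over cart drift, which is the sense in which the DRM encodes detailed requirements for the whole control process.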