An approach to solving optimal control problems of nonlinear systems by introducing detail-reward mechanism in deep reinforcement learning

Keywords: reinforcement learning, Hamilton-Jacobi-Bellman equation, computer science, temporal difference learning, Bellman equation, theory (learning stability), function (biology), dynamic programming, Lyapunov function, process (computing), artificial intelligence, inverted pendulum, nonlinear systems, mathematical optimization, optimal control, artificial neural network, machine learning, mathematics, algorithm, physics, operating systems, biology, evolutionary biology, quantum mechanics
Authors
Shixuan Yao,Xiaochen Liu,Yinghui Zhang,Ze Cui
Source
Journal: Mathematical Biosciences and Engineering [Arizona State University]
Volume/Issue: 19 (9): 9258-9290 · Cited by: 4
Identifier
DOI:10.3934/mbe.2022430
Abstract

<abstract> <p>In recent years, dynamic programming and reinforcement learning theory have been widely used to solve nonlinear control system (NCS) problems. Many achievements have been made in network model construction and system stability analysis, but there has been little research on establishing control strategies based on the detailed requirements of the control process. Motivated by this gap, this paper proposes a detail-reward mechanism (DRM), which constructs a reward function composed of individual detail evaluation functions to replace the utility function in the Hamilton-Jacobi-Bellman (HJB) equation. This method is then introduced into a wider range of deep reinforcement learning algorithms to solve optimization problems in NCSs. After a mathematical description of the relevant characteristics of NCSs, the stability of the iterative control law is proved with a Lyapunov function. Taking the inverted pendulum system as the experimental object, a dynamic environment is designed and the reward function is established using the DRM. Finally, three deep reinforcement learning models, based on Deep Q-Networks, policy gradient and actor-critic methods, are designed in this environment, and the effects of different reward functions on experimental accuracy are compared. The results show that in NCSs, replacing the utility function in the HJB equation with the DRM better matches the designer's detailed requirements for the whole control process. By observing the characteristics of the system, designing the reward function and selecting an appropriate deep reinforcement learning model, the optimization problem of an NCS can be solved.</p> </abstract>
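The DRM described in the abstract composes the reward from individual detail evaluation functions, each scoring one aspect of the control process, in place of a single utility function. The following is a minimal sketch of that idea for an inverted pendulum; the function names, weights and quadratic penalty forms are illustrative assumptions, not values taken from the paper:

```python
# Hypothetical sketch of a detail-reward mechanism (DRM): the reward is a
# weighted combination of individual "detail evaluation" functions, each
# scoring one detailed requirement of the control process. The specific
# penalties and weights below are assumptions for illustration only.

def angle_detail(theta):
    """Penalize deviation of the pendulum angle from upright (theta = 0)."""
    return -theta ** 2

def velocity_detail(theta_dot):
    """Penalize large angular velocities to discourage aggressive swings."""
    return -0.1 * theta_dot ** 2

def effort_detail(u):
    """Penalize control effort, playing the role a utility term would in the HJB equation."""
    return -0.01 * u ** 2

def detail_reward(theta, theta_dot, u, weights=(1.0, 1.0, 1.0)):
    """Combine the individual detail evaluation functions into one scalar reward."""
    details = (angle_detail(theta), velocity_detail(theta_dot), effort_detail(u))
    return sum(w * d for w, d in zip(weights, details))

# A near-upright state with a small control input receives only a small penalty.
r = detail_reward(theta=0.05, theta_dot=0.1, u=0.5)
```

Any deep reinforcement learning agent (DQN, policy gradient or actor-critic, as in the paper) could then be trained against a `detail_reward` of this shape, with each weight tuned to reflect how strongly the designer cares about that detail of the control process.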