Safe Reinforcement Learning and Adaptive Optimal Control With Applications to Obstacle Avoidance Problem

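Obstacle avoidance, Reinforcement learning, Control theory (sociology), Computer science, Controller (irrigation), Gradient descent, Adaptive control, Lyapunov function, Kinematics, Artificial neural network, Convergence (economics), Optimal control, Obstacle, Artificial intelligence, Mobile robot, Mathematical optimization, Nonlinear system, Mathematics, Robot, Control (management), Physics, Classical mechanics, Quantum mechanics, Law, Agronomy, Economics, Political science, Biology, Economic growth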
Authors
Ke Wang,Chaoxu Mu,Zhen Ni,Derong Liu
Source
Journal: IEEE Transactions on Automation Science and Engineering [Institute of Electrical and Electronics Engineers]
Volume (Issue): 21 (3): 4599-4612 Cited by: 13
Identifier
DOI:10.1109/tase.2023.3299275
Abstract

This paper presents a novel composite obstacle avoidance control method to generate safe motion trajectories for autonomous systems in an adaptive manner. First, system safety is described using forward invariance, and the barrier function is encoded into the cost function so that the obstacle avoidance problem can be characterized as an infinite-horizon optimal control problem. Next, a safe reinforcement learning framework is proposed by combining model-based policy iteration and state-following-based approximation. Using real-time data and extrapolated experience data, this learning design is implemented through the actor-critic structure, in which critic networks are tuned by gradient-descent adaptation and actor networks produce adaptive control policies via gradient projection. Then, system stability and weight convergence are theoretically analyzed using the Lyapunov method. Finally, the proposed learning-based controller is demonstrated on a two-dimensional single integrator system and a nonlinear unicycle kinematic system. Simulation results reveal that the system or agent can smoothly reach the target point while keeping a safe distance from each obstacle; in addition, three other avoidance control methods are used to provide side-by-side comparisons and to verify some claimed advantages of the present method.

Note to Practitioners: This paper is motivated by the obstacle avoidance problem of real-time navigation of an agent to the target point, which applies to practical autonomous systems such as vehicles and robots. Pre-generative methods and reactive methods have been widely employed to generate safe motion trajectories in environments with obstacles. However, these methods cannot strike a good balance between safety and optimality. In this paper, the obstacle avoidance problem is formulated in the sense of optimal control, and a safe reinforcement learning method is designed to generate safe motion trajectories. This method combines the advantages of model-based policy iteration and state-following-based approximation, in which the former ensures regional optimality while the latter ensures local safety. Based on the proposed adaptive tuning laws, engineers are able to design learning-based avoidance controllers for environments with static obstacles. In future research, we will address the dynamic avoidance problem against moving obstacles.
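
The idea of encoding a barrier function into the cost function can be illustrated with a minimal sketch. The abstract does not give the paper's actual barrier or cost, so everything below is an assumption made for illustration: a reciprocal barrier around one circular obstacle, a quadratic state and control cost with the target at the origin, the weights `q`, `r`, `k`, and the hypothetical function names `barrier`, `running_cost`, and `trajectory_cost`.

```python
import numpy as np

def barrier(x, center, radius):
    """Reciprocal barrier (assumed form): grows without bound as x
    approaches the boundary of a circular obstacle."""
    h = float(np.sum((x - center) ** 2)) - radius ** 2  # h > 0 outside the obstacle
    return np.inf if h <= 0.0 else 1.0 / h

def running_cost(x, u, center, radius, q=1.0, r=1.0, k=0.5):
    """Quadratic state/control cost plus a weighted barrier penalty.
    The target is assumed to be the origin; q, r, k are illustrative weights."""
    return q * float(x @ x) + r * float(u @ u) + k * barrier(x, center, radius)

def trajectory_cost(xs, us, dt, center, radius):
    """Riemann-sum approximation of the cost integral over a finite
    window of a sampled trajectory (xs[i], us[i])."""
    return dt * sum(running_cost(x, u, center, radius) for x, u in zip(xs, us))

if __name__ == "__main__":
    center, radius = np.array([1.0, 1.0]), 0.5
    near = np.array([1.6, 1.0])   # just outside the obstacle boundary
    far = np.array([3.0, 1.0])    # well clear of the obstacle
    print("barrier near:", barrier(near, center, radius))
    print("barrier far: ", barrier(far, center, radius))
```

Because the penalty blows up at the obstacle boundary, any policy with finite accumulated cost must keep the state outside the obstacle, which is the intuition behind treating avoidance as an optimal control problem.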