Keywords
Obstacle avoidance
Reinforcement learning
Control theory
Computer science
Controller
Gradient descent
Adaptive control
Lyapunov function
Kinematics
Artificial neural network
Convergence
Optimal control
Obstacle
Artificial intelligence
Mobile robot
Mathematical optimization
Nonlinear system
Mathematics
Robot
Control
Authors
Ke Wang, Chaoxu Mu, Zhen Ni, Derong Liu
Identifier
DOI:10.1109/tase.2023.3299275
Abstract
This paper presents a novel composite obstacle avoidance control method that generates safe motion trajectories for autonomous systems in an adaptive manner. First, system safety is described in terms of forward invariance, and a barrier function is encoded into the cost function so that the obstacle avoidance problem can be characterized as an infinite-horizon optimal control problem. Next, a safe reinforcement learning framework is proposed by combining model-based policy iteration with state-following-based approximation. Using real-time data and extrapolated experience data, this learning design is implemented through an actor-critic structure, in which the critic networks are tuned by gradient-descent adaptation and the actor networks produce adaptive control policies via gradient projection. Then, system stability and weight convergence are analyzed theoretically using the Lyapunov method. Finally, the proposed learning-based controller is demonstrated on a two-dimensional single-integrator system and a nonlinear unicycle kinematic system. Simulation results show that the agent smoothly reaches the target point while keeping a safe distance from each obstacle; three other avoidance control methods are used for side-by-side comparisons and to verify the claimed advantages of the present method. Note to Practitioners — This paper is motivated by the obstacle avoidance problem that arises when navigating an agent to a target point in real time, which applies to practical autonomous systems such as vehicles and robots. Pre-generative methods and reactive methods have been widely employed to generate safe motion trajectories in obstacle environments; however, these methods cannot strike a good balance between safety and optimality. In this paper, the obstacle avoidance problem is formulated as an optimal control problem, and a safe reinforcement learning method is designed to generate safe motion trajectories. This method combines the advantages of model-based policy iteration and state-following-based approximation: the former ensures regional optimality while the latter ensures local safety. Based on the proposed adaptive tuning laws, engineers can design learning-based avoidance controllers in environments with static obstacles. Future research will address the dynamic avoidance problem posed by moving obstacles.
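To make the formulation concrete, the following is a minimal sketch of a barrier-augmented infinite-horizon cost of the kind the abstract describes. The quadratic weights Q and R, the reciprocal form of the barrier B(x), and the obstacle centers o_i and radii r_i are illustrative assumptions, not the paper's exact choices:

```latex
% Illustrative barrier-augmented cost; not the paper's exact formulation.
% The barrier B(x) diverges as the state approaches an obstacle boundary,
% so any policy achieving finite cost keeps the safe set forward invariant.
\[
  J\bigl(x_0, u(\cdot)\bigr)
  = \int_0^{\infty} \Bigl( x^{\top} Q\, x + u^{\top} R\, u + B(x) \Bigr)\, dt ,
  \qquad
  B(x) = \sum_{i} \frac{1}{\|x - o_i\| - r_i}.
\]
```

Encoding safety into the cost this way is what lets obstacle avoidance be treated as a standard infinite-horizon optimal control problem, to which policy iteration then applies. The sketch below illustrates the two-dimensional single-integrator scenario from the simulations, with a hand-crafted barrier-augmented potential standing in for the learned actor-critic policy; it demonstrates the mechanism (goal attraction plus a barrier gradient that repels the state from the obstacle), not the paper's learning algorithm. All gains, positions, and the barrier form are hypothetical:

```python
import numpy as np

# Hedged sketch: steer a 2-D single integrator (x_dot = u) to a goal while
# avoiding one circular obstacle, using a hand-crafted barrier-augmented
# potential in place of the paper's learned critic. All constants and the
# barrier form are illustrative assumptions, not values from the paper.

GOAL = np.array([4.0, 4.0])
OBS_CENTER = np.array([2.0, 1.6])
OBS_RADIUS = 0.6
SAFE_MARGIN = 0.2            # extra clearance kept beyond the obstacle radius
DT = 0.01
K_GOAL, K_BARRIER = 1.0, 0.5

def barrier_gradient(x):
    """Gradient of the reciprocal barrier B(x) = 1 / h(x), h = clearance."""
    d = x - OBS_CENTER
    dist = np.linalg.norm(d)
    h = dist - (OBS_RADIUS + SAFE_MARGIN)     # signed clearance to the margin
    if h <= 1e-6:                             # inside the margin: push outward
        return -1e3 * d / max(dist, 1e-9)
    return -(1.0 / h**2) * (d / dist)         # grad B = -(1/h^2) * grad h

def policy(x):
    """Descend the combined potential: quadratic goal term plus barrier."""
    grad_goal = x - GOAL                      # gradient of 0.5 * ||x - GOAL||^2
    return -(K_GOAL * grad_goal + K_BARRIER * barrier_gradient(x))

x = np.array([0.0, 0.0])
for step in range(5000):
    u = np.clip(policy(x), -2.0, 2.0)         # saturate the control input
    x = x + DT * u                            # Euler step of x_dot = u
    if np.linalg.norm(x - GOAL) < 1e-2:
        break

clearance = np.linalg.norm(x - OBS_CENTER) - OBS_RADIUS
print(f"steps={step + 1}, final={np.round(x, 3)}, clearance={clearance:.3f}")
```

Because the barrier gradient grows without bound as the clearance h shrinks, the combined potential steers trajectories around the obstacle rather than through it; the paper's contribution is to learn the corresponding value function online through the actor-critic structure instead of hand-crafting it as done here.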