Keywords
Obstacle avoidance
Reinforcement learning
Control theory
Computer science
Controller
Gradient descent
Adaptive control
Lyapunov function
Kinematics
Artificial neural network
Convergence
Optimal control
Obstacle
Artificial intelligence
Mobile robot
Mathematical optimization
Nonlinear system
Mathematics
Robot
Control
Authors
Ke Wang, Chaoxu Mu, Zhen Ni, Derong Liu
Identifier
DOI:10.1109/tase.2023.3299275
Abstract
This paper presents a novel composite obstacle avoidance control method that generates safe motion trajectories for autonomous systems in an adaptive manner. First, system safety is described in terms of forward invariance, and a barrier function is encoded into the cost function so that the obstacle avoidance problem can be characterized as an infinite-horizon optimal control problem. Next, a safe reinforcement learning framework is proposed by combining model-based policy iteration with state-following-based approximation. Using real-time data and extrapolated experience data, this learning design is implemented through an actor-critic structure, in which the critic networks are tuned by gradient-descent adaptation and the actor networks produce adaptive control policies via gradient projection. Then, system stability and weight convergence are analyzed theoretically using the Lyapunov method. Finally, the proposed learning-based controller is demonstrated on a two-dimensional single-integrator system and a nonlinear unicycle kinematic system. Simulation results show that the agent smoothly reaches the target point while keeping a safe distance from each obstacle; three other avoidance control methods are used for side-by-side comparisons and to verify the claimed advantages of the present method. Note to Practitioners — This paper is motivated by the obstacle avoidance problem that arises when navigating an agent to a target point in real time, which applies to practical autonomous systems such as vehicles and robots. Pre-generative methods and reactive methods have been widely employed to generate safe motion trajectories in obstacle environments; however, these methods cannot strike a good balance between safety and optimality. In this paper, the obstacle avoidance problem is formulated as an optimal control problem, and a safe reinforcement learning method is designed to generate safe motion trajectories. This method combines the advantages of model-based policy iteration and state-following-based approximation: the former ensures regional optimality while the latter ensures local safety. Based on the proposed adaptive tuning laws, engineers can design learning-based avoidance controllers in environments with static obstacles. Future research will address the dynamic avoidance problem posed by moving obstacles.
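To make the formulation concrete, the following is a minimal sketch of a barrier-augmented infinite-horizon cost of the kind the abstract describes. The quadratic weights Q and R, the reciprocal form of the barrier B(x), and the obstacle centers o_i and radii r_i are illustrative assumptions, not the paper's exact choices:

```latex
% Illustrative barrier-augmented cost; not the paper's exact formulation.
% The barrier B(x) diverges as the state approaches an obstacle boundary,
% so any policy achieving finite cost keeps the safe set forward invariant.
\[
  J\bigl(x_0, u(\cdot)\bigr)
  = \int_0^{\infty} \Bigl( x^{\top} Q\, x + u^{\top} R\, u + B(x) \Bigr)\, dt ,
  \qquad
  B(x) = \sum_{i} \frac{1}{\|x - o_i\| - r_i}.
\]
```

Encoding safety into the cost this way is what lets obstacle avoidance be treated as a standard infinite-horizon optimal control problem, to which policy iteration then applies. The sketch below illustrates the two-dimensional single-integrator scenario from the simulations, with a hand-crafted barrier-augmented potential standing in for the learned actor-critic policy; it demonstrates the mechanism (goal attraction plus a barrier gradient that repels the state from the obstacle), not the paper's learning algorithm. All gains, positions, and the barrier form are hypothetical:

```python
import numpy as np

# Hedged sketch: steer a 2-D single integrator (x_dot = u) to a goal while
# avoiding one circular obstacle, using a hand-crafted barrier-augmented
# potential in place of the paper's learned critic. All constants and the
# barrier form are illustrative assumptions, not values from the paper.

GOAL = np.array([4.0, 4.0])
OBS_CENTER = np.array([2.0, 1.6])
OBS_RADIUS = 0.6
SAFE_MARGIN = 0.2            # extra clearance kept beyond the obstacle radius
DT = 0.01
K_GOAL, K_BARRIER = 1.0, 0.5

def barrier_gradient(x):
    """Gradient of the reciprocal barrier B(x) = 1 / h(x), h = clearance."""
    d = x - OBS_CENTER
    dist = np.linalg.norm(d)
    h = dist - (OBS_RADIUS + SAFE_MARGIN)     # signed clearance to the margin
    if h <= 1e-6:                             # inside the margin: push outward
        return -1e3 * d / max(dist, 1e-9)
    return -(1.0 / h**2) * (d / dist)         # grad B = -(1/h^2) * grad h

def policy(x):
    """Descend the combined potential: quadratic goal term plus barrier."""
    grad_goal = x - GOAL                      # gradient of 0.5 * ||x - GOAL||^2
    return -(K_GOAL * grad_goal + K_BARRIER * barrier_gradient(x))

x = np.array([0.0, 0.0])
for step in range(5000):
    u = np.clip(policy(x), -2.0, 2.0)         # saturate the control input
    x = x + DT * u                            # Euler step of x_dot = u
    if np.linalg.norm(x - GOAL) < 1e-2:
        break

clearance = np.linalg.norm(x - OBS_CENTER) - OBS_RADIUS
print(f"steps={step + 1}, final={np.round(x, 3)}, clearance={clearance:.3f}")
```

Because the barrier gradient grows without bound as the clearance h shrinks, the combined potential steers trajectories around the obstacle rather than through it; the paper's contribution is to learn the corresponding value function online through the actor-critic structure instead of hand-crafting it as done here.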