Keywords: Function (biology), Dynamic programming, Convergence (economics), Parameterized complexity, Symbol, Iterative method, Mathematics, Rate of convergence, Mathematical optimization, Nonlinear system, Computer science, Algorithm, Arithmetic, Physics, Biology, Channel (broadcasting), Evolutionary biology, Quantum mechanics, Economics, Economic growth, Computer network
Identifier
DOI:10.1109/tsmc.2023.3247466
Abstract
In this article, a policy optimization adaptive dynamic programming (POADP) method is developed for the optimal control of discrete-time unknown nonlinear systems, where the iterative control policy is parameterized to optimize the iterative $Q$-function directly. A relaxed condition on the learning rate is given to guarantee the convergence of the present algorithm. Furthermore, the Polyak–Łojasiewicz inequality is introduced to analyze optimality, i.e., the iterative $Q$-function converges to the optimum within a given computational threshold in a finite number of iterations, and the rate of convergence (i.e., the required minimum number of iterations) of the developed POADP method is also characterized. To ease practical implementation, the iterative $Q$-function and the iterative control policy are approximated by employing an actor–critic structure. Then, an experiment-based method is developed to obtain the initial weights of the actor–critic structure. Finally, numerical simulation results of two examples are provided to validate the effectiveness of the POADP algorithm.
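The iterative scheme described in the abstract alternates between evaluating a parameterized $Q$-function (critic) and taking a learning-rate-controlled optimization step on a parameterized policy (actor). The following is a minimal sketch of that actor–critic iteration, not the paper's method: the system $x_{k+1} = 0.8\sin(x_k) + u_k$, the quadratic features, the discount factor, and the learning rate are all illustrative assumptions chosen so the example runs standalone.

```python
import numpy as np

# Hypothetical discrete-time nonlinear system x_{k+1} = 0.8*sin(x_k) + u_k
# with stage cost r(x, u) = x^2 + u^2.  System, features, discount factor,
# and learning rate are illustrative assumptions, not the paper's setup.
def f(x, u):
    return 0.8 * np.sin(x) + u

def features(x, u):
    # Quadratic basis for the parameterized Q-function: Q(x, u) ~ w . phi(x, u)
    return np.stack([x * x, x * u, u * u], axis=-1)

gamma = 0.95   # discount factor (assumed; stabilizes the least-squares step)
alpha = 0.05   # actor learning rate (the paper derives a relaxed condition)

# Exploratory sample grid over states and actions.
xs, us = np.meshgrid(np.linspace(-2, 2, 41), np.linspace(-1, 1, 9))
xs, us = xs.ravel(), us.ravel()

theta = 0.0    # linear control policy u = -theta * x (parameterized actor)
for _ in range(50):
    # Critic: least-squares policy evaluation of the Bellman equation
    #   Q(x, u) = r(x, u) + gamma * Q(x', pi(x'))
    xn = f(xs, us)
    A = features(xs, us) - gamma * features(xn, -theta * xn)
    b = xs**2 + us**2
    w = np.linalg.lstsq(A, b, rcond=None)[0]
    # Actor: gradient step that optimizes the iterative Q-function directly;
    # for this basis, dQ(x, -theta*x)/dtheta = -x^2 * (w[1] - 2*w[2]*theta),
    # averaged over the sampled states.
    grad = -np.mean(xs**2 * (w[1] - 2.0 * w[2] * theta))
    theta -= alpha * grad

# Closed-loop rollout with the learned policy from x_0 = 1.
x = 1.0
for _ in range(50):
    x = f(x, -theta * x)
```

The actor step converges toward the minimizer of the fitted quadratic $Q$ in $u$; the learning rate `alpha` plays the role of the rate whose convergence condition the paper relaxes.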