Artificial neural network
Lyapunov function
Temporal difference learning
Normalization (linguistics)
Computer science
Norm (philosophy)
Mathematics
Sample complexity
Bellman equation
Algorithm
Artificial intelligence
Mathematical optimization
Applied mathematics
Reinforcement learning
Nonlinear system
Physics
Quantum mechanics
Law
Political science
Authors
Semih Cayci, Siddhartha Satpathi, Niao He, R. Srikant
Source
Journal: IEEE Transactions on Automatic Control
[Institute of Electrical and Electronics Engineers]
Date: 2023-05-01
Volume/Issue: 68 (5): 2891-2905
Identifier
DOI:10.1109/tac.2023.3234234
Abstract
In this article, we study the dynamics of temporal-difference (TD) learning with neural network-based value function approximation over a general state space, namely, neural TD learning. We consider two practically used algorithms, projection-free and max-norm regularized neural TD learning, and establish the first convergence bounds for these algorithms. An interesting observation from our results is that max-norm regularization can dramatically improve the performance of TD learning algorithms in terms of sample complexity and overparameterization. The results in this work rely on a Lyapunov drift analysis of the network parameters as a stopped and controlled random process.
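The abstract contrasts projection-free TD learning with a max-norm regularized variant, where network parameters are projected back into a bounded region after each update. The sketch below illustrates that idea for a single TD(0) update with a two-layer ReLU value network. It is not the authors' algorithm: the network width, step size, projection radius, and the choice to project each hidden unit's weights into a ball around its initialization are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-layer ReLU value network: V(s) = a^T relu(W s) / sqrt(m).
# Output weights `a` are fixed random signs, a common simplification
# in overparameterized analyses (an assumption here, not from the paper).
m, d = 64, 4                      # hidden width, state dimension (hypothetical)
W = rng.normal(size=(m, d))
W0 = W.copy()                     # remember initialization for the projection
a = rng.choice([-1.0, 1.0], size=m)

def value(s, W):
    return a @ np.maximum(W @ s, 0.0) / np.sqrt(m)

def grad_W(s, W):
    # dV/dW_ij = a_i * 1[W_i . s > 0] * s_j / sqrt(m)
    act = (W @ s > 0).astype(float)
    return np.outer(a * act, s) / np.sqrt(m)

gamma, alpha, R = 0.9, 0.05, 2.0  # discount, step size, max-norm radius (hypothetical)

def td_step(s, r, s_next, W):
    # Semi-gradient TD(0) update on the value-network parameters.
    delta = r + gamma * value(s_next, W) - value(s, W)  # TD error
    W = W + alpha * delta * grad_W(s, W)
    # Max-norm-style regularization: project each hidden unit's weight
    # vector back into an l2 ball of radius R around its initialization.
    diff = W - W0
    norms = np.linalg.norm(diff, axis=1, keepdims=True)
    scale = np.minimum(1.0, R / np.maximum(norms, 1e-12))
    return W0 + diff * scale

# One synthetic transition (s, r, s').
s, s_next = rng.normal(size=d), rng.normal(size=d)
W = td_step(s, 1.0, s_next, W)
```

After every update the per-unit parameter drift stays bounded by `R`, which is the kind of control the paper's Lyapunov drift analysis exploits; a projection-free variant would simply skip the rescaling step.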