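Parameterized complexity
Mathematics
Convergence (economics)
Markov decision process
Subspace topology
Class (philosophy)
Algorithm
State (computer science)
Mathematical optimization
Computer science
Artificial intelligence
Markov process
Economic growth
Statistics
Mathematical analysis
Economics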
Authors
Vijay R. Konda, John N. Tsitsiklis
Identifier
DOI:10.1137/s0363012901385691
Abstract
In this article, we propose and analyze a class of actor-critic algorithms. These are two-time-scale algorithms in which the critic uses temporal difference learning with a linearly parameterized approximation architecture, and the actor is updated in an approximate gradient direction, based on information provided by the critic. We show that the features for the critic should ideally span a subspace prescribed by the choice of parameterization of the actor. We study actor-critic algorithms for Markov decision processes with Polish state and action spaces. We state and prove two results regarding their convergence.
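The structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the toy two-state MDP (`step`), the average-reward TD(0) critic, the tabular softmax actor, and the step-size exponents 0.6 and 0.9 are all assumptions chosen for illustration. What it does keep from the abstract is the two-time-scale design (the actor's step size shrinks faster than the critic's), a linearly parameterized critic, and critic features taken from the gradient of the log-policy, so that they span the subspace determined by the actor's parameterization.

```python
import math
import random


def softmax_policy(theta, s):
    """Action probabilities in state s for a tabular softmax actor."""
    prefs = theta[s]
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def compatible_features(theta, s, a):
    """Critic features: grad_theta log pi(a|s), flattened to one vector."""
    n_states, n_actions = len(theta), len(theta[0])
    probs = softmax_policy(theta, s)
    phi = [0.0] * (n_states * n_actions)
    for b in range(n_actions):
        phi[s * n_actions + b] = (1.0 if b == a else 0.0) - probs[b]
    return phi


def step(s, a, rng):
    """Hypothetical toy MDP: action 1 pays reward 1, next state is uniform."""
    reward = 1.0 if a == 1 else 0.0
    return reward, rng.randrange(2)


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def run_actor_critic(n_steps=20000, seed=0):
    rng = random.Random(seed)
    n_states, n_actions = 2, 2
    theta = [[0.0] * n_actions for _ in range(n_states)]  # actor parameters
    w = [0.0] * (n_states * n_actions)                    # critic weights
    avg_reward = 0.0                                      # average-reward estimate
    s = 0
    a = rng.choices(range(n_actions), weights=softmax_policy(theta, s))[0]
    for k in range(1, n_steps + 1):
        # Two time scales: actor_lr / critic_lr tends to 0 as k grows.
        critic_lr = 1.0 / (1 + k) ** 0.6
        actor_lr = 1.0 / (1 + k) ** 0.9

        r, s2 = step(s, a, rng)
        a2 = rng.choices(range(n_actions), weights=softmax_policy(theta, s2))[0]

        phi = compatible_features(theta, s, a)
        phi2 = compatible_features(theta, s2, a2)
        q = dot(w, phi)    # linear critic estimate at (s, a)
        q2 = dot(w, phi2)  # linear critic estimate at (s2, a2)

        # Critic: TD(0) update in the average-reward setting (an assumption).
        delta = r - avg_reward + q2 - q
        avg_reward += critic_lr * (r - avg_reward)
        w = [wi + critic_lr * delta * fi for wi, fi in zip(w, phi)]

        # Actor: approximate gradient step using the critic's estimate.
        for b in range(n_actions):
            theta[s][b] += actor_lr * q * phi[s * n_actions + b]

        s, a = s2, a2
    return theta
```

Calling `run_actor_critic()` returns the learned actor parameters; `softmax_policy(theta, s)` then gives the action probabilities in state `s`, which in this toy problem should shift toward the rewarded action 1. The finite state and action sets are a simplification: the abstract covers Polish state and action spaces.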