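Parameterized complexity
Stationary point
Trajectory
Exponential function
Mathematical optimization
Set (abstract data type)
Mathematics
Gradient method
Average cost
Sequence (biology)
Point (geometry)
Computer science
Applied mathematics
Algorithm
Mathematical analysis
Programming language
Physics
Economics
Neoclassical economics
Astronomy
Biology
Genetics
Geometry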
Authors
Mehrdad Moharrami, Yashaswini Murthy, Arghyadip Roy, R. Srikant
Source
Journal: Cornell University - arXiv
Date: 2022-01-01
Citations: 2
Identifiers
DOI: 10.48550/arxiv.2202.04157
Abstract
We study the risk-sensitive exponential cost MDP formulation and develop a trajectory-based gradient algorithm to find the stationary point of the cost associated with a set of parameterized policies. We derive a formula that can be used to compute the policy gradient from (state, action, cost) information collected from sample paths of the MDP for each fixed parameterized policy. Unlike the traditional average-cost problem, standard stochastic approximation theory cannot be used to exploit this formula. To address the issue, we introduce a truncated and smooth version of the risk-sensitive cost and show that this new cost criterion can be used to approximate the risk-sensitive cost and its gradient uniformly under some mild assumptions. We then develop a trajectory-based gradient algorithm to minimize the smooth truncated estimation of the risk-sensitive cost and derive conditions under which a sequence of truncations can be used to solve the original, untruncated cost problem.
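The basic identity behind a trajectory-based gradient for an exponential cost can be illustrated with a minimal sketch. Its assumptions are ours, not the paper's: a finite horizon T in place of the infinite-horizon criterion, a hypothetical 2-state, 2-action MDP (`P`, `COST`, `MU0` are invented for illustration), and a tabular softmax policy. With J_T(θ) = (1/T) log E[exp(Σ_{t<T} c(s_t, a_t))], the generic likelihood-ratio identity gives ∇J_T = E[e^C Σ_t ∇ log π_θ(a_t|s_t)] / (T E[e^C]), where C is the total path cost. This uses only (state, action, cost) data from sample paths. `exact_gradient` evaluates the ratio by enumerating every path, and `sampled_gradient` estimates it from simulated paths. This is not the formula derived in the paper, and it omits the paper's truncation and smoothing.

```python
import itertools

import numpy as np

# Hypothetical toy MDP, chosen only for illustration (not from the paper).
N_S, N_A = 2, 2
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # P[s, a, s'] = transition probability
              [[0.7, 0.3], [0.1, 0.9]]])
COST = np.array([[0.1, 0.5],              # COST[s, a] = per-step cost c(s, a)
                 [0.8, 0.2]])
MU0 = np.array([1.0, 0.0])                # initial state distribution


def policy(theta):
    """Tabular softmax policy: pi[s, a] = exp(theta[s, a]) / sum_b exp(theta[s, b])."""
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)


def grad_log_pi(pi, s, a):
    """Score function d/dtheta log pi(a|s) for the tabular softmax policy."""
    g = np.zeros_like(pi)
    g[s] = -pi[s]
    g[s, a] += 1.0
    return g


def exact_cost(theta, T):
    """J_T(theta) = (1/T) log E[exp(sum_{t<T} c(s_t, a_t))], via a matrix power."""
    pi = policy(theta)
    # M[s, s'] = sum_a pi(a|s) * exp(c(s, a)) * P(s'|s, a)
    M = np.einsum("sa,sa,sat->st", pi, np.exp(COST), P)
    return np.log(MU0 @ np.linalg.matrix_power(M, T) @ np.ones(N_S)) / T


def exact_gradient(theta, T):
    """Likelihood-ratio gradient of J_T, computed exactly by enumerating all paths:
    grad J_T = E[exp(C) * sum_t grad log pi(a_t|s_t)] / (T * E[exp(C)])."""
    pi = policy(theta)
    num, den = np.zeros_like(theta), 0.0
    # Each seq is (s_0, a_0, s_1, a_1, ..., s_{T-1}, a_{T-1}).
    for seq in itertools.product(range(N_S), range(N_A), repeat=T):
        states, actions = seq[0::2], seq[1::2]
        prob = MU0[states[0]] * pi[states[0], actions[0]]
        for t in range(1, T):
            prob *= P[states[t - 1], actions[t - 1], states[t]] * pi[states[t], actions[t]]
        if prob == 0.0:
            continue
        pairs = list(zip(states, actions))
        weight = prob * np.exp(sum(COST[s, a] for s, a in pairs))
        num += weight * sum(grad_log_pi(pi, s, a) for s, a in pairs)
        den += weight
    return num / (T * den)


def sampled_gradient(theta, T, n_paths, rng):
    """Self-normalised Monte Carlo estimate of the same ratio from sampled paths."""
    pi = policy(theta)
    num, den = np.zeros_like(theta), 0.0
    for _ in range(n_paths):
        s = rng.choice(N_S, p=MU0)
        total_cost, score = 0.0, np.zeros_like(theta)
        for _ in range(T):
            a = rng.choice(N_A, p=pi[s])
            total_cost += COST[s, a]           # record the (state, action, cost) data
            score += grad_log_pi(pi, s, a)
            s = rng.choice(N_S, p=P[s, a])
        weight = np.exp(total_cost)            # exponential cost of the whole path
        num += weight * score
        den += weight
    return num / (T * den)


if __name__ == "__main__":
    theta = np.array([[0.3, -0.2], [0.1, 0.4]])
    print(exact_gradient(theta, T=4))
    print(sampled_gradient(theta, T=4, n_paths=4000, rng=np.random.default_rng(0)))
```

The sampled estimator is a ratio of two weighted averages, so it is biased for finite `n_paths`, and the exp(C) weights can vary widely from path to path. Comparing `exact_gradient` with central finite differences of `exact_cost` is a quick sanity check of the identity.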