
Model-Free Nonstationary Reinforcement Learning: Near-Optimal Regret and Applications in Multiagent Reinforcement Learning and Inventory Control

Keywords: reinforcement learning; regret; computer science; control (management); artificial intelligence; machine learning; psychology; social psychology
Authors
Weichao Mao, Kaiqing Zhang, Ruihao Zhu, David Simchi-Levi, Tamer Başar
Source
Journal: Management Science [Institute for Operations Research and the Management Sciences]
Identifier
DOI: 10.1287/mnsc.2022.02533
Abstract

We consider model-free reinforcement learning (RL) in nonstationary Markov decision processes. Both the reward functions and the state transition functions are allowed to vary arbitrarily over time as long as their cumulative variations do not exceed certain variation budgets. We propose Restarted Q-Learning with Upper Confidence Bounds (RestartQ-UCB), the first model-free algorithm for nonstationary RL, and show that it outperforms existing solutions in terms of dynamic regret. Specifically, RestartQ-UCB with Freedman-type bonus terms achieves a dynamic regret bound of [Formula: see text], where S and A are the numbers of states and actions, respectively, [Formula: see text] is the variation budget, H is the number of time steps per episode, and T is the total number of time steps. We further present a parameter-free algorithm named Double-Restart Q-UCB that does not require prior knowledge of the variation budget. We show that our algorithms are nearly optimal by establishing an information-theoretical lower bound of [Formula: see text], the first lower bound in nonstationary RL. Numerical experiments validate the advantages of RestartQ-UCB in terms of both cumulative rewards and computational efficiency. We demonstrate the power of our results in examples of multiagent RL and inventory control across related products.

This paper was accepted by Omar Besbes, revenue management and market analytics.

Funding: The research of D. Simchi-Levi and R. Zhu was supported by the MIT Data Science Laboratory. The research of W. Mao, K. Zhang, and T. Başar was supported in part by the U.S. Army Research Laboratory (ARL) Cooperative Agreement W911NF-17-2-0196, in part by the Office of Naval Research (ONR) [MURI Grant N00014-16-1-2710], and in part by the Air Force Office of Scientific Research (AFOSR) [Grant FA9550-19-1-0353]. K. Zhang also acknowledges support from the U.S. Army Research Laboratory (ARL) [Grant W911NF-24-1-0085].
Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2022.02533.
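The restart mechanism at the heart of the abstract can be illustrated with a minimal sketch: optimistic tabular Q-learning whose estimates are periodically reset so that data gathered before the MDP drifted is discarded. This is an assumption-laden illustration, not the paper's algorithm: it uses a simple Hoeffding-type bonus rather than the paper's Freedman-type bonus, and the toy nonstationary MDP, restart period, and constants below are all hypothetical choices for demonstration.

```python
import numpy as np

def restart_q_ucb_sketch(env_step, S, A, H, num_episodes, restart_every, c=1.0):
    """Optimistic episodic Q-learning with periodic restarts (illustrative sketch).

    env_step(ep, h, s, a, rng) -> (next_state, reward) lets rewards and
    transitions change with the episode index, i.e., a nonstationary MDP.
    """
    rng = np.random.default_rng(0)
    total_reward = 0.0
    for ep in range(num_episodes):
        if ep % restart_every == 0:
            # Restart: throw away stale estimates learned under old dynamics.
            Q = np.full((H, S, A), float(H))   # optimistic initialization
            N = np.zeros((H, S, A), dtype=int)  # visit counts
        s = 0
        for h in range(H):
            a = int(np.argmax(Q[h, s]))         # greedy w.r.t. optimistic Q
            s2, r = env_step(ep, h, s, a, rng)
            total_reward += r
            N[h, s, a] += 1
            t = N[h, s, a]
            alpha = (H + 1) / (H + t)           # standard optimistic-Q step size
            # Hoeffding-type exploration bonus (hypothetical constants).
            bonus = c * np.sqrt(H**3 * np.log(2.0 * num_episodes) / t)
            v_next = 0.0 if h == H - 1 else min(H, Q[h + 1, s2].max())
            Q[h, s, a] = min(H, (1 - alpha) * Q[h, s, a]
                             + alpha * (r + v_next + bonus))
            s = s2
    return total_reward

# Hypothetical toy nonstationary MDP: 2 states, 2 actions, and the rewarding
# action flips abruptly halfway through the horizon.
def toy_env(ep, h, s, a, rng):
    good_action = 0 if ep < 500 else 1
    reward = 1.0 if a == good_action else 0.0
    return a, reward  # deterministic transition to state == action

reward = restart_q_ucb_sketch(toy_env, S=2, A=2, H=3,
                              num_episodes=1000, restart_every=250)
print(reward)
```

The restart schedule here is fixed; in the paper the restart frequency is tuned to the variation budget (or learned adaptively by Double-Restart Q-UCB), which is what yields the stated dynamic-regret guarantee.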