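Computer science
Markov decision process
Dual (grammatical number)
Mathematical optimization
Scheduling (production processes)
Operations research
Queueing
Markov chain
Markov process
Mathematics
Machine learning
Computer network
Statistics
Literature
Art
Authors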
Yi Chen,Jing Dong,Zhaoran Wang,Chuheng Zhang
Source
Journal: Management Science
[Institute for Operations Research and the Management Sciences]
Date: 2025-05-20
Citations: 1
Identifier
DOI:10.1287/mnsc.2022.03736
Abstract
In many operations management problems, we need to make decisions sequentially to minimize cost while satisfying certain constraints. One modeling approach to such problems is the constrained Markov decision process (CMDP). In this work, we develop a data-driven primal-dual algorithm to solve CMDPs. Our approach alternately applies regularized policy iteration to improve the policy and subgradient ascent to maintain the constraints. Under mild regularity conditions, we show that the algorithm converges at rate [Formula: see text], where T is the number of iterations, for both the discounted and long-run average cost formulations. Our algorithm can be easily combined with advanced deep learning techniques to deal with complex large-scale problems, with the additional benefit of a straightforward convergence analysis. When the CMDP has a weakly coupled structure, our approach can further reduce the computational complexity through an embedded decomposition. We apply the algorithm to two operations management problems: multiclass queue scheduling and multiproduct inventory management. Numerical experiments demonstrate that our algorithm, when combined with appropriate value function approximations, generates policies that achieve superior performance compared with state-of-the-art heuristics.
This paper was accepted by Baris Ata, stochastic models and simulation.
Funding: Y. Chen was supported by the Hong Kong Research Grants Council, Early Career Scheme Fund [Grant 26508924], and partially supported by a grant from the National Natural Science Foundation of China [Grant 72495125]. J. Dong was supported by the National Science Foundation [Grant 1944209].
Supplemental Material: The data files are available at https://doi.org/10.1287/mnsc.2022.03736.
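The alternation described in the abstract (a regularized policy update followed by a subgradient ascent step on the constraint multiplier) can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: it assumes a small tabular discounted CMDP with one constraint, uses a KL-regularized softmax update as a stand-in for regularized policy iteration, and evaluates policies exactly rather than from data. All names (`evaluate`, `primal_dual`, `budget`, `lam_max`), the step sizes, and the toy numbers are hypothetical.

```python
import math

def softmax(row):
    # Numerically stable softmax over one state's action logits.
    m = max(row)
    e = [math.exp(x - m) for x in row]
    z = sum(e)
    return [x / z for x in e]

def evaluate(policy, P, cost, gamma, iters=200):
    """Iterative policy evaluation for one per-step cost table; returns (Q, V)."""
    nS, nA = len(P), len(P[0])
    V = [0.0] * nS
    Q = [[0.0] * nA for _ in range(nS)]
    for _ in range(iters):
        Q = [[cost[s][a] + gamma * sum(P[s][a][s2] * V[s2] for s2 in range(nS))
              for a in range(nA)] for s in range(nS)]
        V = [sum(policy[s][a] * Q[s][a] for a in range(nA)) for s in range(nS)]
    return Q, V

def primal_dual(P, c, d, budget, gamma, mu0,
                T=200, eta=0.1, step=0.05, lam_max=50.0):
    """Toy primal-dual loop: minimize cost c subject to E[discounted d] <= budget."""
    nS, nA = len(P), len(P[0])
    theta = [[0.0] * nA for _ in range(nS)]  # softmax logits, uniform start
    policy = [softmax(row) for row in theta]
    lam = 0.0                                # Lagrange multiplier
    for _ in range(T):
        # Primal step: KL-regularized improvement on the Lagrangian cost c + lam * d.
        lagr = [[c[s][a] + lam * d[s][a] for a in range(nA)] for s in range(nS)]
        Q, _ = evaluate(policy, P, lagr, gamma)
        for s in range(nS):
            for a in range(nA):
                theta[s][a] -= eta * Q[s][a]
        policy = [softmax(row) for row in theta]
        # Dual step: projected subgradient ascent on the constraint violation.
        # The cap lam_max is an assumption of this sketch.
        _, Vd = evaluate(policy, P, d, gamma)
        violation = sum(mu0[s] * Vd[s] for s in range(nS)) - budget
        lam = min(lam_max, max(0.0, lam + step * violation))
    return policy, lam

# Hypothetical 2-state, 2-action instance: action 0 is cheap but consumes the
# constrained resource, action 1 is expensive but consumes none.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.7, 0.3], [0.1, 0.9]]]
c = [[1.0, 2.0], [0.5, 3.0]]
d = [[1.0, 0.0], [1.0, 0.0]]
policy, lam = primal_dual(P, c, d, budget=4.0, gamma=0.9, mu0=[0.5, 0.5])
print(policy, lam)
```

In this sketch the multiplier rises while the constraint is violated, which makes the resource-consuming action more expensive in the next policy update, and falls back toward zero otherwise. In a data-driven or large-scale setting, the exact `evaluate` call would presumably be replaced by an estimated or approximated value function.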