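Regret
Nonparametric statistics
Computer science
Lead time
Benchmark (surveying)
Upper and lower bounds
Algorithm
Lost sales
Mathematics
Mathematical optimization
Economics
Econometrics
Machine learning
Operations management
Mathematical analysis
Geography
Geodesy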
Authors
Huanan Zhang,Xiuli Chao,Cong Shi
Source
Journal: Management Science
[Institute for Operations Research and the Management Sciences]
Date: 2019-10-15
Volume (issue): 66 (5): 1962-1980
Citations: 90
Identifier
DOI:10.1287/mnsc.2019.3288
Abstract
We consider a periodic-review, single-product inventory system with lost sales and positive lead times under censored demand. In contrast to the classical inventory literature, we assume the firm does not know the demand distribution a priori and makes an adaptive inventory-ordering decision in each period based only on the past sales (censored demand) data. The standard performance measure is regret, which is the cost difference between a learning algorithm and the clairvoyant (full-information) benchmark. When the benchmark is chosen to be the (full-information) optimal base-stock policy, Huh et al. [Huh WT, Janakiraman G, Muckstadt JA, Rusmevichientong P (2009a) An adaptive algorithm for finding the optimal base-stock policy in lost sales inventory systems with censored demand. Math. Oper. Res. 34(2):397–416.] developed a nonparametric learning algorithm with a cubic-root convergence rate on regret. An important open question is whether there exists a nonparametric learning algorithm whose regret rate matches the theoretical lower bound for any learning algorithm. In this work, we provide an affirmative answer to this question. More precisely, we propose a new nonparametric algorithm termed the simulated cycle-update policy and establish a square-root convergence rate on regret, which is proven to be the lower bound for any learning algorithm. Our algorithm uses a random cycle-updating rule based on an auxiliary simulated system running in parallel and also involves two new concepts, namely the withheld on-hand inventory and the double-phase cycle gradient estimation. The techniques developed are effective for learning a stochastic system with complex system dynamics and lasting impact of decisions. This paper was accepted by Yinyu Ye, optimization.
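The setting in the abstract can be illustrated with a minimal Python sketch of a lost-sales system with a positive lead time under a fixed base-stock (order-up-to) level. It is not the paper's simulated cycle-update policy. The function names `simulate_base_stock` and `cumulative_regret`, the unit costs `h` (holding) and `p` (lost-sales penalty), and the order of events within a period are all assumptions made for illustration.

```python
def simulate_base_stock(S, demands, lead_time, h=1.0, p=4.0):
    """Simulate an order-up-to-S policy with lost sales (illustrative only).

    Assumed order of events in each period: the order placed `lead_time`
    periods ago arrives, a new order is placed, then demand is realized.
    Returns (per-period costs, observed sales). Sales are censored: the
    firm sees min(demand, on-hand), never the demand that was lost.
    """
    if lead_time < 1:
        raise ValueError("this sketch assumes a positive lead time")
    on_hand = S
    pipeline = [0] * lead_time  # pipeline[0] is the next order to arrive
    costs, sales_observed = [], []
    for d in demands:
        on_hand += pipeline.pop(0)            # receive the oldest order
        position = on_hand + sum(pipeline)    # on-hand plus outstanding orders
        pipeline.append(max(S - position, 0)) # raise the position back to S
        sales = min(d, on_hand)               # unmet demand is lost, not backlogged
        lost = d - sales
        on_hand -= sales
        costs.append(h * on_hand + p * lost)
        sales_observed.append(sales)
    return costs, sales_observed


def cumulative_regret(costs_algorithm, costs_benchmark):
    """Regret: total cost of a learning policy minus that of the benchmark."""
    return sum(costs_algorithm) - sum(costs_benchmark)


# Usage: compare two fixed base-stock levels on the same demand path.
demands = [3, 3, 3, 3, 3, 3]
costs_low, sales_low = simulate_base_stock(5, demands, lead_time=1)
costs_high, _ = simulate_base_stock(6, demands, lead_time=1)
print(sales_low, cumulative_regret(costs_low, costs_high))
```

In the paper's setting the benchmark is the full-information optimal base-stock policy, and the learning algorithm must choose its levels from the censored sales alone; the sketch only shows how costs, censored sales, and regret relate.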