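Surge
Dual (grammatical number)
Business
Base (topology)
Operations management
Industrial organization
Engineering
Mathematics
Electrical engineering
Literature
Mathematical analysis
Art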
Source
Journal: Social Science Research Network
[Social Science Electronic Publishing]
Date: 2019-01-01
Citations: 17
Abstract
We consider a periodic-review dual-sourcing inventory system in which the expedited supplier is faster and more costly, while the regular supplier is slower and cheaper. Under full demand distributional information, it is well known that the optimal policy is extremely complex but that the celebrated Tailored Base-Surge (TBS) policy performs near-optimally. Under such a policy, a constant order is placed at the regular source in each period, while the order placed at the expedited source follows a simple order-up-to rule. In this paper, we assume that the firm does not know the demand distribution a priori and makes adaptive inventory ordering decisions in each period based only on past sales (a.k.a. censored demand) data. The standard performance measure is regret, which is the cost difference between a feasible learning algorithm and the clairvoyant (full-information) benchmark. When the benchmark is chosen to be the (full-information) optimal Tailored Base-Surge policy, we develop the first nonparametric learning algorithm that admits a regret bound of O(T^{1/2} (log T)^{3} log log T), which is provably tight up to a logarithmic factor. Leveraging the structure of this problem, our approach combines the power of bisection search and stochastic gradient descent, and also involves a delicate high-probability coupling argument between the dynamics of our system and those of the clairvoyant optimal system. We also develop several technical results that are of independent interest.
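The TBS rule described in the abstract (a constant regular order each period plus an order-up-to rule at the expedited source) can be illustrated with a minimal simulation sketch. This is not the paper's learning algorithm: it only evaluates one fixed TBS policy under known demand. The function name `simulate_tbs`, the cost parameters, the zero expedited lead time, and the backlogging of unmet demand are all assumptions made for illustration and are not taken from the paper.

```python
def simulate_tbs(demands, regular_qty, surge_level,
                 holding_cost=1.0, backlog_cost=4.0,
                 expedite_premium=2.0, start_inventory=0.0):
    """Simulate a fixed Tailored Base-Surge policy (illustrative sketch).

    Assumptions (hypothetical, not from the paper):
      - the constant regular order arrives every period (steady state),
      - expedited orders arrive immediately (zero lead time),
      - unmet demand is backlogged (inventory may go negative),
      - expedite_premium is the extra unit cost of the expedited source.
    Returns (average cost per period, list of expedited order sizes).
    """
    inventory = start_inventory
    total_cost = 0.0
    expedited_orders = []

    for demand in demands:
        # Base: the same regular quantity arrives in every period.
        inventory += regular_qty

        # Surge: raise inventory up to surge_level if it is below it.
        surge = max(surge_level - inventory, 0.0)
        inventory += surge
        expedited_orders.append(surge)

        # Demand is realized; a negative inventory means backlog.
        inventory -= demand

        total_cost += (expedite_premium * surge
                       + holding_cost * max(inventory, 0.0)
                       + backlog_cost * max(-inventory, 0.0))

    return total_cost / len(demands), expedited_orders


avg_cost, surges = simulate_tbs([3, 5, 2, 6], regular_qty=3, surge_level=4)
```

With the four demands above, the expedited source is used only in the periods where the post-arrival inventory falls below the order-up-to level of 4. In this simplified setting, the regret of a learning algorithm would be its cumulative cost minus the cumulative cost of the best fixed `(regular_qty, surge_level)` pair.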