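Newsvendor model
Computer science
Robust optimization
Reinforcement learning
Unbundling
Profit (economics)
Mathematical optimization
Markov decision process
Bundling
Stackelberg competition
Product (mathematics)
Operations research
Microeconomics
Artificial intelligence
Supply chain
Business
Economics
Mathematics
Markov process
Marketing
Geometry
World Wide Web
Statistics
Materials science
Composite material
Authors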
Xiaoli Yan,Frank Chen,Hui Yu,Jiawen Li
Identifier
DOI:10.1177/10591478251344225
Abstract
In fashion, food processing, petrochemical production, and agriculture, products (items) are often bundled in a prefixed assortment, with a given ratio for each product. For example, one case of men’s shoes may contain 24 pairs of different sizes of the same design. Of the 24 pairs, there is one size 7 pair, four size 9 pairs, and so on. Moreover, those pairs of shoes are packaged independently for retailing. Retailers of such products order them in bundles and then resell them unbundled. In this study, we propose and analyze a newsvendor model in which a retailer decides the order quantity of the whole bundle before the uncertain demand for each product/item is realized. We call it a product unbundling newsvendor problem (PUNP): How should the retailer decide the order quantity of a product bundle to meet the unknown demands of individual items and maximize its expected profit? We approach this problem with robust optimization, which assumes knowledge of the means and covariance matrix of the stochastic demands but not of the demand distributions. However, the robust approach, which considers the worst-case demand scenario, is perceived to be conservative. In this study, we combine distributionally robust optimization with deep reinforcement learning (DRL) and propose a new paradigm of robust learning to improve the robust decision quality. We take the robust solution, that is, the order quantity and profit, as human domain knowledge and incorporate it into the decision-making process of DRL by designing a policy transfer mechanism. Unsurprisingly, the exact robust solution is computationally intractable; thus, we provide an approximate solution. Simulations conducted with limited data sizes confirm that our approach effectively improves robust performance. Moreover, the hybrid approach significantly outperforms the DRL approach.
In the meantime, reduced computing costs and increased interpretability of decision recommendations may facilitate the deployment of DRL algorithms in operational practice. Furthermore, the successful application of the hybrid approach in addressing several variants of the PUNP indicates that the proposed mechanism may provide a pathway for solving complex operational problems.
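To make the ordering decision in the PUNP concrete, the following is a minimal sketch of how the profit of a bundle order could be evaluated against item-level demand. It is not the paper's robust or DRL method: it uses a plain sample-average search over a handful of demand scenarios. The function names, the prices, the bundle cost, the scenarios, the two-item bundle, and the absence of salvage value or shortage penalties are all illustrative assumptions, not details taken from the abstract.

```python
from typing import Sequence, Tuple


def bundle_profit(q: int,
                  ratios: Sequence[int],
                  demand: Sequence[float],
                  prices: Sequence[float],
                  bundle_cost: float) -> float:
    """Profit from ordering q bundles and reselling the items unbundled.

    Each bundle contains ratios[i] units of item i. Sales of item i are
    capped by both the stock on hand (q * ratios[i]) and the realized demand.
    Assumption: unsold units have no salvage value and unmet demand has
    no penalty.
    """
    revenue = sum(p * min(q * r, d)
                  for r, d, p in zip(ratios, demand, prices))
    return revenue - bundle_cost * q


def best_order(scenarios: Sequence[Sequence[float]],
               ratios: Sequence[int],
               prices: Sequence[float],
               bundle_cost: float,
               max_q: int) -> Tuple[int, float]:
    """Brute-force search for the q in 0..max_q with the highest
    average profit over the given demand scenarios."""
    def average_profit(q: int) -> float:
        total = sum(bundle_profit(q, ratios, d, prices, bundle_cost)
                    for d in scenarios)
        return total / len(scenarios)

    q_star = max(range(max_q + 1), key=average_profit)
    return q_star, average_profit(q_star)


# Hypothetical two-item bundle: 1 unit of item A and 4 units of item B.
ratios = [1, 4]
prices = [10.0, 10.0]            # assumed unit selling prices
bundle_cost = 30.0               # assumed cost of one bundle
scenarios = [[2, 8], [3, 12]]    # assumed equally likely demand vectors

q_star, avg_profit = best_order(scenarios, ratios, prices, bundle_cost, max_q=5)
print(q_star, avg_profit)
```

Because the bundle ratio is fixed, raising q to serve a high-demand item also adds stock of every other item, which is what separates this setting from solving one newsvendor problem per item.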