Newsvendor Problems With Product Unbundling: An Approach Combining Robust Optimization With Deep Reinforcement Learning

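Keywords: Newsvendor model; Computer science; Robust optimization; Reinforcement learning; Unbundling; Profit (economics); Mathematical optimization; Markov decision process; Bundling; Stackelberg competition; Product (mathematics); Operations research; Microeconomics; Artificial intelligence; Supply chain; Business; Economics; Mathematics; Markov process; Marketing; Statistics; World Wide Web; Composite material; Materials science; Geometry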

Authors

Xiaoli Yan, Frank Chen, Hui Yu, Jiawen Li

Source

Journal: Production and Operations Management [Wiley]
Date: 2025-06-20 Volume (Issue): 34 (11): 3629-3646 Citations: 1

Abstract

In fashion, food processing, petrochemical production, and agriculture, products (items) are often bundled in a predetermined assortment, with a given ratio for each product. For example, one case of men’s shoes may contain 24 pairs of different sizes of the same design. Of the 24 pairs, one is size 7, four are size 9, and so on. Moreover, those pairs of shoes are packaged independently for retailing. Retailers of such products order them in bundles and then resell them unbundled. In this study, we propose and analyze a newsvendor model in which a retailer decides the order quantity of the whole bundle before the uncertain demand for each product/item is realized. We call it a product unbundling newsvendor problem (PUNP): How should the retailer decide the order quantity of a product bundle to meet the unknown demands of individual items and maximize its expected profit? We address this problem with a robust optimization approach that assumes knowledge of the means and covariance matrix of the stochastic demands but not of their distributions. However, the robust approach, which considers the worst-case demand scenario, is perceived to be conservative. In this study, we integrate distributionally robust optimization with deep reinforcement learning (DRL) and propose a new paradigm of robust learning to improve the robust decision quality. We take this robust solution, that is, the order quantity and profit, as human domain knowledge and incorporate it into the decision-making process of DRL by designing a policy transfer mechanism. Unsurprisingly, the exact robust solution is computationally intractable; thus, we provide an approximate solution. Simulations with limited data sizes confirm that our approach effectively improves robust performance. Moreover, the hybrid approach significantly outperforms the DRL approach.
Meanwhile, reduced computing costs and increased interpretability of decision recommendations may facilitate the deployment of DRL algorithms in operational practice. Furthermore, the successful application of the hybrid approach in addressing several variants of the PUNP indicates that the proposed mechanism may provide a pathway for solving complex operational problems.
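
To make the ordering decision in the abstract concrete, the following is a minimal Python sketch of a simplified PUNP. It is not the authors' robust or DRL method: it picks the number of bundles by averaging profit over simulated demand samples. The profit function (revenue from sold items minus bundle cost, with no salvage value or shortage penalty), the independent truncated-normal demands, and all names and numbers are assumptions made for illustration only.

```python
import random


def simulate_demands(means, sds, n_samples, seed=0):
    """Draw item-level demand samples.

    Assumption for illustration: demands are independent normals truncated
    at zero. The paper only assumes known means and a covariance matrix.
    """
    rng = random.Random(seed)
    return [
        [max(0.0, rng.gauss(m, s)) for m, s in zip(means, sds)]
        for _ in range(n_samples)
    ]


def bundle_profit(q, ratios, prices, bundle_cost, demand):
    """Profit from ordering q bundles for one realized demand vector.

    Each bundle holds ratios[i] units of item i, so q bundles supply
    ratios[i] * q units. Unsold units are assumed to be worthless.
    """
    revenue = sum(p * min(a * q, d) for a, p, d in zip(ratios, prices, demand))
    return revenue - bundle_cost * q


def expected_profit(q, ratios, prices, bundle_cost, samples):
    """Sample-average estimate of the expected profit for q bundles."""
    total = sum(bundle_profit(q, ratios, prices, bundle_cost, d) for d in samples)
    return total / len(samples)


def best_order_quantity(ratios, prices, bundle_cost, samples, q_max):
    """Brute-force search over integer bundle counts 0..q_max."""
    return max(
        range(q_max + 1),
        key=lambda q: expected_profit(q, ratios, prices, bundle_cost, samples),
    )


# Hypothetical example: a bundle with three sizes in a 1:4:3 ratio.
ratios = [1, 4, 3]
prices = [50.0, 50.0, 50.0]
bundle_cost = 200.0
samples = simulate_demands(means=[10, 40, 30], sds=[3, 10, 8], n_samples=2000)
q_star = best_order_quantity(ratios, prices, bundle_cost, samples, q_max=30)
print("bundles to order:", q_star)
print("estimated profit:", round(expected_profit(q_star, ratios, prices, bundle_cost, samples), 2))
```

The sketch shows why the bundle constraint matters: a single decision q fixes the supply of every item at once, so the retailer cannot match each item's demand separately. The sample-average search also needs a demand distribution to sample from, which is exactly the information the distributionally robust formulation in the abstract does not assume.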
