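Bullwhip effect
Reinforcement learning
Reinforcement
Computer science
Game theory
Artificial intelligence
Economics
Operations research
Microeconomics
Engineering
Business
Psychology
Marketing
Supply chain
Supply chain management
Social psychology
Authors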
Maxim Rozhkov,Nataliya Alyamovskaya,Г В Заходякин
Identifier
DOI:10.1080/00207543.2025.2479831
Abstract
This article investigates the application of reinforcement learning (RL) methods to optimise a four-echelon linear supply chain model with stochastic demand. The proposed supply chain configuration is largely based on the production-distribution supply chain of the MIT Supply Chain Beer Game. We show that RL can significantly improve ordering efficiency and overall supply chain performance. The model environment is adapted for the OpenAI 'gymnasium' interface, and reward shaping (reward engineering) is used in the model training process. The reward function has two components: costs and an order variance metric. We evaluate the effectiveness of RL against Order-Up-To inventory management policies for several supply chain configurations and assess the impact on overall supply chain stability. An algorithm based on recurrent proximal policy optimisation (RPPO) is effective for the beer game setup and outperforms Order-Up-To approaches. This RL algorithm generates different ordering patterns and tends to narrow the agent's action space, thereby mitigating the bullwhip effect more effectively. Our findings suggest that the bullwhip effect is reduced even if only one agent in the supply chain uses the algorithm as its ordering policy.
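The abstract names three quantities without giving formulas: an Order-Up-To rule, a reward combining costs with an order variance metric, and the bullwhip effect. The following is a minimal illustrative sketch of how these might be written, not the authors' implementation. The function names, the cost coefficients (0.5 holding, 1.0 backlog), the variance weight and the use of population variance are all assumptions chosen for illustration.

```python
from statistics import pvariance


def order_up_to(target_level, inventory_position):
    """Order-Up-To rule: order enough to raise the inventory position
    (on hand + on order - backlog) to the target level, never negative."""
    return max(0, target_level - inventory_position)


def shaped_reward(on_hand, backlog, recent_orders,
                  holding_cost=0.5, backlog_cost=1.0, variance_weight=0.1):
    """Hypothetical two-component reward: negative inventory cost minus a
    weighted penalty on the variance of the agent's recent orders.
    All coefficient values are assumed, not taken from the article."""
    cost = holding_cost * on_hand + backlog_cost * backlog
    variance = pvariance(recent_orders) if len(recent_orders) > 1 else 0.0
    return -(cost + variance_weight * variance)


def bullwhip_ratio(orders, demand):
    """A common bullwhip measure: variance of orders placed divided by
    variance of demand received. Values above 1 indicate amplification.
    Requires demand with non-zero variance."""
    return pvariance(orders) / pvariance(demand)
```

In this sketch, an echelon whose orders swing between 0 and 8 while its incoming demand alternates between 3 and 5 has a ratio well above 1, which is the amplification that the variance term in the reward is meant to discourage.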