Reinforcement Learning
Computer Science
Scalability
Size
Mathematical Optimization
Artificial Intelligence
Operations Research
Mathematics
Chemistry
Database
Organic Chemistry
Authors
Lotte van Hezewijk, Nico Dellaert, Willem van Jaarsveld
Identifier
DOI:10.1016/j.ijpe.2025.109601
Abstract
Capacitated lot sizing problems in situations with stationary and non-stationary demand (SCLSP) are very common in practice. Solving problems with a large number of items using Deep Reinforcement Learning (DRL) is challenging due to the large action space. This paper proposes a new Markov Decision Process (MDP) formulation to solve this problem, by decomposing the production quantity decisions in a period into sub-decisions, which reduces the action space dramatically. We demonstrate that applying Deep Controlled Learning (DCL) yields policies that outperform the benchmark heuristic as well as a prior DRL implementation. By using the decomposed MDP formulation and DCL method outlined in this paper, we can solve larger problems compared to the previous DRL implementation. Moreover, we adopt a non-stationary demand model for training the policy, which enables us to readily apply the trained policy in dynamic environments when demand changes.
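The key idea in the abstract is that deciding all items' production quantities jointly makes the action space grow exponentially in the number of items, whereas decomposing the period decision into per-item sub-decisions keeps each action choice small. A minimal sketch of that size argument, using assumed illustrative numbers (10 items, 5 discrete quantity levels; the paper's actual problem sizes and decision structure may differ):

```python
# Illustrative sketch (not the paper's implementation): action-space sizes
# for a joint vs. a decomposed per-period production decision.

N_ITEMS = 10   # number of items (assumed for illustration)
Q_LEVELS = 5   # discrete production-quantity levels per item (assumed)

# Joint formulation: one action fixes every item's quantity at once,
# so the agent chooses among Q_LEVELS ** N_ITEMS combinations.
joint_action_space = Q_LEVELS ** N_ITEMS

# Decomposed formulation: the period is split into N_ITEMS sequential
# sub-decisions, each selecting a single item's quantity, so the agent
# only ever faces Q_LEVELS options per sub-decision.
per_subdecision_actions = Q_LEVELS
sub_decisions_per_period = N_ITEMS

print(joint_action_space)       # 9765625
print(per_subdecision_actions)  # 5
```

The trade-off is that one period now takes `N_ITEMS` MDP steps instead of one, but each step's action set stays constant in size, which is what makes larger instances tractable for DRL.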