Reinforcement learning
Markov decision process
Profitability index
Computer science
Process (computing)
Partially observable Markov decision process
Key (lock)
Task (project management)
Process safety
Operations research
Risk analysis (engineering)
Artificial intelligence
Machine learning
Markov process
Markov chain
Computer security
Engineering
Markov model
Work in process
Operations management
Systems engineering
Economics
Operating system
Statistics
Medicine
Mathematics
Finance
Authors
Ke Jiang,Zhaohui Jiang,Xudong Jiang,Yongfang Xie,Weihua Gui
Identifier
DOI:10.1109/tnnls.2023.3340741
Abstract
Making proper decisions online in a complex environment during blast furnace (BF) operation is a key factor in achieving long-term success and profitability in the steel manufacturing industry. Regulatory lags, ore source uncertainty, and continuous decision requirements make it a challenging task. Recently, reinforcement learning (RL) has demonstrated state-of-the-art performance in various sequential decision-making problems. However, the strict safety requirements make it impossible to explore optimal decisions through online trial and error. Therefore, this article proposes a novel offline RL approach designed to ensure safety, maximize return, and address issues of partially observed states. Specifically, it utilizes an off-policy actor-critic framework to infer the optimal decision from expert operation trajectories. The "actor" in this framework is jointly trained by supervision and evaluation signals to make decisions with low risk and high return. Furthermore, we investigate a recurrent version of the actor and critic networks to better capture the complete observations, which solves the partially observed Markov decision process (POMDP) arising from sensor limitations. Verification within the BF smelting process demonstrates the improvements of the proposed algorithm in performance, i.e., in safety and return.
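The abstract does not specify the network architecture or loss function, so the following is only a minimal sketch of the two ideas it names: recurrent actor/critic networks for the POMDP (a GRU history encoder is assumed here) and an actor trained by a joint supervision-plus-evaluation objective (a TD3+BC-style combination is assumed here; `bc_weight` and all layer sizes are hypothetical, not taken from the paper).

```python
import torch
import torch.nn as nn


class RecurrentActor(nn.Module):
    """GRU over the observation history produces a belief state; an MLP head
    maps the belief to a bounded action (Tanh keeps actions in [-1, 1])."""
    def __init__(self, obs_dim, act_dim, hidden=128):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, act_dim), nn.Tanh())

    def forward(self, obs_seq):
        # obs_seq: (batch, T, obs_dim); last hidden state summarizes history
        _, h = self.gru(obs_seq)
        return self.head(h[-1])


class RecurrentCritic(nn.Module):
    """GRU over the observation history; Q is evaluated on (belief, action)."""
    def __init__(self, obs_dim, act_dim, hidden=128):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden, batch_first=True)
        self.q = nn.Sequential(nn.Linear(hidden + act_dim, hidden), nn.ReLU(),
                               nn.Linear(hidden, 1))

    def forward(self, obs_seq, act):
        _, h = self.gru(obs_seq)
        return self.q(torch.cat([h[-1], act], dim=-1))


def actor_loss(actor, critic, obs_seq, expert_act, bc_weight=2.5):
    """Joint objective: a supervision term (imitate expert trajectories,
    keeping decisions low-risk) plus an evaluation term (critic value,
    pushing toward high return). bc_weight is a hypothetical coefficient."""
    pi_act = actor(obs_seq)
    q = critic(obs_seq, pi_act)
    # Rescale the Q term so both terms stay on comparable scales.
    lam = bc_weight / q.abs().mean().detach()
    return -lam * q.mean() + nn.functional.mse_loss(pi_act, expert_act)


# Usage on dummy data: 32 windows of 20 past observations, 3-dim actions
actor = RecurrentActor(obs_dim=10, act_dim=3)
critic = RecurrentCritic(obs_dim=10, act_dim=3)
obs = torch.randn(32, 20, 10)
expert_act = torch.rand(32, 3) * 2 - 1
actor_loss(actor, critic, obs, expert_act).backward()
```

Because both networks condition on a window of past observations rather than a single sensor reading, the policy can act on an implicit belief state, which is the standard recurrent remedy for partial observability; the offline setting is respected because the evaluation term only queries the critic, never the real furnace.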