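Reinforcement learning
Controllability
Mathematical optimization
Penalty method
Computer science
Domain (mathematics)
Optimization problem
Mathematics
Artificial intelligence
Applied mathematics
Pure mathematics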
Authors
Fengrun Tang, Zhenxiang Feng, Yonggang Li, Chunhua Yang, Bei Sun
Identifier
DOI:10.1016/j.aei.2023.102197
Abstract
In the zinc oxide rotary volatile kiln (ZORVK), an optimal temperature field is essential for balancing the strong conflict between zinc recovery rate and carbon emissions. However, the complex and diverse temperature distribution modes make it difficult to obtain optimization results quickly under intricate controllability constraints and conflicting production objectives. This study proposes a novel constrained multi-objective deep reinforcement learning (CMODRL) approach for temperature field optimization of the ZORVK. First, an evaluation metric called the uncontrollable factor is designed to quantify the controllability of the temperature field. Then, a dynamic penalty method for deep reinforcement learning (DRL) is proposed to handle the controllability constraint, in which the penalty coefficient is adjusted dynamically according to the training loss. Next, the Chebyshev scalarization function is introduced as the action selection mechanism in DRL. Finally, the CMODRL is developed by integrating the dynamic penalty and the Chebyshev scalarization function into the multi-objective deep reinforcement learning (MODRL) framework. As a result, for any given preference between the two production objectives, the proposed method can rapidly obtain a Pareto-optimal solution that satisfies the constraint. Moreover, the optimization efficiency of the MODRL-based algorithm is forty times higher than that of the multi-objective genetic algorithm, making it better suited to practical optimization problems.
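
The two mechanisms named in the abstract, Chebyshev scalarization for action selection and a penalty on constraint violation, might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, array shapes, the utopia reference point, and the hinge form of the penalty are all hypothetical. The abstract does not give the rule that adjusts the penalty coefficient from the training loss, so the coefficient is simply passed in here.

```python
import numpy as np


def chebyshev_select(q_values, weights, utopia):
    """Return the action with the smallest weighted Chebyshev distance.

    q_values: shape (n_actions, n_objectives), estimated value per objective
    weights:  shape (n_objectives,), non-negative preference weights
    utopia:   shape (n_objectives,), reference point (assumed to be given)
    """
    q_values = np.asarray(q_values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    utopia = np.asarray(utopia, dtype=float)
    # Weighted Chebyshev distance: the largest weighted gap over objectives.
    distances = np.max(weights * np.abs(q_values - utopia), axis=1)
    return int(np.argmin(distances)), distances


def penalized_reward(reward, violation, coeff):
    """Subtract a penalty that applies only when the constraint is violated.

    violation: constraint value, where a positive number means violated
               (hypothetical convention; the paper's uncontrollable factor
               is not defined in the abstract)
    coeff:     penalty coefficient, supplied by the caller
    """
    return reward - coeff * max(0.0, violation)


# Hypothetical Q-values for three actions and two objectives.
q = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]
balanced_action, _ = chebyshev_select(q, weights=[0.5, 0.5], utopia=[1.0, 1.0])
skewed_action, _ = chebyshev_select(q, weights=[0.9, 0.1], utopia=[1.0, 1.0])
print(balanced_action, skewed_action)
```

With equal weights the compromise action (index 2) is selected, while a preference weighted heavily toward the first objective selects the action that is best on that objective (index 0). This illustrates how a single preference vector can steer the choice among trade-off solutions.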