Training effective deep reinforcement learning agents for real-time life-cycle production optimization

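Keywords: Reinforcement learning; Markov decision process; Mathematical optimization; Computer science; Time horizon; Optimal control; Production (economics); Bellman equation; Dynamic programming; Task (project management); Artificial intelligence; Markov process; Engineering; Mathematics; Statistics; Macroeconomics; Economy; Systems engineering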

Authors

Kai Zhang, Zhongzheng Wang, Guodong Chen, Liming Zhang, Yongfei Yang, Chuanjin Yao, Jian Wang, Jun Yao

Source

Journal: Journal of Petroleum Science and Engineering [Elsevier BV]
Date: 2021-11-05 Volume/Issue: 208: 109766-109766 Citations: 143

Identifier

DOI: 10.1016/j.petrol.2021.109766

Abstract

Life-cycle production optimization aims to obtain the optimal well control scheme at each control time step to maximize financial profit and hydrocarbon production. However, searching for the optimal policy under a limited number of simulation evaluations is a challenging task. In this paper, a novel production optimization method is presented, which maximizes the net present value (NPV) over the entire life-cycle and achieves real-time well control scheme adjustment. The proposed method models the life-cycle production optimization problem as a finite-horizon Markov decision process (MDP), where the well control scheme can be viewed as a sequence of decisions. Soft actor-critic, a state-of-the-art model-free deep reinforcement learning (DRL) algorithm, is subsequently utilized to train DRL agents that can solve the above MDP. The DRL agent strives to maximize long-term NPV rewards as well as the randomness of the control scheme by training a stochastic policy that maps reservoir states to well control variables and an action-value function that estimates the objective value of the current policy. Since the trained policy is an explicit function structure, the DRL agent can adjust the well control scheme in real time under different reservoir states. Unlike most existing methods, which introduce task-specific sensitive parameters or construct complex supplementary structures, the DRL agent learns adaptively by executing goal-directed interactions with an uncertain reservoir environment and making use of accumulated well control experience, which is similar to the actual field well control mode. The key insight is the DRL method's ability to utilize gradient information (well-control experience) for higher sample efficiency. The simulation results based on two reservoir models indicate that, compared to other optimization methods, the proposed method can attain higher NPV and achieve excellent performance in terms of oil displacement.
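
The abstract's statement that the agent maximizes long-term NPV rewards together with the randomness of the control scheme matches the maximum-entropy objective that soft actor-critic is generally formulated around. A sketch of that objective for a finite horizon is given below. The notation is an assumption and does not come from the listing: T is the number of control steps, s_t the reservoir state, a_t the well controls, r the per-step NPV reward, and α a temperature weighting the entropy term.

```latex
J(\pi) = \mathbb{E}_{\tau \sim \pi}\left[ \sum_{t=0}^{T-1} \Big( r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) = -\,\mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[ \log \pi(a \mid s) \big]
```

Setting α = 0 recovers the plain expected-return objective, so α controls how strongly exploration of alternative control schemes is rewarded.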
• A novel production optimization framework that incorporates advanced deep reinforcement learning technologies is presented.
• The proposed method models the life-cycle production optimization problem as a finite-horizon Markov decision process.
• The trained policy is an explicit function structure that utilizes powerful gradient information for higher sample efficiency.
• The proposed method achieves excellent performance on one classic control task and two reservoir models.
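
To make the finite-horizon formulation concrete, the sketch below shows one common way to split life-cycle NPV into per-step rewards so that the episode return equals the NPV. It is an illustration only: the listing does not give the paper's NPV formula, so the cash-flow form (oil revenue minus water handling and injection costs, discounted at the end of each control step) and every name and parameter here (`StepProduction`, `step_rewards`, `npv`, the prices and the discount rate) are assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class StepProduction:
    """Fluid volumes for one control step (hypothetical field names)."""
    oil: float             # oil produced during the step, bbl
    water_produced: float  # water produced during the step, bbl
    water_injected: float  # water injected during the step, bbl


def step_rewards(
    steps: Sequence[StepProduction],
    step_years: float,
    discount_rate: float,
    oil_price: float,
    water_cost: float,
    injection_cost: float,
) -> List[float]:
    """Return one discounted cash flow per control step.

    Assumed form: revenue from oil minus the costs of produced and
    injected water, discounted to time zero at the end of each step.
    """
    rewards = []
    for k, s in enumerate(steps, start=1):
        cash = (
            oil_price * s.oil
            - water_cost * s.water_produced
            - injection_cost * s.water_injected
        )
        rewards.append(cash / (1.0 + discount_rate) ** (k * step_years))
    return rewards


def npv(steps: Sequence[StepProduction], **economics: float) -> float:
    """Life-cycle NPV as the sum of the per-step rewards (episode return)."""
    return sum(step_rewards(steps, **economics))


if __name__ == "__main__":
    # Two identical one-year control steps with made-up volumes and prices.
    schedule = [StepProduction(100.0, 10.0, 20.0)] * 2
    print(npv(
        schedule,
        step_years=1.0,
        discount_rate=0.1,
        oil_price=50.0,
        water_cost=5.0,
        injection_cost=2.0,
    ))
```

In an MDP setting, each value returned by `step_rewards` would play the role of the reward the agent receives after applying its well controls for one step; in practice the volumes would come from a reservoir simulator rather than a fixed list.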
