Reinforcement learning
Bellman equation
Unobservable
Portfolio
Mathematical optimization
Stochastic control
Optimal control
Mathematics
Computer science
Economics
Econometrics
Artificial intelligence
Finance
Identifier
DOI:10.1016/j.jedc.2023.104787
Abstract
We propose a reinforcement learning (RL) approach to solve the continuous-time mean-variance portfolio selection problem in a regime-switching market, where the market regime is unobservable. To encourage exploration for learning, we formulate an exploratory stochastic control problem with an entropy-regularized mean-variance objective. We obtain semi-analytical representations of the optimal value function and optimal policy, which involve unknown solutions to two linear parabolic partial differential equations (PDEs). We utilize these representations to parametrize the value function and policy for learning with the unknown solutions to the PDEs approximated based on polynomials. We develop an actor-critic RL algorithm to learn the optimal policy through interactions with the market environment. The algorithm carries out filtering to obtain the belief probability of the market regime and performs policy evaluation and policy gradient updates alternately. Empirical results demonstrate the advantages of our RL algorithm in relatively long-term investment problems over the classical control approach and an RL algorithm developed for the continuous-time mean-variance problem without considering regime switches.
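The filtering step described in the abstract, producing the belief probability of the unobservable market regime from observed returns, can be sketched with a discretized Wonham-type filter. The code below is a minimal illustration under assumed parameters (two regimes, constant per-regime drifts `mu`, common volatility `sigma`, and generator matrix `Q`), not the paper's actual implementation; the actor-critic updates that would consume the belief are omitted.

```python
import numpy as np

def filter_step(p, dY, dt, mu, sigma, Q):
    """One Euler step of a Wonham-type filter for the belief vector p
    over hidden market regimes, given an observed return increment dY.

    p     : belief probabilities over regimes, shape (K,)
    dY    : observed log-price increment over the interval dt
    mu    : per-regime drift, shape (K,)
    sigma : observation volatility (scalar, shared across regimes)
    Q     : regime generator matrix, shape (K, K), rows sum to 0
    """
    mbar = p @ mu                       # filtered drift estimate
    innov = dY - mbar * dt              # innovation (observation surprise)
    # transition part + innovation-correction part of the filter SDE
    p_new = p + (Q.T @ p) * dt + p * (mu - mbar) / sigma**2 * innov
    p_new = np.clip(p_new, 1e-12, None)
    return p_new / p_new.sum()          # renormalize to a probability vector

# Toy demo: track the belief over one simulated year of daily returns
# (all numbers below are illustrative assumptions).
rng = np.random.default_rng(0)
mu = np.array([0.10, -0.05])            # bull / bear drifts
sigma = 0.2
Q = np.array([[-0.5, 0.5],              # regime-switching intensities
              [0.5, -0.5]])
dt = 1.0 / 252
p = np.array([0.5, 0.5])                # uninformative prior belief
regime = 0                              # true (hidden) regime held fixed here
for _ in range(252):
    dY = mu[regime] * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    p = filter_step(p, dY, dt, mu, sigma, Q)
```

In the full algorithm, each filtered belief `p` would feed the parametrized value function and the entropy-regularized Gaussian policy, with policy evaluation and policy gradient updates performed alternately as the abstract describes.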