计算机科学
强化学习
概率逻辑
交易策略
动态规划
图形
金融市场
夏普比率
算法交易
状态空间
Dijkstra算法
人工智能
最短路径问题
财务
算法
理论计算机科学
统计
经济
数学
文件夹
作者
Zhongjie Duan,Cen Chen,Dawei Cheng,Yuqi Liang,Weining Qian
标识
DOI:10.1145/3511808.3557412
摘要
Algorithmic trading is a crucial yet challenging task in the financial domain, where trading decisions are made sequentially from milliseconds to days based on the historical price movements and trading frequency. To model such a sequential decision making process in the dynamic financial markets, Deep Reinforcement Learning (DRL) based methods have been applied and demonstrated their success in finding trading strategies that achieve profitable returns. However, the financial markets are complex imperfect information games with high-level of noise and uncertainties which usually make the exploration policy of DRL less effective. In this paper, we propose an end-to-end DRL method that explores solutions on the whole graph via a probabilistic dynamic programming algorithm. Specifically, we separate the state into environment state and position state, and model the position state transition as a directed acyclic graph. To obtain reliable gradients for model training, we adopt a probabilistic dynamic programming algorithm to explore solutions over the whole graph instead of sampling a path. By avoiding the sampling procedure, we propose an efficient training algorithm and overcome the efficiency problem in most existing DRL methods. Furthermore, our method is compatible with most recurrent neural network architecture, which makes our method easy to implement and very effective in practice. Extensive experiments have been conducted on two real-world stock datasets. Experimental results demonstrate that our method can generate stable trading strategies for both high-frequency and low-frequency trading, significantly outperforming the baseline DRL methods on annualized return and Sharpe ratio.
科研通智能强力驱动
Strongly Powered by AbleSci AI