Keywords
Reinforcement learning
Computer science
Trading strategy
Algorithmic trading
Futures contract
Artificial intelligence
High-frequency trading
Machine learning
Mathematical optimization
Econometrics
Mathematics
Economics
Finance
Financial economics
Authors
Weipeng Zhang, Lu Wang, Liang Xie, Ke Feng, Xiang Liu
Identifier
DOI:10.1016/j.patcog.2021.108490
Abstract
Quantitative trading uses mathematical functions to make stock or futures trading decisions automatically. Specifically, the various trading strategies proposed by human experts are associated with weight hyper-parameters that determine the probability of selecting a specific strategy under given market conditions. Prior work adjusts these weight hyper-parameters manually, which is error-prone and forfeits the essential advantage of quantitative trading: automation. In this paper, we propose a dynamic parameter tuning algorithm, TradeBot, based on bandit learning for quantitative trading. We cast the sequential selection of trading-rule hyper-parameters as a bandit game, in which each set of trading-rule hyper-parameters is treated as an action. A novel reward-agnostic Upper Confidence Bound (UCB) bandit method is proposed to solve the automated trading problem, with the reward function estimated by inverse reinforcement learning. Experimental results on China Commodity Futures Market data show state-of-the-art performance. To the best of our knowledge, this is one of the first works in the published literature to deploy reinforcement learning in an online trading system.
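The abstract frames hyper-parameter selection as a bandit game: each candidate set of trading-rule weights is an arm, and a UCB rule balances exploiting arms with high estimated reward against exploring under-sampled ones. As a rough illustration of that framing only, the sketch below uses the classic UCB1 rule with a synthetic stand-in reward; the paper's actual method is a reward-agnostic UCB variant whose rewards come from inverse reinforcement learning, and the arm definitions here (`weight` values) are invented for the example.

```python
import math
import random

def ucb1_select(counts, values, t):
    """Pick the arm maximizing empirical mean reward plus a UCB1 exploration bonus."""
    for a in range(len(counts)):
        if counts[a] == 0:
            return a  # play every arm once before applying the bound
    return max(range(len(counts)),
               key=lambda a: values[a] / counts[a]
                             + math.sqrt(2.0 * math.log(t) / counts[a]))

# Each "arm" is one hypothetical set of trading-rule hyper-parameters.
arms = [{"weight": 0.2}, {"weight": 0.5}, {"weight": 0.8}]
counts = [0] * len(arms)    # times each arm was selected
values = [0.0] * len(arms)  # cumulative reward per arm

random.seed(0)
for t in range(1, 1001):
    a = ucb1_select(counts, values, t)
    # Synthetic reward: noisy signal around the arm's weight.
    # The paper instead estimates rewards via inverse reinforcement learning.
    reward = random.gauss(arms[a]["weight"], 0.1)
    counts[a] += 1
    values[a] += reward
```

Over many rounds, play typically concentrates on the arm with the highest mean reward while the logarithmic bonus keeps occasionally re-checking the others, which is the behavior the abstract relies on for automatic strategy weighting.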