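Reinforcement learning
Computer science
Markov decision process
Asynchronous communication
Scheduling (production processes)
Maximization
Profit maximization
Satellite
Artificial neural network
Real-time computing
Markov chain
Mathematical optimization
Operations research
Markov process
Profit (economics)
Artificial intelligence
Machine learning
Computer network
Engineering
Statistics
Mathematics
Microeconomics
Economics
Aerospace engineering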
Authors
Xiaoli Bao, Shumei Zhang, Xiuyun Zhang
Identifier
DOI: 10.1109/cac51589.2020.9327581
Abstract
With the increasing number of satellites in orbit and the growing number of observation missions, efficiently constructing an allocation scheme that maximizes total profit has become increasingly important. In this paper, an effective method based on Reinforcement Learning is proposed to solve the satellite mission scheduling problem, in which arriving missions are scheduled immediately rather than waiting until all missions have been collected. First, a mathematical model based on a Markov Decision Process is established, whose goal is to find an optimal policy that maximizes the accumulated reward. Then, the Asynchronous Advantage Actor-Critic (A3C) algorithm with a neural network is used to assign missions to different satellites. Simulation experiments comparing the proposed method with a first-come-first-served (FCFS) algorithm and a genetic algorithm demonstrate that it performs well in terms of both real-time speed and solution quality.
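The abstract only outlines the approach, so the following is a minimal Python sketch under our own assumptions, not the authors' model. Each mission is reduced to a hypothetical `profit` and `duration`, each satellite to a single remaining-capacity number, and rejecting a mission is modelled as one extra action. The sketch shows the online MDP (each mission is assigned as soon as it arrives), one plausible reading of the FCFS baseline, and the n-step returns and advantages that A3C-style actor-critic updates are built on. The neural-network actor and critic and the asynchronous workers are deliberately omitted; all class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mission:
    profit: float    # reward if the mission is scheduled (assumed field)
    duration: float  # capacity the mission consumes on a satellite (assumed field)


class OnlineSchedulingMDP:
    """Hypothetical toy MDP: missions arrive one at a time and are assigned at once.

    State:  (remaining capacity of each satellite, arriving mission or None).
    Action: a satellite index, or len(capacities) to reject the mission.
    Reward: the mission's profit if the chosen satellite can fit it, else 0
            (an infeasible assignment is treated like a rejection).
    """

    def __init__(self, capacities, missions):
        self.initial_capacity = list(capacities)
        self.missions = list(missions)

    def reset(self):
        self.capacity = list(self.initial_capacity)
        self.t = 0
        return self._state()

    def _state(self):
        mission = self.missions[self.t] if self.t < len(self.missions) else None
        return tuple(self.capacity), mission

    def step(self, action):
        mission = self.missions[self.t]
        reward = 0.0
        fits = action < len(self.capacity) and self.capacity[action] >= mission.duration
        if fits:
            self.capacity[action] -= mission.duration
            reward = mission.profit
        self.t += 1
        done = self.t == len(self.missions)
        return self._state(), reward, done


def fcfs_policy(state):
    """Baseline (assumed reading of FCFS): first satellite that can fit the mission."""
    capacity, mission = state
    for i, remaining in enumerate(capacity):
        if remaining >= mission.duration:
            return i
    return len(capacity)  # no satellite fits: reject


def run_episode(env, policy):
    """Schedule every mission online with `policy` and return the total profit."""
    state, total, done = env.reset(), 0.0, not env.missions
    while not done:
        state, reward, done = env.step(policy(state))
        total += reward
    return total


def n_step_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """n-step returns and advantages for one rollout segment (A3C-style targets).

    G_t = r_t + gamma * G_{t+1}, seeded with the critic's estimate V(s_{t+n})
    (pass 0.0 if the episode ended); A_t = G_t - V(s_t).
    """
    returns = [0.0] * len(rewards)
    g = bootstrap_value
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    advantages = [g_t - v_t for g_t, v_t in zip(returns, values)]
    return returns, advantages


env = OnlineSchedulingMDP([3, 2], [Mission(5, 2), Mission(4, 2), Mission(3, 1), Mission(6, 3)])
print(run_episode(env, fcfs_policy))  # 12.0
```

In an A3C-style learner, `fcfs_policy` would be replaced by sampling from the actor's softmax over the `len(capacities) + 1` actions; the advantages then weight the policy-gradient term, while the returns serve as the critic's regression target.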