Dynamic Beam Hopping Method Based on Multi-Objective Deep Reinforcement Learning for Next Generation Satellite Broadband Systems

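Keywords: Computer science; Reinforcement learning; Flexibility (engineering); Channel (broadcasting); Transmission (telecommunications); Communications satellite; Throughput; Mathematical optimization; Satellite; Computer network; Wireless; Telecommunications; Engineering; Artificial intelligence; Aerospace engineering; Statistics; Mathematics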

Authors

Xin Hu, Yuchen Zhang, Xianglai Liao, Zhijun Liu, Weidong Wang, Fadhel M. Ghannouchi

Source

Journal: IEEE Transactions on Broadcasting [Institute of Electrical and Electronics Engineers]
Date: 2020-01-14 Volume (issue): 66 (3): 630-646 Citations: 114

Identifiers

DOI: 10.1109/tbc.2019.2960940

Abstract

Given the inherent uncertainty of differentiated service requirements and the non-uniform spatial distribution of capacity requests, it is essential to flexibly adjust satellite resources to satisfy varying conditions. Matching the system capacity demand through efficient utilization of beams is a new challenge. Conventional beam hopping methods ignore the intrinsic correlation between decisions, do not consider the long-term reward, and achieve the optimal solution only at the current time. Therefore, system complexity increases significantly as the demand for differentiated services or the number of beams grows. This paper investigates the optimal policy for beam hopping in a DVB-S2X satellite with multiple objectives: ensuring fairness among beam services, minimizing the transmission delay of real-time services, and maximizing the throughput of non-instant services. Because wireless channel conditions and differentiated service arrival rates are stochastic, and the dynamics of the multi-beam satellite environment are unknown, a model-free multi-objective deep reinforcement learning approach is used to learn the optimal policy through interactions with the environment. To address the curse of dimensionality in the action space, a novel multi-action selection method based on Double-Loop Learning (DLL) is proposed. Moreover, the multi-dimensional state is reformulated and obtained by a deep neural network. Evaluation results obtained under realistic conditions demonstrate that the proposed method can pursue multiple objectives simultaneously and can allocate resources intelligently, adapting to user requirements and channel conditions.
