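Observability
Partially observable Markov decision process
Computer science
Scheduling (production processes)
Trajectory
Markov process
Markov decision process
Markov chain
Joint (building)
Markov model
Mathematical optimization
Real-time computing
Machine learning
Mathematics
Architectural engineering
Statistics
Physics
Astronomy
Applied mathematics
Engineering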
Authors
Danhao Deng, Chaowei Wang, Weidong Wang
Identifier
DOI:10.1109/lcomm.2022.3167110
Abstract
Unmanned aerial vehicle (UAV)-aided vehicular communication can be greatly facilitated by jointly optimizing UAV trajectory design and vehicle assignment. While most existing works assume full observation of the system state, we consider partial observability and predict the vehicles' trajectories. The vehicle trajectory prediction and joint optimization problems are modeled as a partially observable Markov decision process (POMDP). To deal with the non-Markovian nature of the POMDP, we construct a new deep recurrent Q-network (DRQN) framework based on the deep Q-network (DQN) algorithm and a long short-term memory (LSTM) layer. Simulation results demonstrate that the proposed DRQN-based scheme converges quickly and outperforms the baseline schemes in terms of sum spectral efficiency.
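The idea of combining a Q-network with an LSTM layer can be illustrated with a small sketch. The code below is not the authors' architecture; it is a toy, untrained recurrent Q-network in pure Python, and every name in it (`TinyDRQN`, `step`, `epsilon_greedy`, `run_episode`, the dimensions, the random weight initialization) is a hypothetical choice made for illustration. It only shows the forward pass: an LSTM cell carries a hidden state across time steps, so the Q-values depend on the observation history and not just on the current partial observation. Training components such as the replay buffer, target network, and loss are omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyDRQN:
    """Hypothetical toy recurrent Q-network: one LSTM cell plus a linear Q head."""

    GATES = ("input", "forget", "output", "cand")

    def __init__(self, obs_dim, hidden_dim, n_actions, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        in_dim = obs_dim + hidden_dim  # each gate sees [observation, previous h]
        self.W = {g: rand_matrix(hidden_dim, in_dim, rng) for g in self.GATES}
        self.b = {g: [0.0] * hidden_dim for g in self.GATES}
        self.W_q = rand_matrix(n_actions, hidden_dim, rng)  # hidden state -> Q-values

    def initial_state(self):
        # (h, c): hidden state and cell state, both zero at episode start.
        return [0.0] * self.hidden_dim, [0.0] * self.hidden_dim

    def step(self, obs, state):
        """Consume one observation; return (Q-values, new recurrent state)."""
        h, c = state
        x = list(obs) + h
        pre = {
            g: [a + b for a, b in zip(matvec(self.W[g], x), self.b[g])]
            for g in self.GATES
        }
        i = [sigmoid(v) for v in pre["input"]]
        f = [sigmoid(v) for v in pre["forget"]]
        o = [sigmoid(v) for v in pre["output"]]
        cand = [math.tanh(v) for v in pre["cand"]]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return matvec(self.W_q, h_new), (h_new, c_new)


def epsilon_greedy(q_values, epsilon, rng):
    # Explore with probability epsilon, otherwise pick the highest Q-value.
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def run_episode(net, observations, epsilon=0.0, seed=0):
    """Feed a sequence of partial observations and pick one action per step."""
    rng = random.Random(seed)
    state = net.initial_state()
    actions = []
    for obs in observations:
        q, state = net.step(obs, state)
        actions.append(epsilon_greedy(q, epsilon, rng))
    return actions, state
```

Feeding the same observation at two consecutive steps produces different Q-values, because the recurrent state has changed in between. A feedforward DQN would return identical values in that case, which is the limitation the recurrent layer is meant to address under partial observability.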