Computer science
Inference
Pipeline (software)
Latency (audio)
Artificial intelligence
Deep learning
Machine learning
Parallel computing
Programming language
Telecommunications
Authors
Hongjian Shi, Weichu Zheng, Zifei Liu, Ruhui Ma, Haibing Guan
Identifier
DOI: 10.1109/jsac.2023.3280970
Abstract
With the rapid development of wireless communication, achieving the neXt generation Ultra-Reliable and Low-Latency Communications (xURLLC) in 6G mobile communication systems has become a critical problem. Among the many applications in xURLLC, deep learning model inference requires improvements in efficiency. Due to the heterogeneous hardware environment in 6G, parallel schedules from distributed machine learning and edge computing have been borrowed to tackle the efficiency problem. However, traditional parallel schedules suffer from high latency, low throughput, and low device utility. In this paper, we propose Automatic Pipeline Parallelism (AP2), a parallel inference framework for deep learning applications in 6G mobile communication systems, to improve model inference efficiency while maintaining reliability. AP2 contains three sub-modules. A task-device affinity predictor predicts a task's expected execution time on a given device. The parallel inference arrangement optimizer finds the most suitable device for each task. The parallel inference scheduler converts the arrangement into a schedule that can be directly executed in the system. The experimental results show that AP2 achieves better latency, throughput, reliability, and device utility than other parallel schedules. The superiority of the sub-module designs has also been verified through the experiments.
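The abstract only names AP2's three sub-modules (a task-device affinity predictor, an arrangement optimizer, and a scheduler) without detailing them. The following is a minimal sketch of how such a three-stage pipeline could be wired together; the class names, the simple cost model, and the greedy assignment heuristic are illustrative assumptions and are not taken from the paper.

```python
# Hypothetical sketch of AP2's three sub-modules as outlined in the abstract.
# All names and the greedy heuristic are assumptions for illustration only.
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Task:
    name: str
    flops: float  # rough compute cost of the task


@dataclass
class Device:
    name: str
    speed: float  # relative throughput of the device


class AffinityPredictor:
    """Predicts a task's expected execution time on a given device."""

    def predict(self, task: Task, device: Device) -> float:
        # Placeholder cost model; the paper presumably learns this mapping.
        return task.flops / device.speed


class ArrangementOptimizer:
    """Finds a suitable device for each task (greedy placeholder)."""

    def __init__(self, predictor: AffinityPredictor):
        self.predictor = predictor

    def optimize(self, tasks: List[Task], devices: List[Device]) -> Dict[str, str]:
        load = {d.name: 0.0 for d in devices}
        arrangement: Dict[str, str] = {}
        for task in tasks:
            # Pick the device that would finish this task earliest given its current load.
            best = min(devices, key=lambda d: load[d.name] + self.predictor.predict(task, d))
            arrangement[task.name] = best.name
            load[best.name] += self.predictor.predict(task, best)
        return arrangement


class InferenceScheduler:
    """Converts an arrangement into an ordered, executable schedule."""

    def schedule(self, tasks: List[Task], arrangement: Dict[str, str]) -> List[Tuple[str, str]]:
        # Here we simply preserve task order; a real scheduler would handle pipelining.
        return [(t.name, arrangement[t.name]) for t in tasks]


if __name__ == "__main__":
    tasks = [Task("layer0", 4.0), Task("layer1", 2.0), Task("layer2", 6.0)]
    devices = [Device("edge_gpu", 2.0), Device("phone_npu", 1.0)]
    plan = ArrangementOptimizer(AffinityPredictor()).optimize(tasks, devices)
    print(InferenceScheduler().schedule(tasks, plan))
```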