Computer science
Inference
Enhanced Data Rates for GSM Evolution (EDGE)
Edge computing
Throughput
Edge device
Partition (number theory)
Task (project management)
Inference engine
Pipeline (software)
Distributed computing
Parallel computing
Artificial intelligence
Cloud computing
Wireless
Telecommunications
Mathematics
Management
Combinatorics
Economics
Programming language
Operating system
Authors
Biao Han, Penglin Dai, Ke Li, Kangli Zhao, Xiaowei Lei
Identifier
DOI:10.1109/wccct60665.2024.10541524
Abstract
The deployment of neural network models to edge devices for task inference is becoming increasingly common. However, due to limitations in network bandwidth and computing capability, a single edge device is often insufficient to meet the inference-delay requirements of computation-intensive tasks and of continuous inference tasks with high load demands. To address this issue, we propose an efficient collaborative distributed inference acceleration framework that offloads tasks to multiple edge devices for parallel execution. The framework accounts for the inference speed of heterogeneous edge devices and combines the principles of pipeline parallelism with the partial-dependence characteristics of CNN intra-layer partitions. It performs intelligent inter-layer and intra-layer partitioning to maximize the throughput of continuous task inference while meeting low single-task inference-delay requirements. To achieve this, we propose a self-adaptive dynamic programming model partition algorithm (SDPMP). Simulation results indicate that our proposed strategy significantly improves throughput compared with other classical algorithms.
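The inter-layer partitioning step that the abstract describes can be framed as a dynamic program over contiguous layer blocks, where the slowest pipeline stage bounds continuous-inference throughput. The sketch below is a minimal illustration under assumed inputs, not the paper's SDPMP: it handles only inter-layer cuts (intra-layer partitioning and the self-adaptive component are omitted), models each heterogeneous device by a single speed value, and uses a hypothetical fixed transfer_cost. The function name, layer costs, and device speeds are all invented for the example.

def partition_layers(layer_flops, device_speeds, transfer_cost=0.0):
    """Split layers (in order) into one contiguous block per device so that
    the slowest stage, which bounds pipeline throughput, is as fast as possible.

    layer_flops[i]   : compute cost of layer i (arbitrary units)
    device_speeds[d] : throughput of device d (same units per second)
    transfer_cost    : assumed fixed activation hand-off time per stage
    Returns (bottleneck_stage_time, sorted cut points between stages).
    """
    L, D = len(layer_flops), len(device_speeds)
    prefix = [0.0]
    for f in layer_flops:                  # prefix sums give O(1) block cost
        prefix.append(prefix[-1] + f)

    def stage_time(lo, hi, d):             # time for layers [lo, hi) on device d
        return (prefix[hi] - prefix[lo]) / device_speeds[d] + transfer_cost

    INF = float("inf")
    # dp[d][i]: best bottleneck when devices 0..d each run a non-empty
    # contiguous block and together cover the first i layers
    dp = [[INF] * (L + 1) for _ in range(D)]
    parent = [[-1] * (L + 1) for _ in range(D)]
    for i in range(1, L + 1):
        dp[0][i] = stage_time(0, i, 0)
    for d in range(1, D):
        for i in range(d + 1, L + 1):
            for j in range(d, i):          # device d takes layers [j, i)
                cand = max(dp[d - 1][j], stage_time(j, i, d))
                if cand < dp[d][i]:
                    dp[d][i], parent[d][i] = cand, j

    # use however many devices minimize the bottleneck, then recover the cuts
    best_d = min(range(D), key=lambda d: dp[d][L])
    cuts, i = [], L
    for d in range(best_d, 0, -1):
        i = parent[d][i]
        cuts.append(i)
    return dp[best_d][L], sorted(cuts)

# Hypothetical usage: six layers across three heterogeneous edge devices.
flops = [8, 4, 6, 2, 10, 3]
speeds = [2.0, 1.0, 1.5]
bottleneck, cuts = partition_layers(flops, speeds, transfer_cost=0.1)
# In a saturated pipeline, steady-state throughput is roughly 1 / bottleneck
# tasks per second, which is why minimizing the slowest stage maximizes the
# continuous-inference throughput the abstract targets.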