Authors
Xiaohang Shi,Sheng Zhang,Jie Wu,Ning Chen,Ke Cheng,Yu Liang,Sanglu Lu
Identifier
DOI:10.1109/tmc.2023.3343448
Abstract
Deep convolutional neural network (NN)-based object detectors are ill-suited to straightforward inference on high-resolution videos at edge devices, as maintaining high accuracy often brings prohibitively long latency. Although existing solutions attempt to reduce on-device inference latency by selecting a cheaper configuration (e.g., choosing a more lightweight NN or scaling a frame to a smaller size before inference) or by eliminating background regions containing no objects, they often ignore the distinctive features of high-resolution video and fail to optimize for it. We thus present AdaPyramid, a framework that reduces on-device inference latency as much as possible, especially for high-resolution videos, while approximately meeting the accuracy demand. We observe that the cheapest configuration satisfying the accuracy demand varies significantly both across frames and across regions within a frame. The underlying reason is that object features (e.g., the location, size, and category of objects) are more uneven in high-resolution videos, both temporally and spatially. Moreover, we observe that object size exhibits a prominent hierarchical distribution in high-resolution frames. AdaPyramid therefore partitions each frame hierarchically, like a pyramid, and chooses a content-aware configuration for each region, adapted online based on feedback. We evaluate AdaPyramid on a public dataset and on real-world videos we collected. The results show that, at accuracy comparable to state-of-the-art solutions, AdaPyramid decreases inference latency by 40% on average, with up to 2.5× speed-up.
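The core idea in the abstract, hierarchically splitting a frame and assigning each region the cheapest configuration that still meets the accuracy demand, can be sketched as follows. This is an illustrative reconstruction, not the paper's algorithm: the configuration table, the `expected_obj_size` feedback signal, the detectability threshold, and all names (`Region`, `CONFIGS`, `choose_config`, `partition`) are assumptions made for the sketch.

```python
from dataclasses import dataclass

# Candidate inference configurations, cheapest first:
# downscale factor, detector variant, and relative compute cost.
# Values are illustrative, not from the paper.
CONFIGS = [
    {"scale": 0.25, "model": "tiny",  "cost": 1.0},
    {"scale": 0.5,  "model": "small", "cost": 3.0},
    {"scale": 1.0,  "model": "large", "cost": 9.0},
]

@dataclass
class Region:
    x: int
    y: int
    w: int
    h: int
    expected_obj_size: float  # feedback, e.g. from detections on past frames

def choose_config(region: Region, accuracy_demand: float) -> dict:
    """Pick the cheapest config under which objects stay detectable,
    a stand-in for the paper's accuracy-feedback-driven choice."""
    min_detectable = 16  # pixels after downscaling; illustrative threshold
    for cfg in CONFIGS:
        if region.expected_obj_size * cfg["scale"] >= min_detectable * accuracy_demand:
            return cfg
    return CONFIGS[-1]  # fall back to the most accurate config

def partition(region: Region, depth: int = 0, max_depth: int = 2) -> list:
    """Recursively split a region into quadrants, pyramid-style, until
    objects are large relative to the region or the depth cap is hit."""
    if depth == max_depth or region.expected_obj_size > region.w / 4:
        return [region]
    hw, hh = region.w // 2, region.h // 2
    children = [
        Region(region.x,      region.y,      hw, hh, region.expected_obj_size),
        Region(region.x + hw, region.y,      hw, hh, region.expected_obj_size),
        Region(region.x,      region.y + hh, hw, hh, region.expected_obj_size),
        Region(region.x + hw, region.y + hh, hw, hh, region.expected_obj_size),
    ]
    return [r for c in children for r in partition(c, depth + 1, max_depth)]

# A 4K frame whose objects are expected to be ~40 px: each region gets a
# cheap configuration, so total cost stays far below full-frame inference.
frame = Region(0, 0, 3840, 2160, expected_obj_size=40)
regions = partition(frame)
plan = [(r, choose_config(r, accuracy_demand=1.0)) for r in regions]
total_cost = sum(cfg["cost"] * (r.w * r.h) / (3840 * 2160) for r, cfg in plan)
```

In this toy run the 4K frame splits into 16 regions, each served by the mid-tier configuration, for a total cost of 3.0 versus 9.0 for running the large model on the whole frame, which mirrors the latency savings the abstract reports.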