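Quantization (signal processing)
Computer science
Edge device
Algorithm
Software deployment
Computer engineering
Real-time computing
Artificial intelligence
Cloud computing
Operating system
Authors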
Xinyu Liu,Tao Wang,Jiaming Yang,Chenwei Tang,Jiancheng Lv
Source
Journal: Neurocomputing
[Elsevier BV]
Date: 2024-03-01
Volume/Issue: 574: 127210-127210
Citations: 2
Identifier
DOI:10.1016/j.neucom.2023.127210
Abstract
You Only Look Once (YOLO), known for its real-time performance and outstanding accuracy, has emerged as a prominent framework for object detection tasks. However, deploying YOLO on resource-constrained edge devices poses challenges due to its substantial memory requirements. In this paper, we propose MPQ-YOLO, an ultra-low mixed-precision quantization framework designed for edge device deployment. The core idea is to integrate 1-bit Backbone quantization and 4-bit Head quantization with dedicated training techniques. Specifically, we analyze the effect of numerical distribution on the performance of binary neural networks (BNNs), and based on this, we design a backbone with only 1-bit convolution. Then, we introduce a trainable scale and Progressive Network Quantization (PNQ) training strategy to bridge the Backbone and Head for end-to-end quantization training. The former is applied to both weights and activations within the 4-bit Head, enabling effective gradient propagation. The latter mitigates oscillation caused by mixed precision training, promoting smoother training and faster model convergence. Extensive experiments on VOC and COCO datasets demonstrate that MPQ-YOLO achieves a good trade-off between model compression and detection performance. Specifically, compared to the full-precision model, MPQ-YOLO achieves compression of up to 16.3× and 14.2× in terms of computational complexity and model size, respectively, while maintaining relatively high detection accuracy, i.e., 74.7% on VOC and 51.5% on COCO. To the best of our knowledge, MPQ-YOLO is the first YOLO framework with dual low mixed-precision quantization. Moreover, compared to the existing layer-wise mixed-precision quantization methods which cause redundant data processing and massive data movement, MPQ-YOLO offers a more hardware-designer-friendly and straightforward solution through efficient resource utilization and reuse.
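The two bit widths named in the abstract can be illustrated with a minimal sketch. It is not the MPQ-YOLO implementation: the function names, the mean-absolute-value scale for the 1-bit case, and the fixed (rather than trained) scale for the 4-bit case are all assumptions chosen for illustration, using generic sign binarization and symmetric uniform quantization.

```python
# Illustrative sketch only; names and scale choices are assumptions,
# not taken from the MPQ-YOLO paper.

def binarize(weights):
    """1-bit quantization: keep the sign of each weight plus one shared scale."""
    # Assumed scale: mean absolute value of the weights.
    scale = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return signs, scale


def quantize_4bit(values, scale):
    """Symmetric 4-bit quantization: map each value to an integer in [-8, 7]."""
    qmin, qmax = -8, 7
    # In a trainable-scale scheme, `scale` would be a learned parameter;
    # here it is simply passed in as a fixed number.
    return [max(qmin, min(qmax, round(v / scale))) for v in values]


def dequantize(codes, scale):
    """Recover approximate real values from integer codes."""
    return [c * scale for c in codes]


signs, alpha = binarize([0.5, -1.5, 1.0, -1.0])
codes = quantize_4bit([0.26, -0.5, 3.0], scale=0.25)
approx = dequantize(codes, 0.25)
```

In this sketch the value 3.0 falls outside the representable 4-bit range at scale 0.25 and is clipped to the largest code. That clipping is one reason the choice of scale matters for accuracy.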