MPQ-YOLO: Ultra low mixed-precision quantization of YOLO for edge devices deployment

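Keywords: Quantization (signal processing); Computer science; Edge device; Algorithm; Software deployment; Computer engineering; Real-time computing; Artificial intelligence; Cloud computing; Operating system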

Authors

Xinyu Liu, Tao Wang, Jiaming Yang, Chenwei Tang, Jiancheng Lv

Source

Journal: Neurocomputing [Elsevier BV]
Date: 2023-12-30; Volume/issue: 574: 127210-127210; Citations: 8

Identifiers

DOI: 10.1016/j.neucom.2023.127210

Abstract

You Only Look Once (YOLO), known for its real-time performance and outstanding accuracy, has emerged as a prominent framework for object detection tasks. However, deploying YOLO on resource-constrained edge devices poses challenges due to its substantial memory requirements. In this paper, we propose MPQ-YOLO, an ultra-low mixed-precision quantization framework designed for edge device deployment. The core idea is to integrate 1-bit Backbone quantization and 4-bit Head quantization with dedicated training techniques. Specifically, we analyze the effect of numerical distribution on the performance of binary neural networks (BNNs), and based on this, we design a backbone with only 1-bit convolution. Then, we introduce a trainable scale and Progressive Network Quantization (PNQ) training strategy to bridge the Backbone and Head for end-to-end quantization training. The former is applied to both weights and activations within the 4-bit Head, enabling effective gradient propagation. The latter mitigates oscillation caused by mixed precision training, promoting smoother training and faster model convergence. Extensive experiments on VOC and COCO datasets demonstrate that MPQ-YOLO achieves a good trade-off between model compression and detection performance. Specifically, compared to the full-precision model, MPQ-YOLO achieves compression of up to 16.3× and 14.2× in terms of computational complexity and model size, respectively, while maintaining relatively high detection accuracy, i.e., 74.7% on VOC and 51.5% on COCO. To the best of our knowledge, MPQ-YOLO is the first YOLO framework with dual low mixed-precision quantization. Moreover, compared to the existing layer-wise mixed-precision quantization methods which cause redundant data processing and massive data movement, MPQ-YOLO offers a more hardware-designer-friendly and straightforward solution through efficient resource utilization and reuse.
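To make the two precisions named in the abstract concrete, the following is a minimal sketch of what 1-bit and 4-bit quantization of a list of values can look like. It is not the authors' implementation: the function names `binarize` and `quantize_4bit`, the mean-magnitude scale used for the 1-bit case, and the symmetric integer range [-8, 7] are all assumptions made for illustration. The abstract describes the 4-bit scale as trainable; here it is passed in as a fixed number, and no gradient handling is shown.

```python
def binarize(weights):
    """1-bit quantization: keep only the sign of each weight.

    Assumption: the signs are rescaled by the mean absolute value of the
    tensor, a common choice in binary networks; the paper may differ.
    """
    alpha = sum(abs(w) for w in weights) / len(weights)
    return [alpha if w >= 0 else -alpha for w in weights]


def quantize_4bit(values, scale):
    """Symmetric 4-bit quantization with a given scale.

    Each value is divided by the scale, rounded to an integer, clamped to
    the 16 levels in [-8, 7], and multiplied back by the scale.
    Assumption: `scale` is fixed here; in training it would be learned.
    """
    qmin, qmax = -8, 7
    quantized = []
    for v in values:
        q = round(v / scale)
        q = max(qmin, min(qmax, q))  # clamp to the 4-bit integer range
        quantized.append(q * scale)
    return quantized


# Hypothetical example values, chosen only to show the behaviour.
weights = [0.5, -0.25, 0.75, -1.0]
binary = binarize(weights)

activations = [0.3, -0.9, 2.0]
head = quantize_4bit(activations, scale=0.25)

print(binary)
print(head)
```

In this sketch every binarized weight has the same magnitude, so only its sign needs to be stored, while the 4-bit path keeps 16 distinct levels and saturates values that fall outside the representable range.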
