A lightweight real-time salient object detection algorithm-hardware co-optimization method for STM32 platforms
Authors
Qing Zhu,Jiahui Lv,Qinqin Xia
Identifier
DOI:10.1117/12.3087101
Abstract
We propose CSNet-ED (Embedded Deployable CSNet), a lightweight salient object detection architecture tailored for real-time processing on resource-constrained embedded systems. Built upon the original CSNet framework, our method co-designs the algorithm and the hardware deployment to balance model complexity and performance. First, we optimize the backbone by adjusting its hyperparameters, reducing the allocation of high-resolution channels while preserving sufficient feature representation. Then, we replace standard convolutions with depthwise separable convolutions (DSConv), in which depthwise convolutions independently extract spatial features from each input channel and pointwise (1×1) convolutions efficiently fuse inter-channel information. Furthermore, through module-level redundancy analysis, we prune non-critical layers and retain only the core feature extraction blocks, configuring the four stages with module counts of (3, 4, 4, 4). This structural optimization reduces FLOPs by 65.7% and parameters by 74.9%, while maintaining detection accuracy (F-measure of 0.862 and MAE of 0.102). At the deployment stage, we apply 8-bit integer post-training static quantization (PTSQ) with dynamic range calibration, which compresses the model with minimal accuracy loss. Although experiments are conducted on the STM32H743 platform, the design adheres to platform-agnostic principles: (1) the model is exported in the standard ONNX intermediate representation, supporting various embedded processors (e.g., ARM, RISC-V); (2) the use of hardware-friendly operations such as DSConv and INT8 quantization aligns with mainstream edge-AI deployment pipelines. Real-world testing on the STM32H743 demonstrates real-time performance at 45 FPS, validating the framework’s suitability for practical deployment in edge computing scenarios.
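The two hardware-friendly operations named in the abstract, DSConv and INT8 quantization, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the layer sizes in the usage note, and the min/max-based affine quantization scheme are assumptions chosen for illustration, and the abstract does not specify how the calibration ranges are actually computed.

```python
def standard_conv_cost(c_in, c_out, k, h_out, w_out):
    """Parameters and multiply-accumulates of a k x k convolution (bias ignored)."""
    params = k * k * c_in * c_out
    macs = params * h_out * w_out
    return params, macs


def dsconv_cost(c_in, c_out, k, h_out, w_out):
    """Cost of a depthwise k x k convolution followed by a pointwise 1 x 1 convolution."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 filters that mix channels
    params = depthwise + pointwise
    macs = params * h_out * w_out
    return params, macs


def quantize_int8(values):
    """Affine quantization to int8 using the observed min/max range.

    Assumption: a generic asymmetric scheme, not necessarily the paper's
    calibration procedure. The range always includes 0.0 so that zero is
    exactly representable.
    """
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0          # fall back to 1.0 for an all-zero input
    zero_point = round(-128 - lo / scale)     # int8 code that represents 0.0
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize_int8(q, scale, zero_point):
    """Map int8 codes back to approximate float values."""
    return [(x - zero_point) * scale for x in q]
```

For a hypothetical layer with 64 input channels, 64 output channels, and a 3×3 kernel, `standard_conv_cost` gives 36,864 parameters while `dsconv_cost` gives 4,672, about 12.7% of the original. The ratio follows the general expression 1/c_out + 1/k², which is why the substitution pays off most in wide layers. The quantization helpers show the round trip from floats to int8 and back; the reconstruction error stays within about one quantization step (`scale`).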