Authors
Lin Jiang, Yixuan Shen, Mei Da, Jue Hu, Zhijian Zhang
Abstract
Infrared imaging captures the thermal radiation emitted by targets to form images, which filters out redundant information in complex road scenes and facilitates pedestrian and vehicle monitoring. However, existing infrared target detection models are insufficiently accurate and are prone to false and missed detections in complex scenarios such as nighttime and adverse weather, which threatens traffic safety and intelligent driving. Moreover, these models typically have a large number of parameters and rely on high-performance GPUs, which increases hardware costs and restricts deployment, and their slow detection speed makes it difficult to meet real-time requirements. To address these issues, this paper proposes GML-YOLO, a lightweight infrared small target detection algorithm. First, we design a lightweight backbone network, the ghost-hierarchical geometry network, to extract features accurately and in real time. Second, we incorporate adaptive downsampling and attention mechanisms into the fusion part of the network, replacing the simple concatenation used in traditional detectors, so that shallow and deep information is integrated effectively. Third, we design the cross stage partial-mixed local channel attention module, which reworks the original C2f module by integrating a hybrid attention mechanism and improves the model's detection performance. Fourth, the WIoUv3 loss function is employed to accelerate convergence and reduce the loss, thereby improving detection accuracy. Finally, we conduct comparative experiments on our infrared scene target detection (ISTD) dataset as well as the publicly available FLIR and PASCAL VOC datasets.
The results show that GML-YOLO achieves a mean average precision of 89.7% on our ISTD dataset, 86.5% on the FLIR dataset, and 79.7% on the PASCAL VOC dataset. Moreover, the computational cost and the number of parameters are reduced by 20% and 27%, respectively. GML-YOLO outperforms YOLOv3, YOLOv5, YOLOv6, YOLOv8s, and YOLOv8n, which validates the feasibility of the proposed algorithm.
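
The abstract names the Wise-IoU v3 (WIoUv3) loss but does not spell it out. As a rough illustration only, the sketch below computes a WIoU v3 style loss for a single pair of axis-aligned boxes in plain Python, following the commonly published form of the loss: an IoU loss, scaled by a center-distance penalty, then by a non-monotonic focusing factor. The function names, the `(x1, y1, x2, y2)` box layout, and the hyperparameters `alpha=1.9` and `delta=3.0` are assumptions for illustration; the paper's own settings and implementation are not given here.

```python
import math

EPS = 1e-9  # guards against division by zero for degenerate boxes


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def wiou_v3(pred, target, mean_iou_loss, alpha=1.9, delta=3.0):
    """Sketch of a WIoU v3 style loss for one predicted/target box pair.

    mean_iou_loss stands in for the running mean of the IoU loss that a
    training loop would maintain; alpha and delta are assumed values.
    """
    l_iou = 1.0 - iou(pred, target)

    # Center distance, normalized by the smallest enclosing box.
    # In a real training loop the denominator is treated as a constant
    # (no gradient flows through it).
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tcx, tcy = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    wg = max(pred[2], target[2]) - min(pred[0], target[0])
    hg = max(pred[3], target[3]) - min(pred[1], target[1])
    r_wiou = math.exp(((pcx - tcx) ** 2 + (pcy - tcy) ** 2) / (wg ** 2 + hg ** 2 + EPS))

    # Outlier degree: how bad this box is relative to the running average.
    beta = l_iou / max(mean_iou_loss, EPS)
    # Non-monotonic focusing factor: small for very good and very bad boxes.
    focus = beta / (delta * alpha ** (beta - delta))

    return focus * r_wiou * l_iou


if __name__ == "__main__":
    print(wiou_v3((0, 0, 2, 2), (1, 1, 3, 3), mean_iou_loss=0.5))
```

In this form the focusing factor shrinks the contribution of both very well-fitted boxes and extreme outliers, which is the usual motivation given for the faster convergence the abstract reports.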