Improved YOLOv7 models based on modulated deformable convolution and swin transformer for object detection in fisheye images

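Keywords: Artificial intelligence; Computer vision; Computer science; Transformer; Object (grammar); Convolution (computer science); Object detection; Pattern recognition (psychology); Physics; Voltage; Artificial neural network; Quantum mechanics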

Authors

Jie Zhou, Degang Yang, Tingting Song, Yichen Ye, Xin Zhang, Yingze Song

Source

Journal: Image and Vision Computing [Elsevier BV]
Date: 2024-04-01  Volume/Issue: 144: 104966-104966  Cited by: 2

Identifier

DOI: 10.1016/j.imavis.2024.104966

Abstract

Thanks to their wide field of view, fisheye cameras capture much more visual information and are therefore widely used in computer vision. However, fisheye images usually have to be projected before they can be used for object detection. The projection distorts the image, and the discontinuous image edges leave objects incomplete. In addition, objects close to the fisheye lens appear large, while objects farther away appear small. These problems remain challenging for the existing advanced object detector YOLOv7. In this paper, we therefore propose an improved YOLOv7 model. First, Modulated Deformable Convolution is introduced into YOLOv7 to adapt automatically to the varying distortion of objects in fisheye images. It not only adjusts the sampling positions of the convolutional kernel but also further extends the deformation range, so the improved model can efficiently extract features of distorted and edge-discontinuous objects. Second, to further improve the detection of small objects in fisheye images, Swin Transformer is introduced into YOLOv7; the Swin Transformer block with Window Multi-head Self-Attention (W-MSA) effectively enhances the network's local perception. On the ERP-360 dataset, the proposed model improves mAP by up to 2.4% over the original YOLOv7 and achieves the best results among state-of-the-art object detection methods for equirectangular projection images. On the VOC-360 dataset, it improves mAP by up to 5.9% over the original YOLOv7. The experimental results show that the proposed models achieve good results for object detection in both fisheye images and equirectangular projection images.
The ERP-360 dataset, source code and pre-trained models for related tasks can be found at https://github.com/xiaoxiaomichong/ERP-360dataset.
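
To make the abstract's description of Modulated Deformable Convolution more concrete, the following is a minimal, illustrative sketch and not the authors' implementation. It assumes a single-channel image, a 3x3 kernel, and offsets and modulation values that are simply passed in (in a real layer they are predicted by an extra convolution). The function names `bilinear` and `modulated_deform_conv_at` are hypothetical. Each kernel tap samples the input at its regular grid position plus an offset, and the sampled value is scaled by a modulation scalar before the weighted sum.

```python
import math

def bilinear(img, y, x):
    """Sample img (a list of rows) at fractional (y, x); outside the image counts as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * img[yy][xx]
    return value

def modulated_deform_conv_at(img, weights, offsets, masks, cy, cx):
    """Output of a 3x3 modulated deformable convolution at one location (cy, cx).

    weights: 9 kernel weights, row-major over the 3x3 grid
    offsets: 9 (dy, dx) pairs shifting each sampling position (assumed given here)
    masks:   9 modulation scalars, typically in [0, 1] (assumed given here)
    """
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    out = 0.0
    for k, (gy, gx) in enumerate(grid):
        oy, ox = offsets[k]
        sample = bilinear(img, cy + gy + oy, cx + gx + ox)
        out += weights[k] * masks[k] * sample
    return out

# Toy 5x5 image whose value grows by 1 per column and 5 per row.
img = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
avg = [1.0 / 9.0] * 9

# Zero offsets and unit masks reduce to an ordinary 3x3 convolution.
plain = modulated_deform_conv_at(img, avg, [(0.0, 0.0)] * 9, [1.0] * 9, 2, 2)
# Shifting every tap half a pixel to the right moves the sampled patch.
shifted = modulated_deform_conv_at(img, avg, [(0.0, 0.5)] * 9, [1.0] * 9, 2, 2)
# Halving every modulation scalar halves each tap's contribution.
damped = modulated_deform_conv_at(img, avg, [(0.0, 0.0)] * 9, [0.5] * 9, 2, 2)
print(plain, shifted, damped)
```

With zero offsets and unit masks the sketch behaves like a standard convolution; the offsets let the kernel follow distorted object shapes, and the modulation scalars let it down-weight taps that fall on irrelevant regions.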
