Computer science
Artificial intelligence
Computer vision
Feature extraction
Object detection
Convolutional neural network
Transformer
Block (permutation group theory)
Pattern recognition (psychology)
Engineering
Geometry
Mathematics
Voltage
Electrical engineering
Authors
Ye Tao,Wenyang Qin,Zongyang Zhao,Xiaozhi Gao,Xiangpeng Deng,Yu Ouyang
Source
Journal: IEEE Transactions on Instrumentation and Measurement
[Institute of Electrical and Electronics Engineers]
Date: 2023-01-01
Volume/Issue: 72: 1-13
Citations: 8
Identifier
DOI: 10.1109/tim.2023.3241825
Abstract
Unmanned aerial vehicles (UAVs) play an important role in automatic patrol inspections of cities, helping to safeguard urban residents' lives and property and keep cities operating normally. However, the inspection process poses several challenges: the numerous small objects in UAV images are difficult to detect, objects are often severely occluded, and real-time performance is required. To address these issues, we first propose a real-time object detection network (RTD-Net) for UAV images. To compensate for the scarce visual features of small objects, we design a feature fusion module (FFM) that interacts and fuses features at different levels, improving the feature expression ability of small objects. To achieve real-time detection, we design a lightweight feature extraction module (LEM) to build the backbone network and control the computation and parameter counts. To handle the discontinuous features of occluded objects, an efficient convolutional transformer block (ECTB) based on convolutional multihead self-attention (CMHSA) is designed to improve the recognition of occluded objects by extracting their context information. Compared with the multihead self-attention (MHSA) of the traditional transformer, CMHSA replaces the position-wise linear projection with a convolutional projection, which greatly reduces computation without performance loss. Finally, an attention prediction head (APH) is designed based on the attention mechanism to improve the model's ability to extract attention regions in complex scenarios. The proposed method reaches a detection accuracy of 86.4% mean average precision (mAP) on our UAV image dataset. In addition, it achieves 86.0% mAP at a detection speed of 33.4 frames/s on the NVIDIA Jetson TX2 embedded device.
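The core idea behind CMHSA as summarized in the abstract is that the Q/K/V projections of standard multihead self-attention are produced by convolutions over the feature map rather than position-wise linear layers. Below is a minimal NumPy sketch of that idea, not the authors' implementation: the use of a single depthwise convolution per projection, the 3×3 kernel size, and the omission of an output projection and any positional terms are all simplifying assumptions.

```python
import numpy as np

def depthwise_conv2d(x, kernels):
    """Per-channel ("depthwise") 2-D correlation with 'same' padding.

    x: (C, H, W) feature map; kernels: (C, k, k), one filter per channel.
    """
    C, H, W = x.shape
    k = kernels.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros_like(x)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + k, j:j + k] * kernels[c])
    return out

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def cmhsa(x, q_kern, k_kern, v_kern, num_heads):
    """Convolutional multihead self-attention sketch.

    Depthwise convolutions replace the linear Q/K/V projections;
    the attention itself is the usual scaled dot product per head.
    """
    C, H, W = x.shape
    n = H * W  # number of spatial tokens
    q = depthwise_conv2d(x, q_kern).reshape(C, n)
    k = depthwise_conv2d(x, k_kern).reshape(C, n)
    v = depthwise_conv2d(x, v_kern).reshape(C, n)
    d = C // num_heads  # channels per head
    out = np.zeros((C, n))
    for h in range(num_heads):
        sl = slice(h * d, (h + 1) * d)
        qh, kh, vh = q[sl].T, k[sl].T, v[sl].T          # (n, d) each
        attn = softmax(qh @ kh.T / np.sqrt(d))          # (n, n) weights
        out[sl] = (attn @ vh).T
    return out.reshape(C, H, W)
```

The claimed cost saving comes from the projection step: a depthwise k×k convolution needs O(C·k²) weights per projection, versus O(C²) for a full linear projection, while still mixing local context into Q, K, and V.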