Computer science
Artificial intelligence
Computer vision
Feature extraction
Object detection
Convolutional neural network
Transformer
Block (permutation group theory)
Pattern recognition (psychology)
Engineering
Geometry
Mathematics
Voltage
Electrical engineering
Authors
Ye Tao,Wenyang Qin,Zongyang Zhao,Xiaozhi Gao,Xiangpeng Deng,Yu Ouyang
Source
Journal: IEEE Transactions on Instrumentation and Measurement
[Institute of Electrical and Electronics Engineers]
Date: 2023-01-01
Volume/Issue: 72: 1-13
Citations: 8
Identifier
DOI: 10.1109/tim.2023.3241825
Abstract
Unmanned aerial vehicles (UAVs) play an important role in automatic patrol inspections of cities, helping to safeguard urban residents' lives and property and keep cities operating normally. However, the inspection process poses several challenges: the numerous small objects in UAV images are difficult to detect, objects are often severely occluded, and real-time performance is required. To address these issues, we first propose a real-time object detection network (RTD-Net) for UAV images. To compensate for the scarce visual features of small objects, we design a feature fusion module (FFM) that interacts and fuses features at different levels, improving the feature expression ability of small objects. To achieve real-time detection, we design a lightweight feature extraction module (LEM) to build the backbone network and control the computation and parameter counts. To handle the discontinuous features of occluded objects, an efficient convolutional transformer block (ECTB) based on convolutional multihead self-attention (CMHSA) is designed to improve the recognition of occluded objects by extracting their context information. Compared with the multihead self-attention (MHSA) of the traditional transformer, CMHSA replaces the position-wise linear projection with a convolutional projection, which greatly reduces computation without performance loss. Finally, an attention prediction head (APH) is designed based on the attention mechanism to improve the model's ability to extract attention regions in complex scenarios. The proposed method reaches a detection accuracy of 86.4% mean average precision (mAP) on our UAV image dataset. In addition, it achieves 86.0% mAP at a detection speed of 33.4 frames/s on the NVIDIA Jetson TX2 embedded device.
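The core idea behind CMHSA as summarized in the abstract is that the Q/K/V projections of standard multihead self-attention are produced by convolutions over the feature map rather than position-wise linear layers. Below is a minimal NumPy sketch of that idea, not the authors' implementation: the use of a single depthwise convolution per projection, the 3×3 kernel size, and the omission of an output projection and any positional terms are all simplifying assumptions.

```python
import numpy as np

def depthwise_conv2d(x, kernels):
    """Per-channel ("depthwise") 2-D correlation with 'same' padding.

    x: (C, H, W) feature map; kernels: (C, k, k), one filter per channel.
    """
    C, H, W = x.shape
    k = kernels.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros_like(x)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + k, j:j + k] * kernels[c])
    return out

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def cmhsa(x, q_kern, k_kern, v_kern, num_heads):
    """Convolutional multihead self-attention sketch.

    Depthwise convolutions replace the linear Q/K/V projections;
    the attention itself is the usual scaled dot product per head.
    """
    C, H, W = x.shape
    n = H * W  # number of spatial tokens
    q = depthwise_conv2d(x, q_kern).reshape(C, n)
    k = depthwise_conv2d(x, k_kern).reshape(C, n)
    v = depthwise_conv2d(x, v_kern).reshape(C, n)
    d = C // num_heads  # channels per head
    out = np.zeros((C, n))
    for h in range(num_heads):
        sl = slice(h * d, (h + 1) * d)
        qh, kh, vh = q[sl].T, k[sl].T, v[sl].T          # (n, d) each
        attn = softmax(qh @ kh.T / np.sqrt(d))          # (n, n) weights
        out[sl] = (attn @ vh).T
    return out.reshape(C, H, W)
```

The claimed cost saving comes from the projection step: a depthwise k×k convolution needs O(C·k²) weights per projection, versus O(C²) for a full linear projection, while still mixing local context into Q, K, and V.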