Remote sensing
Computer science
Scale (ratio)
Computer vision
Artificial intelligence
Geology
Cartography
Geography
Authors
Haitao Yin,Zhuyun Zhu,He Wang
Identifier
DOI:10.1109/tgrs.2025.3571033
Abstract
Detection Transformer (DETR) has emerged as a highly promising approach in object detection and has attracted significant interest. However, most DETR-like methods cannot simultaneously leverage the shape and scale priors for attention calculation, resulting in limited performance in detecting remote sensing objects with diverse shapes and scales. To address this issue, this article proposes a Scale-Enhanced Deformable DETR (SED-DETR) for remote sensing object detection (RSOD). The core component of SED-DETR is Scale-Enhanced Deformable Attention (SEDA), which is designed based on the principles of deformable shape and dynamic scale. Specifically, the SEDA module utilizes multi-scale attention heads. First, conventional multiple attention heads are consolidated into several scale-heads through an adaptive scale aggregation approach, which dynamically adjusts the distributions of different scales to enhance the scale-aware modeling ability. For each scale-head, dilated sampling is applied at a specific dilation rate to capture multi-scale receptive fields. The sampled positions are further refined by learnable offsets predicted from query features, enabling a deformable dilated mechanism for fine-grained feature extraction of multi-scale instances. Finally, we adopt the mixed query selection and the denoising training defined in DINO to implement SED-DETR. Experimental results on the xView, DIOR, NWPU VHR-10 and COCO datasets demonstrate that SED-DETR outperforms state-of-the-art DETR-like methods. Specifically, SED-DETR achieves 5.6%, 10.9%, and 8.6% mAP gains over the baseline Deformable DETR on the xView, DIOR, and NWPU VHR-10 datasets, respectively. The source code is available at https://github.com/zzy599/SEDDETR.
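The deformable dilated sampling described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that); it is a minimal single-channel, pure-Python illustration under several assumptions: each scale-head samples a 3x3 grid around a reference point, the grid spacing equals the head's dilation rate, the learnable offsets and attention logits are passed in as plain lists rather than predicted from query features, and the adaptive scale aggregation step is omitted. All function names (`dilated_grid`, `sampling_points`, `bilinear`, `scale_head_output`) are hypothetical.

```python
import math

def dilated_grid(dilation):
    """3x3 grid of (dx, dy) base offsets, spaced by the dilation rate (assumed layout)."""
    return [(dx * dilation, dy * dilation) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def sampling_points(ref, dilation, offsets):
    """Reference point + dilated base offset + per-point offset.

    In a real model the offsets would be predicted from query features;
    here they are supplied by the caller for illustration.
    """
    rx, ry = ref
    base = dilated_grid(dilation)
    return [(rx + bx + ox, ry + by + oy) for (bx, by), (ox, oy) in zip(base, offsets)]

def bilinear(feat, x, y):
    """Bilinearly interpolate a 2D map (list of rows) at (x, y), clamped to the border."""
    h, w = len(feat), len(feat[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = feat[y0][x0] * (1 - fx) + feat[y0][x1] * fx
    bottom = feat[y1][x0] * (1 - fx) + feat[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def scale_head_output(feat, ref, dilation, offsets, logits):
    """Attention-weighted sum of features sampled at the deformed dilated positions."""
    points = sampling_points(ref, dilation, offsets)
    weights = softmax(logits)
    return sum(w * bilinear(feat, x, y) for w, (x, y) in zip(weights, points))

# Usage: the same reference point sampled by two scale-heads with different dilation rates.
feature_map = [[float(x + 10 * y) for x in range(8)] for y in range(8)]
zero_offsets = [(0.0, 0.0)] * 9
uniform_logits = [0.0] * 9
small_scale = scale_head_output(feature_map, (4, 4), 1, zero_offsets, uniform_logits)
large_scale = scale_head_output(feature_map, (4, 4), 2, zero_offsets, uniform_logits)
```

A larger dilation rate widens the receptive field of a head without adding sampling points, while the offsets let each point drift away from the rigid grid. In practice this kind of sampling is done on batched multi-channel tensors with a framework's own bilinear sampling routine rather than hand-written loops.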