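Baseline (sea)
Remote sensing
Benchmark (surveying)
Infrared
Computer science
Aerial imagery
Artificial intelligence
Environmental science
Computer vision
Geology
Image (mathematics)
Optics
Geodesy
Physics
Oceanography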
Authors
Zhinan Gao,Dongdong Li,Yangliu Kuai,Rui Chen,Gongjian Wen
Identifier
DOI:10.1109/tgrs.2025.3528634
Abstract
With the extensive use of multiple sensors on uncrewed aerial vehicles (UAVs), multimodality information processing has become a research focus. In academic research on object detection and tracking for UAVs, researchers often align visible-infrared image pairs as a preprocessing step. However, in actual tasks, the dual-modality image pairs acquired by UAVs are unaligned, which significantly limits downstream applications. At present, there are no publicly available multimodality image alignment datasets for UAVs. In this article, we present a large-scale benchmark for the dual-modality image alignment task in UAVs, including 81,000 training image pairs and 15,000 testing image pairs. We also propose a transformer-based dual-modality image alignment network as the baseline for this benchmark. First, the algorithm extracts multiscale features for image representation to address unaligned image pairs with varying resolutions. Second, a transformer-based alignment network is proposed to improve the fusion of features from heterogeneous modalities. Finally, deformable attention is adopted to alleviate the problem of memory explosion. Extensive experiments on this dual-modality image alignment benchmark demonstrate the effectiveness of our algorithm. Source code is available at https://github.com/gaozhinanjiu/UAVmatch.