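Aerial imagery
Affine transformation
Artificial intelligence
Computer science
Transformation (genetics)
Foundation (evidence)
Computer vision
Remote sensing
Geology
Geography
Image (mathematics)
Mathematics
Archaeology
Chemistry
Pure mathematics
Gene
Biochemistry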
Authors
Wenhui Diao, Haichen Yu, Kootak Kang, Tong Ling, Di Liu, Yingchao Feng, Hanbo Bi, Libo Ren, Xuexue Li, Yongqiang Mao, Xian Sun
Identifier
DOI:10.1109/tpami.2025.3602237
Abstract
Aerial Remote Sensing (ARS) vision tasks are challenging because of their distinctive viewing angles. Existing research has focused primarily on task-specific algorithms, which apply poorly across the broad range of ARS vision applications. This paper proposes RingMo-Aerial to fill the gap in foundation model research for ARS vision. A Frequency-Enhanced Multi-Head Self-Attention (FE-MSA) mechanism is introduced to strengthen the model's capacity for small-object representation. In addition, a contrastive learning method based on affine transformations improves its adaptability to the tilted viewing angles inherent in ARS tasks. Furthermore, the ARS-Adapter, a parameter-efficient fine-tuning method, is proposed to improve the model's adaptability and performance across ARS vision tasks. Experimental results show that RingMo-Aerial achieves state-of-the-art (SOTA) performance on multiple downstream tasks, indicating its practicality and efficacy for ARS vision.
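The abstract names affine-transformation-based contrastive learning but gives no implementation details. The following is only a generic sketch of that idea, not the paper's method: it assumes a random affine warp (rotation, shear, scale) is used to produce a second, tilted-looking view of an image, and that paired embeddings are compared with a standard InfoNCE loss. The function names `random_affine`, `warp_affine`, and `info_nce`, along with all parameter ranges, are hypothetical choices made for illustration.

```python
import numpy as np


def random_affine(rng, max_rot_deg=15.0, max_shear=0.3, scale_range=(0.8, 1.2)):
    """Sample a 2x3 affine matrix (hypothetical parameter ranges, zero translation)."""
    theta = np.deg2rad(rng.uniform(-max_rot_deg, max_rot_deg))
    shear = rng.uniform(-max_shear, max_shear)
    scale = rng.uniform(*scale_range)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])
    shr = np.array([[1.0, shear],
                    [0.0, 1.0]])
    linear = scale * rot @ shr
    return np.hstack([linear, np.zeros((2, 1))])


def warp_affine(image, matrix):
    """Warp an (H, W) image about its center with a 2x3 affine matrix.

    Uses inverse mapping with nearest-neighbor sampling; pixels that map
    outside the source image are filled with zeros.
    """
    h, w = image.shape
    center = np.array([(w - 1) / 2.0, (h - 1) / 2.0])
    linear, shift = matrix[:, :2], matrix[:, 2]
    inv = np.linalg.inv(linear)
    ys, xs = np.mgrid[0:h, 0:w]
    out_xy = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    # Forward model: out = linear @ (src - center) + shift + center
    src_xy = (out_xy - center - shift) @ inv.T + center
    sx = np.rint(src_xy[:, 0]).astype(int)
    sy = np.rint(src_xy[:, 1]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros(h * w, dtype=image.dtype)
    out[valid] = image[sy[valid], sx[valid]]
    return out.reshape(h, w)


def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE loss: row i of z_a and row i of z_b form the positive pair."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((32, 32))
    view = warp_affine(image, random_affine(rng))  # second, affine-warped view
    print(view.shape)
```

In a training loop under these assumptions, an encoder would embed the original and warped views of each image in a batch, and `info_nce` would pull matching pairs together while pushing apart views of different images.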