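Foundation (evidence)
Architecture
Artificial intelligence
Feature (linguistics)
Computer science
Mixing (physics)
Computer vision
Image (mathematics)
Pattern recognition (psychology)
Geography
Physics
Art
Visual arts
Philosophy
Quantum mechanics
Archaeology
Linguistics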
Authors
Gaoshuang Huang,Yang Zhou,Xiaofei Hu,Chenglong Zhang,Luying Zhao,Wenjian Gan,Mingbo Hou
Abstract
Determining the geographical location of an image through image geo-localization is a highly significant task. However, existing image geo-localization methods lose accuracy under difficult conditions such as viewpoint changes, illumination variations, seasonal changes, and occlusions. To address these challenges, we propose an image geo-localization architecture based on a foundation vision model and feature mixing. The architecture truncates and fine-tunes the foundation vision model DINOv2 to extract robust image features, which are then aggregated by an MLP-Mixer-based mix module into robust and generalized global image features. This design significantly improves the accuracy of image geo-localization under difficult conditions. Experimental results demonstrate that the proposed architecture outperforms state-of-the-art (SOTA) methods in image geo-localization accuracy. Compared to SOTA methods, our architecture achieves accuracy improvements of 6.35%, 4.06%, and 6.30% on the Tokyo 24/7, Nordland, and SF-XL-testv1 test sets, respectively, which include viewpoint changes, illumination changes, seasonal changes, and occlusions.
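The aggregation step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the random `tokens` array stands in for patch features from a truncated DINOv2 backbone, and the function names (`mixer_block`, `aggregate`), the weight shapes, the number of blocks, and the final projection are all assumptions chosen for illustration. It shows one plausible form of MLP-Mixer-style token mixing followed by projection and L2 normalization into a single global descriptor.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    # Normalize each token over its channel dimension (last axis).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def mixer_block(tokens, w1, w2):
    # tokens: (num_tokens, dim). A two-layer MLP is applied along the
    # token axis, so every channel mixes information across all patches.
    h = layer_norm(tokens).T              # (dim, num_tokens)
    h = np.maximum(h @ w1, 0.0) @ w2      # ReLU MLP over the token axis
    return tokens + h.T                   # residual connection


def aggregate(tokens, blocks, w_channel, w_token):
    # Stack of mixing blocks, then shrink both axes and flatten.
    for w1, w2 in blocks:
        tokens = mixer_block(tokens, w1, w2)
    x = tokens @ w_channel                # (num_tokens, out_dim)
    x = w_token @ x                       # (out_rows, out_dim)
    desc = x.reshape(-1)
    return desc / np.linalg.norm(desc)    # L2-normalized global descriptor


# Hypothetical sizes; random features stand in for backbone output.
rng = np.random.default_rng(0)
num_tokens, dim = 16, 32
tokens = rng.standard_normal((num_tokens, dim))
blocks = [
    (rng.standard_normal((num_tokens, num_tokens)) * 0.1,
     rng.standard_normal((num_tokens, num_tokens)) * 0.1)
    for _ in range(2)
]
w_channel = rng.standard_normal((dim, 8)) * 0.1
w_token = rng.standard_normal((4, num_tokens)) * 0.1

descriptor = aggregate(tokens, blocks, w_channel, w_token)
```

In a retrieval setting, descriptors of this kind would typically be compared by dot product (cosine similarity, since they are unit length) between a query image and a geo-tagged reference database; the weights here are random, so the sketch demonstrates only shapes and data flow, not trained behavior.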