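Overfitting
Artificial intelligence
Computer science
Detector
Matching (statistics)
Pattern recognition (psychology)
Feature (linguistics)
Optimal distinctiveness theory
Computer vision
Feature extraction
Image (mathematics)
Mathematics
Artificial neural network
Psychotherapist
Linguistics
Philosophy
Psychology
Telecommunications
Statistics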
Authors
M. Hu, Bin Sun, F. Zhang, Shutao Li
Identifier
DOI:10.1109/tip.2025.3568310
Abstract
Image matching is a critical task in computer vision research, focusing on aligning two or more images with similar features. Feature detection and description constitute the core of image matching. Handcrafted detectors can obtain distinctive points, but these points may not be repeatable across image pairs, especially those with dramatic appearance changes. In contrast, learned detectors can extract a large number of repeatable points, but many of them tend to be ambiguous points with low distinctiveness. Moreover, under dramatic appearance change, the contrastive or triplet losses commonly used to train descriptors employ a hard negative mining strategy, which may select overly challenging negative samples through global sampling, resulting in sluggish convergence or even overfitting. Descriptors learned this way may not guarantee that corresponding points have larger similarities than unmatched ones, leading to inaccurate matches. To address these issues, we propose a hierarchically learned detector and descriptor (HLDD) for robust image matching, which contains three modules: a handcrafted-learned detector, a hierarchically learned descriptor, and a coarse-to-fine matching strategy. The handcrafted-learned detector integrates the advantages of handcrafted and learned detectors. It extracts distinctive feature points from a learned repeatability map that is robust to image changes and eliminates the ambiguous ones according to a learned distinctiveness map. The descriptor is trained with a proposed hierarchical triplet loss, which employs a dual window strategy. It selects the hardest negative samples within local windows, which are comparatively easier than those obtained by global sampling, ensuring effective training of the descriptors. The coarse-to-fine matching strategy performs global and local mutual nearest neighbor matching on the coarse and fine descriptor maps, respectively, to improve the matching accuracy progressively.
Experimental comparisons with other matching methods demonstrate the superiority of the proposed method in the tasks of image matching, homography estimation, visual localization, and relative pose estimation. Moreover, ablation studies illustrate the effectiveness of the three proposed modules.
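The mutual nearest neighbor matching mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cosine` and `mutual_nearest_neighbors`, the use of cosine similarity, and the toy descriptors are all assumptions made for illustration. A pair (i, j) is kept only when descriptor j is the best match for i and descriptor i is also the best match for j.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two descriptor vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mutual_nearest_neighbors(desc_a, desc_b):
    """Return (i, j) pairs where desc_a[i] and desc_b[j] are each other's nearest neighbor.

    Hypothetical helper for illustration; similarity is assumed to be cosine.
    """
    # Full similarity matrix: rows index desc_a, columns index desc_b.
    sim = [[cosine(u, v) for v in desc_b] for u in desc_a]
    # Best column for every row (a -> b direction).
    nn_ab = [max(range(len(desc_b)), key=lambda j: sim[i][j])
             for i in range(len(desc_a))]
    # Best row for every column (b -> a direction).
    nn_ba = [max(range(len(desc_a)), key=lambda i: sim[i][j])
             for j in range(len(desc_b))]
    # Keep only the pairs that agree in both directions.
    return [(i, j) for i, j in enumerate(nn_ab) if nn_ba[j] == i]


# Toy descriptors (made up): the third descriptor in each set has no mutual partner.
desc_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
desc_b = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.6, 0.8, 0.0]]
matches = mutual_nearest_neighbors(desc_a, desc_b)
print(matches)
```

Under the same assumptions, a coarse-to-fine scheme could apply this function once over all coarse descriptors (the global stage) and then again over the fine descriptors restricted to a small window around each coarse match (the local stage). The window size and the way the windows are formed are not specified in the abstract.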