Artificial intelligence
Image fusion
Fusion
Computer vision
Modality (human-computer interaction)
Computer science
Pattern recognition (psychology)
Sensor fusion
Image (mathematics)
Philosophy
Linguistics
Authors
Huafeng Li, Zhijia Yang, Yafei Zhang, Wei Jia, Zhengtao Yu, Yu Liu
Identifier
DOI: 10.1109/TPAMI.2025.3535617
Abstract
In this study, we propose Multimodal Fusion-supervised Cross-modality Alignment Perception (MulFS-CAP), a novel framework for single-stage fusion of unregistered infrared-visible images. Traditional two-stage methods depend on explicit registration algorithms to align source images spatially, often adding complexity. In contrast, MulFS-CAP seamlessly blends implicit registration with fusion, simplifying the process and enhancing suitability for practical applications. MulFS-CAP utilizes a shared shallow feature encoder to merge unregistered infrared-visible images in a single stage. To address the specific requirements of feature-level alignment and fusion, we develop a consistent feature learning approach via a learnable modality dictionary. This dictionary provides complementary information for unimodal features, thereby maintaining consistency between individual and fused multimodal features. As a result, MulFS-CAP effectively reduces the impact of modality variance on cross-modality feature alignment, allowing for simultaneous registration and fusion. Additionally, in MulFS-CAP, we introduce a novel cross-modality alignment approach that builds a correlation matrix describing pixel-level relationships between the source images. This matrix aids in aligning features across infrared and visible images, further refining the fusion process. These designs make MulFS-CAP lightweight, effective, and free of explicit registration. Experimental results on different datasets demonstrate the effectiveness of the proposed method and its superiority over state-of-the-art two-stage methods. The source code of our method is available at https://github.com/YR0211/MulFS-CAP.
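The abstract describes two mechanisms: a learnable modality dictionary that supplies complementary information to unimodal features, and a correlation matrix that relates pixel locations across the infrared and visible images so their features can be aligned before fusion. The PyTorch sketch below illustrates one plausible reading of these ideas under stated assumptions; the module names, dictionary size, attention-style lookup, and softmax-weighted alignment are illustrative choices, not the authors' implementation.

```python
# Minimal sketch (assumptions, not the MulFS-CAP source code):
# (1) a learnable modality dictionary that unimodal features attend to,
# (2) a cross-modality correlation matrix used to softly align IR features
#     onto the visible-image grid before fusion.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityDictionary(nn.Module):
    """Learnable dictionary of atoms; unimodal features query it to pick up
    complementary information (hypothetical formulation)."""

    def __init__(self, num_atoms=64, dim=64):
        super().__init__()
        self.atoms = nn.Parameter(torch.randn(num_atoms, dim))

    def forward(self, feat):                      # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        q = feat.flatten(2).transpose(1, 2)       # (B, HW, C)
        attn = torch.softmax(q @ self.atoms.t() / c ** 0.5, dim=-1)  # (B, HW, N)
        comp = attn @ self.atoms                  # complementary content, (B, HW, C)
        out = q + comp                            # enrich the unimodal features
        return out.transpose(1, 2).reshape(b, c, h, w)


def cross_modality_correlation(f_ir, f_vis):
    """Correlation matrix between every IR and visible location, followed by a
    softmax-weighted re-sampling of IR features onto the visible grid."""
    b, c, h, w = f_vis.shape
    ir = F.normalize(f_ir.flatten(2), dim=1)      # (B, C, HW_ir)
    vis = F.normalize(f_vis.flatten(2), dim=1)    # (B, C, HW_vis)
    corr = vis.transpose(1, 2) @ ir               # (B, HW_vis, HW_ir)
    weights = torch.softmax(corr, dim=-1)
    aligned = weights @ f_ir.flatten(2).transpose(1, 2)   # (B, HW_vis, C)
    return aligned.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    f_ir = torch.randn(1, 64, 32, 32)             # toy shallow features
    f_vis = torch.randn(1, 64, 32, 32)
    dictionary = ModalityDictionary(num_atoms=64, dim=64)
    ir_enriched = dictionary(f_ir)                # shared dictionary for both modalities
    vis_enriched = dictionary(f_vis)
    ir_aligned = cross_modality_correlation(ir_enriched, vis_enriched)
    fused_input = torch.cat([ir_aligned, vis_enriched], dim=1)  # fed to a fusion head
    print(fused_input.shape)                      # torch.Size([1, 128, 32, 32])
```

In this reading, sharing one dictionary across modalities is what keeps the enriched unimodal features consistent with each other, which in turn makes the correlation matrix less sensitive to modality variance; the actual loss functions and network depth used by the paper are not reproduced here.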