PCFFusion: Progressive cross-modal feature fusion network for infrared and visible images

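Feature (linguistics), Artificial intelligence, Infrared, Feature extraction, Modal verb, Fusion, Fuse (electrical), Pattern recognition (psychology), Computer science, Computer vision, Image fusion, Scale (ratio), Representation (politics), Feature detection (computer vision), Feature vector, Salience, Autoencoder, Fusion rule, Property (philosophy), Backbone network, Weighting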

Authors

Shuying Huang,Kai Zhang,Yong Yang,Weiguo Wan

Source

Journal: Pattern Recognition [Elsevier BV]
Date: 2025-09-08 Volume/Issue: 172: 112419-112419 Citations: 2

Identifier

DOI: 10.1016/j.patcog.2025.112419

Abstract

• A novel deep learning-based method, PCFFusion, is proposed for infrared and visible image fusion.
• A feature decomposition module (FDM) is designed to extract features of infrared and visible images at different scales, which improves the network's feature representation ability.
• A Transformer-based cross-modal feature enhancement module (CMFEM) is constructed to correct and enhance the two modal features at different scales.
• A cross-domain fusion module (CDFM) is constructed to fuse features at different scales through interactive compensation between the two modal features.
• Extensive experiments verify that the proposed network outperforms state-of-the-art (SOTA) networks.

Infrared and visible image fusion (IVIF) aims to fuse the thermal target information in infrared images with the spatial texture information in visible images, improving the observability and comprehensibility of the fused images. Currently, most IVIF methods suffer from the loss of salient target information and texture details in the fused images. To alleviate this problem, a progressive cross-modal feature fusion network (PCFFusion) for IVIF is proposed, which comprises two stages: feature extraction and feature fusion. In the feature extraction stage, to enhance the network's feature representation capability, a feature decomposition module (FDM) is constructed to extract the two modal features at different scales by defining a feature decomposition operation (FDO). In addition, by establishing correlations between the high-frequency and low-frequency components of the two modal features, a cross-modal feature enhancement module (CMFEM) is built to correct and enhance the two features at each scale. The feature fusion stage fuses the two modal features at each scale and supplements features from adjacent scales by constructing three cross-domain fusion modules (CDFMs). To constrain the fused results to preserve more salient targets and richer texture details, a dual-feature fidelity loss function is defined, in which a salient weight map balances the two loss terms. Extensive experiments demonstrate that the fusion results of the proposed method highlight prominent targets from infrared images while retaining rich background details from visible images, and that PCFFusion outperforms several advanced methods. Specifically, compared with the best results obtained by the comparison methods, the proposed network achieves average increases of 30.35% and 10.9% in the Mutual Information (MI) and Standard Deviation (SD) metrics, respectively, on the TNO dataset.
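The abstract reports gains in Mutual Information (MI) and Standard Deviation (SD) without giving their formulas. The following is a minimal sketch of how these two metrics are commonly computed in the image fusion literature, not the authors' evaluation code. It assumes grayscale images supplied as flat sequences of integer gray levels, and it assumes the usual convention that the fusion MI score is the sum of the MI between each source image and the fused image. The function names `standard_deviation`, `mutual_information`, and `fusion_mi` are hypothetical.

```python
import math
from collections import Counter


def standard_deviation(pixels):
    """Population standard deviation of a flat sequence of gray levels."""
    n = len(pixels)
    mean = sum(pixels) / n
    return math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)


def mutual_information(a, b):
    """Mutual information in bits between two equally sized flat images."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    n = len(a)
    joint = Counter(zip(a, b))  # joint gray-level histogram
    hist_a = Counter(a)         # marginal histogram of the first image
    hist_b = Counter(b)         # marginal histogram of the second image
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y) simplifies to count * n / (hist_a[x] * hist_b[y])
        mi += p_xy * math.log2(count * n / (hist_a[x] * hist_b[y]))
    return mi


def fusion_mi(infrared, visible, fused):
    """Assumed convention: MI(infrared, fused) + MI(visible, fused)."""
    return mutual_information(infrared, fused) + mutual_information(visible, fused)
```

Higher SD indicates greater contrast in the fused image, and higher MI indicates that more information from the source images is carried into it. Real evaluations normally work on 2-D arrays with a numerical library; plain sequences are used here only to keep the sketch free of dependencies.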
