Keywords
Autoencoder
Artificial intelligence
Image fusion
Feature learning
Representation
Modality
Fusion
Pattern recognition
Computer science
Redundancy
Generative
Image
Feature
Computer vision
Generative model
Fusion mechanism
Focus (optics)
Key
Feature vector
Machine learning
Feature extraction
Source code
Deep learning
Mode
Image quality
Contextual image classification
Image restoration
Pyramid
Authors
Jingwei Xin,Boneng Shi,Nannan Wang,Jie Li,Xinbo Gao
Identifier
DOI:10.1109/tip.2025.3615680
Abstract
Creating a comprehensively representative image while maintaining the merits of various modalities is a key focus of current Multi-Modality Image Fusion research. Existing unified methods often struggle to handle varying types of degradation while extracting modality-shared and modality-specific information from source images, leading to limitations in their generative or representation capabilities under different conditions. To address this challenge, we propose MVFusion, a novel self-supervised masked variational autoencoder framework that simultaneously enhances generative training and representation learning. It is designed to cope with varying image quality and dataset composition within a unified framework while ensuring effective fusion of modality information. Specifically, MVFusion employs a self-supervised masked autoencoder to reduce the impact of redundancy and degradation in the source images, and thus learns the latent distribution of degraded input images in the generative training stage. In addition, we incorporate variational feature learning to further preserve the distinctive modality features in the representation learning stage. Extensive experiments demonstrate that our model achieves promising results in several classical fusion tasks, including infrared-visible, multi-focus, multi-exposure, and medical image fusion. The code is available at https://github.com/shiboneng/MVFusion.
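The pipeline the abstract describes can be sketched at a toy scale: mask the source images (self-supervision), encode each modality to a variational latent, fuse the latents, and decode. The following NumPy sketch is purely illustrative; the linear encoder/decoder weights, the masking ratio, and the element-wise max fusion rule are assumptions for demonstration, not the paper's actual architecture (see the MVFusion repository for the real implementation).

```python
import numpy as np

rng = np.random.default_rng(0)

def random_mask(x, ratio=0.5):
    # Self-supervised masking: zero out a random subset of pixels.
    mask = rng.random(x.shape) > ratio
    return x * mask

def encode(x, W_mu, W_logvar):
    # Toy linear "encoder": flattened image -> latent mean / log-variance.
    h = x.ravel()
    return W_mu @ h, W_logvar @ h

def reparameterize(mu, logvar):
    # Variational sampling: z = mu + sigma * eps.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def fuse(z_a, z_b):
    # Illustrative fusion rule: element-wise max keeps the stronger response.
    return np.maximum(z_a, z_b)

def decode(z, W_dec, shape):
    # Toy linear "decoder": latent -> image.
    return (W_dec @ z).reshape(shape)

# Hypothetical tiny setup: 4x4 images, 4-dim latent.
d_img, d_lat = 16, 4
W_mu = rng.standard_normal((d_lat, d_img)) * 0.1
W_logvar = rng.standard_normal((d_lat, d_img)) * 0.01
W_dec = rng.standard_normal((d_img, d_lat)) * 0.1

ir = rng.random((4, 4))   # stand-in infrared source image
vis = rng.random((4, 4))  # stand-in visible source image

z_ir = reparameterize(*encode(random_mask(ir), W_mu, W_logvar))
z_vis = reparameterize(*encode(random_mask(vis), W_mu, W_logvar))
fused = decode(fuse(z_ir, z_vis), W_dec, ir.shape)
print(fused.shape)  # (4, 4)
```

In the actual framework the encoder/decoder are deep networks trained with reconstruction and variational objectives; this sketch only shows how masking, reparameterization, and latent-space fusion compose.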