Computer science
Artificial intelligence
Semantics (computer science)
Image fusion
Degradation (telecommunications)
Natural language processing
Computer vision
Fusion
Composite number
Image (mathematics)
Pattern recognition (psychology)
Programming language
Algorithm
Linguistics
Telecommunications
Philosophy
Authors
Hao Zhang,Lei Cao,Xuhui Zuo,Zhenfeng Shao,Jiayi Ma
Identifier
DOI:10.1109/tpami.2025.3568433
Abstract
Existing image fusion methods struggle to accommodate composite degradation and do not allow users to flexibly modulate the semantic objects of interest. To address these challenges, this study proposes a composite degradation-robust image fusion framework with language-driven semantics, called OmniFuse. Firstly, OmniFuse establishes a novel multi-modal information fusion paradigm based on the latent diffusion model (LDM). By projecting the information fusion function into the latent space of the LDM, the information fusion process is seamlessly integrated with the diffusion process. Thus, OmniFuse fully leverages the powerful generative capabilities of the LDM to eliminate composite degradation, thereby achieving highly robust image fusion. Secondly, OmniFuse develops a language-driven controllable fusion strategy to strengthen fusion flexibility. It employs a language-driven feature fusion module (LFFM) that receives a specified localization prior and dynamically aggregates multi-modal features. Within the LFFM, a visual enhancement regularization is introduced to highlight objects of interest for capturing perceptual attention, while reverse semantic driving is established to strengthen their semantic attributes. Together, the visual and semantic constraints can implicitly correct an imperfect localization prior, further refining the accuracy of language-driven control. Extensive experiments demonstrate the omnipotent performance of OmniFuse, with significant advantages in robustness and flexibility compared to state-of-the-art methods. The code is publicly available at https://github.com/HaoZhang1018/OmniFuse.
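The abstract describes the LFFM only at a high level. The following minimal PyTorch sketch illustrates the general idea of letting a language-derived localization prior modulate multi-modal feature aggregation; the names (LanguageDrivenFusion, weight_net, loc_mask) and the specific weighting scheme are assumptions for illustration, not the paper's actual module or API.

```python
import torch
import torch.nn as nn

class LanguageDrivenFusion(nn.Module):
    """Hypothetical sketch: fuse two modality feature maps, with a
    text-derived localization mask (values in [0, 1]) steering the result."""
    def __init__(self, channels: int):
        super().__init__()
        # Small network that predicts per-pixel fusion weights for the two modalities,
        # conditioned on both feature maps and the localization prior.
        self.weight_net = nn.Sequential(
            nn.Conv2d(2 * channels + 1, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 2, kernel_size=1),
        )

    def forward(self, feat_a, feat_b, loc_mask):
        # feat_a, feat_b: (B, C, H, W); loc_mask: (B, 1, H, W)
        x = torch.cat([feat_a, feat_b, loc_mask], dim=1)
        w = torch.softmax(self.weight_net(x), dim=1)       # per-pixel modality weights
        fused = w[:, :1] * feat_a + w[:, 1:] * feat_b      # weighted aggregation
        # Emphasize regions indicated by the localization prior (visual enhancement idea).
        return fused * (1.0 + loc_mask)

# Usage sketch with toy tensors standing in for encoder features.
if __name__ == "__main__":
    fuse = LanguageDrivenFusion(channels=32)
    a = torch.randn(1, 32, 64, 64)   # e.g. infrared-branch features
    b = torch.randn(1, 32, 64, 64)   # e.g. visible-branch features
    mask = torch.rand(1, 1, 64, 64)  # language-derived localization prior
    out = fuse(a, b, mask)
    print(out.shape)  # torch.Size([1, 32, 64, 64])
```

In the paper, such a prior would come from a language grounding step and be further corrected by the visual and semantic constraints; the sketch omits those components.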