Keywords: Infrared, Computer science, Fusion, Image fusion, Image (mathematics), Artificial intelligence, Computer vision, Optics, Physics, Linguistics, Philosophy
Authors
Jinyuan Liu,X. Rong Li,Zirui Wang,Zhiying Jiang,Wei Zhong,Wei Fan,Bin Xu
Identifier
DOI: 10.1109/jas.2024.124878
Abstract
The goal of infrared and visible image fusion (IVIF) is to integrate the unique advantages of both modalities to achieve a more comprehensive understanding of a scene. However, existing methods struggle to effectively handle modal disparities, resulting in visual degradation of the details and prominent targets of the fused images. To address these challenges, we introduce PromptFusion, a prompt-based approach that harmoniously combines multi-modality images under the guidance of semantic prompts. Firstly, to better characterize the features of different modalities, a contourlet autoencoder is designed to separate and extract the high-/low-frequency components of different modalities, thereby improving the extraction of fine details and textures. We also introduce a prompt learning mechanism using positive and negative prompts, leveraging Vision-Language Models to improve the fusion model's understanding and identification of targets in multi-modality images, leading to improved performance in downstream tasks. Furthermore, we employ bi-level asymptotic convergence optimization. This approach simplifies the intricate non-singleton non-convex bi-level problem into a series of convergent and differentiable single optimization problems that can be effectively resolved through gradient descent. Our approach advances the state-of-the-art, delivering superior fusion quality and boosting the performance of related downstream tasks. Project page: https://github.com/hey-it-s-me/PromptFusion.
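The high-/low-frequency separation described in the abstract can be illustrated with a toy sketch. This is not the paper's contourlet autoencoder: as a stand-in, a simple box blur serves as the low-pass branch, the residual as the high-frequency branch, and the two modalities are fused by averaging base layers and keeping the stronger detail coefficient (a classic two-scale fusion baseline, shown here only to make the decomposition idea concrete).

```python
import numpy as np

def box_blur(img, k=5):
    """Separable box filter; a crude stand-in for the low-pass
    branch of a contourlet-style decomposition."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    kernel = np.ones(k) / k
    # horizontal pass, then vertical pass
    out = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    out = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, out)
    return out

def fuse(ir, vis, k=5):
    """Two-scale fusion: average the low-frequency base layers,
    pick the stronger high-frequency detail per pixel."""
    low_ir, low_vis = box_blur(ir, k), box_blur(vis, k)
    high_ir, high_vis = ir - low_ir, vis - low_vis
    low = 0.5 * (low_ir + low_vis)
    high = np.where(np.abs(high_ir) >= np.abs(high_vis), high_ir, high_vis)
    return low + high
```

In PromptFusion this hand-crafted split is replaced by a learned contourlet autoencoder, and the fusion rule is guided by the semantic prompts rather than fixed max/average heuristics.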