Keywords
Image fusion; Computer science; Artificial intelligence; Computer vision; Image segmentation; Encoder; Controllability; Semantics; Sensor fusion; Focus (optics); Adaptability; Pattern recognition; Perception; Feature extraction; Task analysis
Authors
Yiming Sun, Yuan Ruan, Qinghua Hu, Pengfei Zhu
Source
Venue: Cornell University - arXiv
Date: 2026-01-12
Abstract
Infrared and visible image fusion generates all-weather perception-capable images by combining complementary modalities, enhancing environmental awareness for intelligent unmanned systems. Existing methods either focus on pixel-level fusion while overlooking downstream task adaptability or implicitly learn rigid semantics through cascaded detection/segmentation models, unable to interactively address diverse semantic target perception needs. We propose CtrlFuse, a controllable image fusion framework that enables interactive dynamic fusion guided by mask prompts. The model integrates a multi-modal feature extractor, a reference prompt encoder (RPE), and a prompt-semantic fusion module (PSFM). The RPE dynamically encodes task-specific semantic prompts by fine-tuning pre-trained segmentation models with input mask guidance, while the PSFM explicitly injects these semantics into fusion features. Through synergistic optimization of parallel segmentation and fusion branches, our method achieves mutual enhancement between task performance and fusion quality. Experiments demonstrate state-of-the-art results in both fusion controllability and segmentation accuracy, with the adapted task branch even outperforming the original segmentation model.
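The abstract describes mask-prompted, controllable fusion: a user-supplied mask selects semantic targets, and the fused image emphasizes different modality information inside versus outside that region. The snippet below is a deliberately minimal NumPy sketch of that *idea* only — a per-pixel blend whose weights are switched by the mask prompt. The function name and the weights `alpha`/`beta` are illustrative assumptions; this is not the CtrlFuse architecture (which uses learned encoders, an RPE, and a PSFM).

```python
import numpy as np

def mask_prompted_blend(vis, ir, mask, alpha=0.8, beta=0.5):
    """Toy illustration of mask-controlled fusion (NOT the CtrlFuse model).

    Inside the mask-prompted region, the visible image is weighted by
    `alpha` (emphasizing its detail for the selected targets); elsewhere
    a uniform blend with weight `beta` is used. All arrays share shape.
    """
    # Per-pixel weight for the visible modality, switched by the prompt mask
    w = np.where(mask > 0, alpha, beta)
    # Convex combination of the two modalities
    return w * vis + (1.0 - w) * ir

# Usage on tiny dummy images: the masked pixel leans toward `vis`,
# the rest use the uniform blend.
vis = np.full((2, 2), 1.0)
ir = np.zeros((2, 2))
mask = np.array([[1, 0], [0, 0]])
fused = mask_prompted_blend(vis, ir, mask)
```

Changing the mask (or `alpha`) changes the output interactively, which is the controllability property the paper targets; the actual method injects learned mask-conditioned semantics into fusion features rather than blending pixels directly.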