Keywords
Autoencoder, Computer science, Modal verb, Modality (human-computer interaction), Mode, Artificial intelligence, Machine learning, Generative grammar, Consistency (knowledge base), Image (mathematics), Coding (set theory), Pattern recognition (psychology), Deep learning, Set (abstract data type), Programming language, Sociology, Chemistry, Polymer chemistry, Social science
Authors
Bing Cao, Haifang Cao, Jiaxu Liu, Pengfei Zhu, Changqing Zhang, Qinghua Hu
Identifier
DOI: 10.1109/tmm.2023.3274990
Abstract
Multi-modal images are required in a wide range of practical scenarios, from clinical diagnosis to public security. However, certain modalities may be incomplete or unavailable owing to restricted imaging conditions, which commonly leads to decision bias in many real-world applications. Despite significant advances in image synthesis techniques, learning complementary information from multi-modal inputs remains challenging. To address this problem, we propose an autoencoder-based collaborative attention generative adversarial network (ACA-GAN) that uses the available multi-modal images to generate the missing ones. The collaborative attention mechanism deploys a single-modal attention module and a multi-modal attention module to effectively extract complementary information from the multiple available modalities. Considering the significant modal gap, we further develop an autoencoder network to extract a self-representation of the target modality, guiding the generative model to fuse target-specific information from the multiple modalities. This considerably improves cross-modal consistency with the desired modality, thereby greatly enhancing image synthesis performance. Quantitative and qualitative comparisons on various multi-modal image synthesis tasks demonstrate more precise and realistic results, highlighting the superiority of our approach over several prior methods.
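The abstract only outlines the two-level attention design, so the following is a minimal PyTorch sketch of how such a mechanism could be wired up: an SE-style channel attention per available modality, a multi-modal attention that softmax-weights the modalities into one fused feature map, and a small autoencoder whose latent code stands in for the target-modality self-representation. All module names, layer sizes, and the specific attention forms here are our assumptions for illustration, not the authors' actual ACA-GAN implementation.

```python
import torch
import torch.nn as nn


class SingleModalAttention(nn.Module):
    """SE-style channel attention over one modality's feature map
    (hypothetical stand-in for the paper's single-modal attention module)."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Re-weight channels of a single modality's features.
        return x * self.gate(x)


class MultiModalAttention(nn.Module):
    """Fuses per-modality features with spatial softmax weights
    (hypothetical stand-in for the multi-modal attention module)."""

    def __init__(self, channels: int, n_modalities: int):
        super().__init__()
        # One attention score per modality at each spatial location.
        self.score = nn.Conv2d(channels * n_modalities, n_modalities, kernel_size=1)

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        stacked = torch.cat(feats, dim=1)              # (B, C*M, H, W)
        weights = torch.softmax(self.score(stacked), dim=1)  # (B, M, H, W)
        # Weighted sum over modalities -> one fused (B, C, H, W) map.
        return sum(f * weights[:, i:i + 1] for i, f in enumerate(feats))


class TargetAutoencoder(nn.Module):
    """Toy autoencoder on the target modality; its latent code is one way to
    realize the 'self-representation' that guides the generator."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, channels, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(channels, 1, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, x: torch.Tensor):
        z = self.enc(x)          # latent self-representation
        return self.dec(z), z    # reconstruction and code


if __name__ == "__main__":
    B, C, H, W = 2, 64, 32, 32
    feats = [torch.randn(B, C, H, W) for _ in range(3)]  # 3 available modalities
    sma = SingleModalAttention(C)
    mma = MultiModalAttention(C, n_modalities=3)
    fused = mma([sma(f) for f in feats])  # complementary fused features
    print(fused.shape)                    # torch.Size([2, 64, 32, 32])
```

In a full synthesis pipeline, the fused features and the autoencoder's latent code would both feed a generator trained adversarially against the missing modality; that generator and its losses are beyond what the abstract specifies, so they are omitted here.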