关键词 (Keywords)
Computer science, Artificial intelligence, Generalization, Domain (mathematical analysis), Hyperspectral imaging, Image (mathematics), Pattern recognition (psychology), Representation (politics), Modal, Generator (circuit theory), Encoder, Contextual image classification, Autoencoder, Natural language processing, Artificial neural network, Mathematics, Mathematical analysis, Physics, Power (physics), Operating system, Chemistry, Polymer chemistry, Law, Politics, Quantum mechanics, Political science
Authors
Yuxiang Zhang, Mengmeng Zhang, Wei Li, Ran Tao
Identifier
DOI: 10.1109/icassp49357.2023.10095723
Abstract
Large-scale pre-trained image-text foundation models have excelled in a number of downstream applications. The majority of domain generalization techniques, however, have never focused on mining linguistic modal knowledge to enhance model generalization performance. Additionally, text information has been ignored in hyperspectral image (HSI) classification tasks. To address the aforementioned shortcomings, a Multi-modal Domain Generalization Network (MDG) is proposed to learn cross-domain invariant representations from a cross-domain shared semantic space. Only the source domain (SD) is used for training in the proposed method, after which the model is directly transferred to the target domain (TD). Visual and linguistic features are extracted using a dual-stream architecture, which consists of an image encoder and a text encoder. A generator is designed to obtain extended domain (ED) samples that differ from the SD. Furthermore, linguistic features are used to construct a cross-domain shared semantic space, where visual-linguistic alignment is accomplished by supervised contrastive learning. Extensive experiments on two datasets show that the proposed method outperforms state-of-the-art approaches.
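To make the dual-stream idea in the abstract concrete, the sketch below shows one way an image encoder and a text encoder could project HSI pixels and class descriptions into a shared space, with a cross-modal supervised contrastive loss pulling together embeddings that share a class label. This is not the authors' released code: the network sizes (PATCH_BANDS, EMBED_DIM, NUM_CLASSES), the toy text encoder, and the loss details are illustrative assumptions.

```python
# Minimal sketch of dual-stream visual-linguistic alignment via supervised
# contrastive learning. All module shapes and names are assumptions, not the
# MDG implementation from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

PATCH_BANDS, EMBED_DIM, NUM_CLASSES = 103, 128, 9  # assumed sizes

class ImageEncoder(nn.Module):
    """Maps a spectral vector (one HSI pixel/patch) to the shared embedding space."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(PATCH_BANDS, 256), nn.ReLU(),
            nn.Linear(256, EMBED_DIM),
        )
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

class TextEncoder(nn.Module):
    """Stand-in for a language encoder: embeds a class-description id."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(NUM_CLASSES, EMBED_DIM)
    def forward(self, class_ids):
        return F.normalize(self.emb(class_ids), dim=-1)

def supervised_contrastive_alignment(img_emb, txt_emb, labels, temperature=0.07):
    """Cross-modal supervised contrastive loss: for each image embedding, text
    embeddings with the same class label are positives, all others negatives."""
    logits = img_emb @ txt_emb.t() / temperature           # (B, B) similarity matrix
    pos_mask = labels.unsqueeze(1).eq(labels.unsqueeze(0)).float()
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    # average log-probability over the positives for each anchor
    loss = -(pos_mask * log_prob).sum(1) / pos_mask.sum(1).clamp(min=1)
    return loss.mean()

# Toy usage: a batch of source-domain pixels with class labels and matching text ids.
pixels = torch.randn(32, PATCH_BANDS)
labels = torch.randint(0, NUM_CLASSES, (32,))
img_enc, txt_enc = ImageEncoder(), TextEncoder()
loss = supervised_contrastive_alignment(img_enc(pixels), txt_enc(labels), labels)
loss.backward()
```

Because the loss only depends on label agreement in the shared space, the same alignment objective could in principle also be applied to generated extended-domain (ED) samples alongside SD samples, which is the role the abstract assigns to the generator.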