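Computer science
Artificial intelligence
Pattern recognition (psychology)
Feature learning
Discriminant
Feature (linguistics)
Graph
Semantic feature
Salience
Feature extraction
Semantic matching
Convolutional neural network
Matching (statistics)
Natural language processing
Mathematics
Theoretical computer science
Philosophy
Linguistics
Statistics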
Authors
Wenxin Tan,Hua Ji,Qian Liu,Ming Jin
Identifier
DOI:10.1109/icarce55724.2022.10046452
Abstract
Image-text matching has received increasing attention because it enables interaction between vision and language. Existing approaches have two limitations. First, most methods learn only from paired samples and ignore similar semantic information within the same modality. Second, current methods lack interaction between local and global features, so certain image regions or words are mismatched for want of global information. To address these problems, we propose a new dual semantic graph similarity learning (DSGSL) network, which consists of a feature enhancement module for learning compact features and a feature alignment module that learns the relations between global and local features. In the feature enhancement module, similar samples are processed as a graph, and a graph convolutional network is used to extract similar features to reconstruct the global feature representation. In addition, we use a gated fusion network to obtain discriminative sample representations by selecting salient features from the other modality and filtering out insignificant information. In the feature alignment module, we construct a dual semantic graph for every sample to learn the association between local features and global features. Extensive experiments on MS-COCO and Flickr30K show that our approach achieves state-of-the-art performance.
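The feature enhancement step described in the abstract (similar samples treated as a graph, a graph convolution over that graph, then gated fusion with the other modality) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the k-nearest-neighbour graph construction, the symmetric normalisation, the form of the gate, the random weights, and every function and variable name below are assumptions chosen only to make the idea concrete.

```python
import numpy as np

def cosine_similarity_graph(feats, k=2):
    """Assumed graph construction: link each sample to its k most similar peers."""
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = normed @ normed.T
    n = feats.shape[0]
    adj = np.eye(n)  # self-loops keep each sample's own feature in the sum
    for i in range(n):
        order = np.argsort(-sim[i])
        neighbours = [j for j in order if j != i][:k]
        adj[i, neighbours] = 1.0
    return np.maximum(adj, adj.T)  # make the graph undirected

def gcn_layer(feats, adj, weight):
    """One graph convolution: ReLU(D^-1/2 A D^-1/2 X W)."""
    deg = adj.sum(axis=1)  # >= 1 because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    norm_adj = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ feats @ weight, 0.0)

def gated_fusion(own, other, w_gate, b_gate):
    """Assumed gate: g = sigmoid([own, other] W + b); output g*own + (1-g)*other."""
    joint = np.concatenate([own, other], axis=1)
    gate = 1.0 / (1.0 + np.exp(-(joint @ w_gate + b_gate)))
    return gate * own + (1.0 - gate) * other

# Toy data: 5 samples with 8-dimensional global features per modality (hypothetical).
rng = np.random.default_rng(0)
n, d = 5, 8
image_feats = rng.normal(size=(n, d))
text_feats = rng.normal(size=(n, d))

adj = cosine_similarity_graph(image_feats, k=2)
enhanced = gcn_layer(image_feats, adj, rng.normal(size=(d, d)) * 0.1)
fused = gated_fusion(enhanced, text_feats,
                     rng.normal(size=(2 * d, d)) * 0.1, np.zeros(d))
```

In a trained model the weight matrices would be learned rather than sampled, and the paper's actual gate and graph definitions may differ from the ones assumed here.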