Matching (statistics)
Similarity (geometry)
Computer science
Artificial intelligence
Adversarial system
Image (mathematics)
Sample (material)
Identification (biology)
Pattern recognition (psychology)
Cognition
Similarity measure
Machine learning
Mathematics
Statistics
Psychology
Chemistry
Botany
Chromatography
Neuroscience
Biology
Authors
Shichen Huang, Weina Fu, Zhaoyue Zhang, Shuai Liu
Identifier
DOI:10.1016/j.inffus.2023.102084
Abstract
In the increasingly popular era of adversarial machine learning (AML), developing more robust and generalized algorithms has become a key research topic. Image-text matching, as the foundation of tasks such as video Q&A and text-image generation, also faces various attacks in AML. Current image-text matching based on the similarity of matching fragments focuses only on local matching results and does not establish a comprehensive cognition of the content in text and image. Therefore, mismatching in abstract scenes appears when facing complex attacks. At the same time, existing methods are not sensitive enough to identify the internal relationships between objects in different local areas, which also confuses matching. Aiming at the above problems, a global similarity matching module is proposed. Based on global cognition, a global similarity matching is established and dynamically fused with local similarity to measure the matching results flexibly and improve the understanding of abstract scenes. In addition, a global-local cognition fusion training mechanism based on relationship adversarial sample generation is proposed, which enhances understanding of the internal relationships between objects in different local areas through adversarial sample generation. A global loss is introduced to train the overall model, and the global-local proportion during training is adjusted through loss adjustment to better identify the relationships between objects in different local areas and avoid the matching confusion caused by the similarity of matching objects. The experimental results show that our method outperforms the current best method by 7.4% (rSum) on the Flickr30K dataset and by 4.0% (rSum, 1K test set) on the MS-COCO dataset.
The proposed global-local fusion (GLF) algorithm based on adversarial sample generation for image-text matching improves the accuracy and robustness of image-text matching and performs well when facing security challenges, while also promoting the development of visual-linguistic modal fusion.
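The abstract describes dynamically fusing a global similarity score with local fragment similarities, but does not give the fusion formula. As a minimal illustration only, a weighted combination of the two signals might look like the following sketch; the function name, the aggregation by mean, and the fixed weight `alpha` are all assumptions for exposition, not the authors' actual method.

```python
import numpy as np

def fuse_similarity(local_sims, global_sim, alpha=0.5):
    """Illustrative global-local similarity fusion (assumed form).

    local_sims: similarity scores of matched local fragments
    global_sim: a single image-text similarity from global cognition
    alpha:      weight on the global term (hypothetical hyperparameter)
    """
    local_score = float(np.mean(local_sims))  # aggregate local matching results
    return alpha * global_sim + (1.0 - alpha) * local_score

# Example: strong global match lifts a mediocre local score
score = fuse_similarity([0.8, 0.6], global_sim=1.0, alpha=0.5)  # 0.85
```

In the paper's setting, the weight between global and local terms is reportedly adjusted dynamically during training via the loss, rather than fixed as in this sketch.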