Matching (statistics)
Similarity (geometry)
Computer science
Artificial intelligence
Adversarial system
Image (mathematics)
Sample (material)
Identification (biology)
Pattern recognition (psychology)
Cognition
Similarity measure
Machine learning
Mathematics
Statistics
Psychology
Chemistry
Botany
Chromatography
Neuroscience
Biology
Authors
Shichen Huang, Weina Fu, Zhaoyue Zhang, Shuai Liu
Identifier
DOI:10.1016/j.inffus.2023.102084
Abstract
In the increasingly popular era of adversarial machine learning (AML), developing more robust and generalized algorithms has become a key research topic. Image-text matching, as the foundation of tasks such as video Q&A and text-image generation, also faces various attacks in AML. Current image-text matching based on the similarity of matching fragments focuses only on local matching results and does not establish a comprehensive cognition of the content in text and image. Therefore, mismatching in abstract scenes appears when facing complex attacks. At the same time, existing methods are not sensitive enough to identify the internal relationships between objects in different local areas, which also confuses matching. Aiming at the above problems, a global similarity matching module is proposed. Based on global cognition, a global similarity matching is established and dynamically fused with local similarity to measure the matching results flexibly and improve the understanding of abstract scenes. In addition, a global-local cognition fusion training mechanism based on relationship adversarial sample generation is proposed, which enhances understanding of the internal relationships between objects in different local areas through adversarial sample generation. A global loss is introduced to train the overall model, and the global-local proportion during training is adjusted through loss adjustment to better identify the relationships between objects in different local areas and avoid the matching confusion caused by the similarity of matching objects. The experimental results show that our method outperforms the current best method by 7.4% (rSum) on the Flickr30K dataset and by 4.0% (rSum, 1K test set) on the MS-COCO dataset.
The proposed global-local fusion (GLF) algorithm based on adversarial sample generation for image-text matching improves the accuracy and robustness of image-text matching and performs well when facing security challenges, while also promoting the development of visual-linguistic modal fusion.
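The abstract describes dynamically fusing a global similarity score with local fragment similarities, but does not give the fusion formula. As a minimal illustration only, a weighted combination of the two signals might look like the following sketch; the function name, the aggregation by mean, and the fixed weight `alpha` are all assumptions for exposition, not the authors' actual method.

```python
import numpy as np

def fuse_similarity(local_sims, global_sim, alpha=0.5):
    """Illustrative global-local similarity fusion (assumed form).

    local_sims: similarity scores of matched local fragments
    global_sim: a single image-text similarity from global cognition
    alpha:      weight on the global term (hypothetical hyperparameter)
    """
    local_score = float(np.mean(local_sims))  # aggregate local matching results
    return alpha * global_sim + (1.0 - alpha) * local_score

# Example: strong global match lifts a mediocre local score
score = fuse_similarity([0.8, 0.6], global_sim=1.0, alpha=0.5)  # 0.85
```

In the paper's setting, the weight between global and local terms is reportedly adjusted dynamically during training via the loss, rather than fixed as in this sketch.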