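Computer science
Artificial intelligence
Artificial neural network
Semantics (computer science)
Graph
Semantic similarity
Feature learning
Deep learning
Matching (statistics)
Binary classification
Benchmark (surveying)
Similarity (geometry)
Learning to rank
Representation (politics)
Convolutional neural network
Causality (physics)
Modality (human-computer interaction)
Semantic matching
Rank (graph theory)
Data mining
Reliability
Natural language processing
Binary number
Machine learning
Binary relation
Training set
Recurrent neural network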
Identifier
DOI:10.1109/tmm.2026.3651028
Abstract
Multimedia data carry rich semantic knowledge, and cross-modal retrieval (CMR) methods can explore the correlations among them. Graph neural networks (GNNs) can represent complex connection information, so some CMR methods apply GNNs as semantic comprehenders to improve matching accuracy. However, although fine-grained classifiers can accurately obtain object-centric semantics, these semantics may conflict, potentially leading to inexplicable responses that are difficult to ground. Meanwhile, the credibility of GNNs is a concern, mainly because of their sensitivity to out-of-distribution changes and their lack of interpretability. Therefore, we attempt to integrate causal learning into GNNs and capture potential causal relationships rather than surface-level object-centric classification. First, we analyze semantic causality and build a cross-modal structural causal model, then achieve cross-modal interventional causal learning with a causality-inspired graph neural network (CIGNN). Second, we propose modality contrastive learning to characterize the intra-modal and inter-modal correlations and project the modalities into a common representation space. Third, a new soft rank loss is designed that goes beyond binary similarity to achieve fine-grained similarity sorting. Comprehensive experiments on three widely used benchmark datasets demonstrate the superiority of the proposed method, and ablation experiments demonstrate the effectiveness of each component.
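The abstract does not give the form of its modality contrastive objective. As a rough illustration of the inter-modal part only, the sketch below uses a generic symmetric InfoNCE loss over paired image and text embeddings, which is a common choice in cross-modal retrieval and not necessarily the loss used in the paper. The function names, the temperature value, and the toy embeddings are all assumptions made for this example.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_matrix(images, texts):
    """Return sim[i][j] = cosine similarity between image i and text j."""
    imgs = [l2_normalize(v) for v in images]
    txts = [l2_normalize(v) for v in texts]
    return [[sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]


def info_nce(sim, temperature=0.1):
    """Mean cross-entropy of each row, treating the diagonal entry as the match."""
    total = 0.0
    for k, row in enumerate(sim):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[k]
    return total / len(sim)


def symmetric_contrastive_loss(images, texts, temperature=0.1):
    """Average the image-to-text and text-to-image InfoNCE terms (assumed form)."""
    sim = cosine_matrix(images, texts)
    sim_t = [list(col) for col in zip(*sim)]  # transpose for text-to-image
    return 0.5 * (info_nce(sim, temperature) + info_nce(sim_t, temperature))


# Toy 2-D embeddings (hypothetical): pair k of images matches pair k of texts.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]

print(symmetric_contrastive_loss(images, aligned_texts))
print(symmetric_contrastive_loss(images, swapped_texts))
```

In this toy setup the loss is close to zero when each image embedding coincides with its paired text embedding and grows large when the pairs are swapped, which is the behavior a shared representation space is trained toward. An intra-modal term and the soft rank loss mentioned in the abstract would need details that the abstract does not provide.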