Computer science
Artificial intelligence
Context (archaeology)
Benchmark (surveying)
Pattern
Discriminative
Natural language processing
Consistency (knowledge base)
Task (project management)
Machine learning
Representation (politics)
Similarity (geometry)
Graph
Process (computing)
Theoretical computer science
Image (mathematics)
Geography
Law
Sociology
Management
Economics
Paleontology
Geodesy
Operating system
Politics
Biology
Social science
Political science
Authors
Pengfei Luo, Tong Xu, Shiwei Wu, Chen Zhu, Linli Xu, Enhong Chen
Identifier
DOI: 10.1145/3580305.3599439
Abstract
The multimodal entity linking (MEL) task, which aims at resolving ambiguous mentions against a multimodal knowledge graph, has attracted wide attention in recent years. Although considerable effort has been devoted to exploring the complementary effect among multiple modalities, existing methods may fail to fully absorb the comprehensive expression of abbreviated textual context and implicit visual indication. Even worse, inevitable noisy data may cause inconsistency across modalities during the learning process, which severely degrades performance. To address these issues, in this paper we propose a novel Multi-GraIned Multimodal InteraCtion Network (MIMIC) framework for solving the MEL task. Specifically, the unified inputs of mentions and entities are first encoded by textual/visual encoders separately, to extract global descriptive features and local detailed features. Then, to derive the similarity matching score for each mention-entity pair, we devise three interaction units to comprehensively explore the intra-modal interaction and inter-modal fusion among the features of entities and mentions. In particular, three modules, namely the Text-based Global-Local interaction Unit (TGLU), Vision-based DuaL interaction Unit (VDLU) and Cross-Modal Fusion-based interaction Unit (CMFU), are designed to capture and integrate the fine-grained representations lying in abbreviated text and implicit visual cues. Afterwards, we introduce a unit-consistency objective function via contrastive learning to avoid inconsistency and model degradation. Experimental results on three public benchmark datasets demonstrate that our solution outperforms various state-of-the-art baselines, and ablation studies verify the effectiveness of the designed modules.
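The abstract outlines a pipeline of per-modality encoders, three interaction units (TGLU, VDLU, CMFU) that each yield mention-entity matching scores, and a contrastive unit-consistency objective. Below is a minimal PyTorch sketch of such a pipeline for illustration only: the module internals, the pooled dot-product scoring, the KL-based consistency term, and all dimensions are assumptions and do not reproduce the paper's released implementation.

```python
# Hypothetical sketch of a MIMIC-style scoring pipeline (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class InteractionUnit(nn.Module):
    """Stand-in for TGLU / VDLU / CMFU: pools token/patch features and scores pairs."""
    def __init__(self, dim):
        super().__init__()
        self.proj_m = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.proj_e = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, mention_feats, entity_feats):
        # mention_feats, entity_feats: (B, L, dim) features from the encoders.
        m = self.proj_m(mention_feats.mean(dim=1))   # (B, dim) pooled mention
        e = self.proj_e(entity_feats.mean(dim=1))    # (B, dim) pooled entity
        m, e = F.normalize(m, dim=-1), F.normalize(e, dim=-1)
        return m @ e.t()                             # (B, B) similarity matrix


class MIMICSketch(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.tglu = InteractionUnit(dim)   # text: global-local interaction
        self.vdlu = InteractionUnit(dim)   # vision: dual interaction
        self.cmfu = InteractionUnit(dim)   # cross-modal fusion

    def forward(self, m_text, m_vis, e_text, e_vis):
        s_text = self.tglu(m_text, e_text)                       # intra-modal (text)
        s_vis = self.vdlu(m_vis, e_vis)                          # intra-modal (vision)
        s_fuse = self.cmfu(torch.cat([m_text, m_vis], dim=1),
                           torch.cat([e_text, e_vis], dim=1))    # inter-modal fusion
        return s_text, s_vis, s_fuse


def training_loss(scores, tau=0.07, lam=0.1):
    # In-batch contrastive loss per unit (mention i should match entity i), plus a
    # hypothetical consistency term pulling the units' score distributions toward
    # their average, in the spirit of the unit-consistency objective in the abstract.
    B = scores[0].size(0)
    target = torch.arange(B, device=scores[0].device)
    probs = [F.softmax(s / tau, dim=-1) for s in scores]
    mean_p = torch.stack(probs).mean(dim=0)
    contrastive = sum(F.cross_entropy(s / tau, target) for s in scores)
    consistency = sum(F.kl_div(p.log(), mean_p, reduction="batchmean") for p in probs)
    return contrastive + lam * consistency


# Toy usage with random encoder outputs (batch of 4, 16 tokens/patches, dim 256).
model = MIMICSketch(dim=256)
feats = [torch.randn(4, 16, 256) for _ in range(4)]
loss = training_loss(model(*feats))
loss.backward()
```

The paper describes each unit as modeling finer-grained global-local and cross-modal interactions; the mean-pooled dot product above is only a placeholder for those mechanisms.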