Computer science
Construct (Python library)
Task (project management)
Artificial intelligence
Annotation
Sequence (biology)
Natural language processing
Object (grammar)
Named-entity recognition
Information retrieval
Source code
Optics (focus)
Coding (set theory)
Pattern recognition (psychology)
Set (abstract data type)
Programming language
Physics
Management
Biology
Optics
Economics
Genetics
Authors
Jieming Wang, Ziyan Li, Jianfei Yu, Yang Li, Rui Xia
Identifier
DOI:10.1145/3581783.3612322
Abstract
Multimodal Named Entity Recognition (MNER) aims to locate and classify named entities mentioned in a pair of text and image. However, most previous MNER works focus on extracting entities in textual form but fail to ground the extracted text spans to their corresponding visual objects. Moreover, existing MNER studies primarily classify entities into four coarse-grained entity types, which are often insufficient for mapping them to their real-world referents. To address these limitations, we introduce a new task named Fine-grained Multimodal Named Entity Recognition and Grounding (FMNERG) in this paper, which aims to simultaneously extract named entities in text, their fine-grained entity types, and their grounded visual objects in the image. Moreover, we construct a Twitter dataset for the FMNERG task, and further propose a T5-based multImodal GEneration fRamework (TIGER), which formulates FMNERG as a generation problem by converting all the entity-type-object triples into a target sequence and adapts a pre-trained sequence-to-sequence model, T5, to directly generate the target sequence from an image-text input pair. Experimental results demonstrate that TIGER performs significantly better than a number of baseline systems on the annotated Twitter dataset. Our dataset annotation and source code are publicly released at https://github.com/NUSTM/FMNERG.
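The abstract describes formulating FMNERG as generation by linearizing entity-type-object triples into a single target sequence for a seq2seq model. The sketch below illustrates that linearization step only; the delimiter tokens, the `linearize` helper, the fine-grained type names, and the region identifiers are all illustrative assumptions, not the paper's exact target format.

```python
def linearize(triples):
    """Join (entity, fine-grained type, visual object) triples into one
    flat target string for a sequence-to-sequence model such as T5."""
    parts = []
    for entity, etype, obj in triples:
        # "none" marks an entity with no grounded visual object in the image
        parts.append(f"{entity} | {etype} | {obj if obj else 'none'}")
    return " ; ".join(parts)

# Hypothetical example: one entity grounded to an image region, one not.
triples = [
    ("Leo Messi", "athlete", "region_3"),
    ("Barcelona", "sports_team", None),
]
print(linearize(triples))
# Leo Messi | athlete | region_3 ; Barcelona | sports_team | none
```

During training, a decoder would be supervised to emit this string; at inference, the generated sequence would be parsed back into triples by splitting on the same delimiters.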