Computer science
Artificial intelligence
Benchmark (surveying)
Modal verb
Feature (linguistics)
Relation (database)
Matching (statistics)
Relational database
Focus (optics)
Graph
Feature learning
Natural language processing
Pattern recognition (psychology)
Information retrieval
Data mining
Theoretical computer science
Linguistics
Chemistry
Philosophy
Statistics
Mathematics
Physics
Geodesy
Polymer chemistry
Optics
Geography
Authors
Junfeng Zhou,Baigang Huang,Wenjiao Fan,Ziqian Cheng,Zhuoyi Zhao,Weifeng Zhang
Identifier
DOI:10.1016/j.knosys.2023.110253
Abstract
The core difficulty of text-based person search is achieving fine-grained alignment of visual and linguistic modal data, so as to bridge the gap of modal heterogeneity. Most existing works on this task focus on global and local feature extraction and matching, ignoring the importance of relational information. This paper proposes a new text-based person search model, named CM-LRGNet, which extracts Cross-Modal Local-Relational-Global features in an end-to-end manner and performs fine-grained cross-modal alignment on these three feature levels. Concretely, we first split the convolutional feature maps to obtain local features of images, and adaptively extract textual local features. Then a relation encoding module is proposed to implicitly learn the relational information implied in the images and texts. Finally, a relation-aware graph attention network is designed to fuse the local and relational features into global representations for both images and text queries. Extensive experimental results on the benchmark dataset (CUHK-PEDES) show that our approach achieves state-of-the-art performance (64.18%, 82.97%, and 89.85% in terms of Top-1, Top-5, and Top-10 accuracy) by learning and aligning local-relational-global representations from different modalities. Our code has been released at https://github.com/zhangweifeng1218/Text-based-Person-Search.
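The abstract describes two concrete architectural ideas: obtaining local image features by splitting the convolutional feature map, and fusing local features with a graph-attention-style module into a global representation. Below is a minimal, hypothetical PyTorch sketch of these two ideas only; it is not the authors' CM-LRGNet implementation, and all module names, stripe counts, and dimensions are illustrative assumptions.

```python
# Hypothetical sketch (not the released CM-LRGNet code):
# (1) split a CNN feature map into horizontal stripes to get local image features,
# (2) fuse the local features with a simple attention layer into one global vector.
import torch
import torch.nn as nn
import torch.nn.functional as F


class StripeLocalFeatures(nn.Module):
    """Split a (B, C, H, W) feature map into K horizontal stripes and pool each one."""

    def __init__(self, in_dim: int, out_dim: int, num_stripes: int = 6):
        super().__init__()
        self.num_stripes = num_stripes
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, feat_map: torch.Tensor) -> torch.Tensor:
        stripes = feat_map.chunk(self.num_stripes, dim=2)   # K chunks along the height axis
        pooled = [s.mean(dim=(2, 3)) for s in stripes]      # average-pool each stripe -> (B, C)
        local = torch.stack(pooled, dim=1)                   # (B, K, C)
        return self.proj(local)                              # (B, K, out_dim) local features


class AttentionFusion(nn.Module):
    """Attend over the K local (node) features and aggregate them into a global vector."""

    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Parameter(torch.randn(dim))
        self.key = nn.Linear(dim, dim)

    def forward(self, nodes: torch.Tensor) -> torch.Tensor:
        scores = self.key(nodes) @ self.query                 # (B, K) attention scores
        weights = F.softmax(scores, dim=1).unsqueeze(-1)      # (B, K, 1) normalized weights
        return (weights * nodes).sum(dim=1)                   # (B, dim) global representation


if __name__ == "__main__":
    feat_map = torch.randn(2, 2048, 24, 8)   # e.g. a ResNet-50 final feature map (assumed)
    local = StripeLocalFeatures(2048, 256)(feat_map)
    global_feat = AttentionFusion(256)(local)
    print(local.shape, global_feat.shape)     # (2, 6, 256) and (2, 256)
```

A matching text branch would produce local phrase features of the same dimension so that image and text representations can be compared at the local, relational, and global levels, as the abstract outlines.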