Computer science
Parsing
Artificial intelligence
Task (project management)
Pattern
Set (abstract data type)
Modal
Machine learning
Natural language
Image (mathematics)
Natural language processing
Pattern recognition (psychology)
Social science
Chemistry
Management
Sociology
Polymer chemistry
Economics
Programming language
Authors
Tianyu Liu, Chao Zhu, Yang Liu
Identifiers
DOI: 10.1109/icpr56361.2022.9956569
Abstract
Cross-modal text-based person search aims at retrieving a target person from a large image gallery using a natural language description. This task is quite challenging due to the complex conditions under which person images are acquired and the semantic gap between different modalities. Popular deep-learning-based models rely heavily on large amounts of labeled data to obtain good performance; such data are labor-intensive to annotate and not always available in real applications. Moreover, to achieve effective alignment within and between modalities, additional semantic information or pre-trained networks are often introduced to assist with human body localization, which further increases the number of training parameters and reduces efficiency. To address these problems, we propose a single-stage Identity-guided image-text Attribute Parsing and Alignment network (IAPA). IAPA achieves cross-modal alignment of human body parts in an unsupervised manner from the image itself, yielding a substantial efficiency improvement while maintaining promising accuracy. This is also the first attempt to apply pixel-level supervision to the cross-modal person retrieval task. Experiments on the CUHK-PEDES dataset validate the effectiveness and efficiency of IAPA compared with other state-of-the-art methods.
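The abstract does not spell out IAPA's architecture or training objective, but the general setting it operates in, embedding person images and natural language descriptions into a shared space under identity-level supervision, can be illustrated. The following PyTorch sketch is a minimal, hypothetical dual-encoder with an identity-guided contrastive loss; the class and function names, feature dimensions, and loss formulation are all assumptions for illustration, not the paper's actual method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualEncoder(nn.Module):
    # Projects pre-extracted image and text features into a shared embedding
    # space. Dimensions and projection heads are illustrative placeholders,
    # not IAPA's backbones (which the abstract does not describe).
    def __init__(self, img_dim=2048, txt_dim=768, embed_dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, embed_dim)
        self.txt_proj = nn.Linear(txt_dim, embed_dim)

    def forward(self, img_feats, txt_feats):
        v = F.normalize(self.img_proj(img_feats), dim=-1)  # unit-norm image embeddings
        t = F.normalize(self.txt_proj(txt_feats), dim=-1)  # unit-norm text embeddings
        return v, t

def identity_contrastive_loss(v, t, ids, tau=0.07):
    # InfoNCE-style loss in which every image-text pair sharing a person
    # identity counts as a positive -- a common identity-level objective in
    # person search, assumed here rather than taken from the paper.
    logits = v @ t.T / tau                                # scaled cosine similarities
    pos = (ids.unsqueeze(0) == ids.unsqueeze(1)).float()  # identity-match mask (symmetric)
    i2t = -(pos * F.log_softmax(logits, dim=1)).sum(1) / pos.sum(1)
    t2i = -(pos * F.log_softmax(logits.T, dim=1)).sum(1) / pos.sum(1)
    return (i2t + t2i).mean() / 2

# Toy usage: 8 image/text feature pairs drawn from 4 person identities.
model = DualEncoder()
v, t = model(torch.randn(8, 2048), torch.randn(8, 768))
ids = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
print(identity_contrastive_loss(v, t, ids).item())
```

At retrieval time, gallery images and the query description would be embedded with the same two encoders and ranked by cosine similarity, which is the standard evaluation protocol on CUHK-PEDES.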