Computer science
Modal verb
Representation (politics)
Artificial intelligence
Identification (biology)
Fusion
Pattern recognition (psychology)
Natural language processing
Computer vision
Machine learning
Linguistics
Polymer chemistry
Politics
Political science
Biology
Philosophy
Law
Chemistry
Botany
Authors
G. F. Cao,Qing Tang,Xuan-Thuy Vo,Adri Priadana,Tien-Dat Tran,Kang-Hyun Jo
Source
Journal: IEEE Access
[Institute of Electrical and Electronics Engineers]
Date: 2025-01-01
Volume/Issue: 13: 114911-114922
Citations: 1
Identifier
DOI:10.1109/access.2025.3584377
Abstract
Person Re-identification (Re-ID) aims to accurately match pedestrians across a system of multiple non-overlapping cameras, playing an essential role in computer vision applications. While CNN-based methods leverage feature aggregation and attention mechanisms to achieve competitive performance, their limitations in capturing long-range dependencies have motivated research into transformer-based architectures, which now drive much of the progress in person Re-ID. Despite leveraging pretrained vision-language models to facilitate visual encoder training, current methods suffer from fragmented optimization procedures, increased complexity, and inadequate semantic guidance from textual prompts. To address these limitations, we propose an end-to-end method named Text-Guided Fusion Transformer (TGFT). TGFT uses fixed but semantically enriched text prompts to guide visual training through a novel Gated Cross Attention (GCA) fusion module. The GCA module adaptively modulates the integration of textual information and visual features, effectively enhancing cross-modal feature alignment and semantic consistency. Without significantly increasing computational complexity or parameter count, TGFT achieves superior performance, demonstrating strong effectiveness and scalability on multiple major Re-ID benchmarks.
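The abstract describes the GCA module as adaptively gating how much attended textual information is mixed into the visual features. The paper's exact formulation is not given here, so the following is only a minimal NumPy sketch of one common way to realize gated cross-attention: visual tokens act as queries over text tokens, and a learned sigmoid gate modulates the residual fusion. All weight names (`Wq`, `Wk`, `Wv`, `Wg`, `bg`) are illustrative assumptions, not the authors' parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_cross_attention(visual, text, Wq, Wk, Wv, Wg, bg):
    """Illustrative gated cross-attention fusion step (not the paper's exact GCA).

    visual: (N, d) visual tokens, used as queries.
    text:   (M, d) text tokens, used as keys and values.
    A per-token sigmoid gate decides how much attended textual
    information is added back into the visual features.
    """
    d = Wq.shape[1]
    Q = visual @ Wq                                   # (N, d) queries
    K = text @ Wk                                     # (M, d) keys
    V = text @ Wv                                     # (M, d) values
    attn = softmax(Q @ K.T / np.sqrt(d), axis=-1)     # (N, M) cross-modal attention
    attended = attn @ V                               # (N, d) text info per visual token
    gate = 1.0 / (1.0 + np.exp(-(visual @ Wg + bg)))  # (N, d) sigmoid gate in [0, 1]
    return visual + gate * attended                   # gated residual fusion
```

Because the gate is computed from the visual features themselves, each visual token can admit more or less textual guidance, which matches the abstract's claim of adaptive modulation without adding a large parameter count.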