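Computer science
Encoder
Artificial intelligence
Transformer
Robustness (evolution)
Modal verb
RGB color model
Pattern recognition (psychology)
Computer vision
BitTorrent tracker
Eye movement
Voltage
Quantum mechanics
Gene
Operating system
Physics
Biochemistry
Chemistry
Polymer chemistry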
Authors
Yujue Cai, Xiubao Sui, Guohua Gu, Qian Chen
Identifier
DOI: 10.1016/j.infrared.2023.104819
Abstract
RGB-T tracking can be viewed as multi-view fusion tracking. In this study, we propose a transformer-based network, the Multi-Modal Mutual Propagation Tracker (MMMPT). To obtain a robust appearance model from multi-modal data, we adopt an encoder–decoder architecture to extract information. In the encoding stage, the template features of multiple frames reinforce the features they share through the self-attention mechanism, yielding a time-invariant target representation. At the same time, the template features of the different modalities interact through cross-modal propagation, yielding a modality-invariant target representation. The transformer decoder transfers useful information from the template to the search area through a similarity matrix. We evaluate the proposed RGB-T transformer tracker on the RGBT234, GTOT, VTUAV, and LasHeR datasets. Extensive experiments indicate that the proposed framework is not inferior to state-of-the-art trackers in terms of robustness and accuracy.
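The encoder–decoder flow described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' MMMPT implementation. It assumes single-head scaled dot-product attention with no learned projections, feed-forward layers, or normalization, and the function names (`attention`, `encode`, `decode`), token counts, and the choice to concatenate both modalities into one decoder memory are hypothetical. It shows three steps: self-attention over multi-frame template tokens within each modality, cross-modal propagation in which RGB and thermal (TIR) tokens attend to each other, and a decoder in which search-region tokens read from the encoded template through a similarity matrix.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Scaled dot-product attention: softmax(q k^T / sqrt(d)) v.
    # `sim` is the query-to-key similarity matrix, shape (n_q, n_k).
    d = q.shape[-1]
    sim = q @ k.T / np.sqrt(d)
    return softmax(sim, axis=-1) @ v


def encode(rgb_tmpl, tir_tmpl):
    # Hypothetical encoder. Inputs: (T*N, C) tokens from T template frames
    # with N tokens per frame, one array per modality.
    # Step 1: self-attention across all template frames of one modality,
    # so features shared over time reinforce each other (residual add).
    rgb = rgb_tmpl + attention(rgb_tmpl, rgb_tmpl, rgb_tmpl)
    tir = tir_tmpl + attention(tir_tmpl, tir_tmpl, tir_tmpl)
    # Step 2: cross-modal propagation; each modality queries the other.
    rgb_x = rgb + attention(rgb, tir, tir)
    tir_x = tir + attention(tir, rgb, rgb)
    return rgb_x, tir_x


def decode(search, memory):
    # Hypothetical decoder: search tokens query the encoded template memory,
    # and the similarity matrix inside `attention` weights which template
    # information is transferred to each search location.
    return search + attention(search, memory, memory)


# Usage with random features (assumed sizes: 2 template frames x 4 tokens,
# 9 search-region tokens, 16 channels).
rng = np.random.default_rng(0)
C = 16
rgb_t = rng.standard_normal((2 * 4, C))
tir_t = rng.standard_normal((2 * 4, C))
search = rng.standard_normal((9, C))

rgb_m, tir_m = encode(rgb_t, tir_t)
memory = np.concatenate([rgb_m, tir_m], axis=0)  # assumed fusion by concatenation
out = decode(search, memory)
print(out.shape)  # (9, 16)
```

The residual additions keep each stage's input alongside its attention update. A full transformer tracker typically adds learned query/key/value projections, multiple heads, layer normalization, and a prediction head on the decoder output; none of these are shown here.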