Computer science
Pooling
Concatenation (mathematics)
Artificial intelligence
Feature (linguistics)
Bilinear interpolation
RGB color model
Benchmark (surveying)
Convolutional neural network
Pattern recognition (psychology)
Data mining
Machine learning
Computer vision
Philosophy
Combinatorics
Linguistics
Mathematics
Geography
Geodesy
Authors
Qin Xu,Yiming Mei,Jinpei Liu,Chenglong Li
Identifier
DOI: 10.1109/TMM.2021.3055362
Abstract
Hierarchical deep features can provide multilevel abstractions of target objects, which play an important role in target localization and classification. Determining how to effectively aggregate abstract information from different levels in the RGB and thermal modalities is the key to exploiting their complementary advantages for robust RGBT tracking. However, existing RGBT tracking algorithms either focus on the semantic information of the last layer or aggregate hierarchical deep features from each modality using simple operations (e.g., summation and concatenation), which limits the capability of the multimodal tracker. To address these issues, in this paper, we propose a novel multimodal cross-layer bilinear pooling network for RGBT tracking. In our network, first, to boost the performance of the tracker, we use a channel attention mechanism to adaptively recalibrate the feature channels of all convolutional layers before performing hierarchical feature fusion. Then, a bilinear pooling operation is performed on any two layers through the cross product, a second-order computation that effectively aggregates the deep semantic and shallow texture information of the target. Finally, a quality-aware fusion module is designed to adaptively aggregate the bilinear pooling features of different layer interactions between the two modalities. Extensive experimental results on two public benchmark datasets demonstrate the effectiveness of our tracker compared with other state-of-the-art tracking methods.
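The two core operations the abstract describes, channel-attention recalibration of convolutional features and cross-layer bilinear pooling via an outer product, can be sketched in a minimal NumPy form. This is an illustrative reconstruction, not the authors' implementation: the SE-style attention weights, the reduction ratio, and the signed-square-root plus L2 normalization step are assumptions commonly paired with bilinear pooling.

```python
import numpy as np

def channel_attention(x, w1, w2):
    """SE-style channel recalibration (illustrative; w1/w2 are assumed
    learned weights). x: (C, H, W); w1: (C//r, C); w2: (C, C//r)."""
    s = x.mean(axis=(1, 2))              # global average pool -> (C,)
    z = np.maximum(w1 @ s, 0.0)          # bottleneck + ReLU
    g = 1.0 / (1.0 + np.exp(-(w2 @ z)))  # per-channel sigmoid gate
    return x * g[:, None, None]          # recalibrated feature map

def cross_layer_bilinear_pool(fa, fb, eps=1e-8):
    """Second-order aggregation of two layers' features.
    fa: (Ca, H, W), fb: (Cb, H, W) with matching spatial size (a shallower
    layer would be resized beforehand). Returns an L2-normalized
    (Ca*Cb,) descriptor: the outer product of channel responses,
    average-pooled over spatial positions."""
    Ca, H, W = fa.shape
    Cb = fb.shape[0]
    a = fa.reshape(Ca, H * W)
    b = fb.reshape(Cb, H * W)
    x = (a @ b.T / (H * W)).ravel()            # outer-product pooling
    x = np.sign(x) * np.sqrt(np.abs(x) + eps)  # signed square root
    return x / (np.linalg.norm(x) + eps)       # L2 normalization

# Toy check: interact two "layers" from one modality.
rng = np.random.default_rng(0)
fa = rng.standard_normal((8, 7, 7))
fb = channel_attention(rng.standard_normal((16, 7, 7)),
                       rng.standard_normal((4, 16)),   # reduction r=4
                       rng.standard_normal((16, 4)))
d = cross_layer_bilinear_pool(fa, fb)
print(d.shape)  # (128,) -- one second-order descriptor per layer pair
```

In the full tracker such descriptors would be computed for several layer pairs in each modality and then combined by the quality-aware fusion module; here only the per-pair interaction is shown.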