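Computer science
Leverage (statistics)
Mutual information
Boosting (machine learning)
Artificial intelligence
Modal verb
Transformer
Mode
Machine learning
Feature learning
Computer vision
Pattern recognition (psychology)
Engineering
Sociology
Electrical engineering
Voltage
Chemistry
Polymer chemistry
Social science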
Authors
Zhengtao Wu, Lingbo Liu, Yang Zhang, Mingzhi Mao, Liang Lin, Guanbin Li
Identifier
DOI:10.1109/icme52920.2022.9859777
Abstract
Crowd counting is a fundamental yet challenging task that aims to automatically estimate the number of people in crowded scenes. With the rapid development of thermal and depth sensors, thermal images and depth maps have become more accessible, and they have proven beneficial for boosting the performance of crowd counting. Consequently, we propose a Mutual Attention Transformer (MAT) module to fully leverage the complementary information of different modalities. Specifically, our MAT employs a cross-modal mutual attention mechanism that uses the features of one modality to enhance the features of the other. Moreover, to improve performance by learning better visual representations and further exploiting modality-wise complementarity, we design a self-supervised pre-training method based on cross-modal image reconstruction. Extensive experiments on two standard benchmarks (i.e., RGBT-CC and ShanghaiTechRGBD) show that the proposed method is effective and universal for multimodal crowd counting, outperforming previous state-of-the-art methods.
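
The cross-modal mutual attention idea in the abstract can be illustrated with a minimal sketch. This is not the authors' MAT module: it omits the learned query/key/value projections, multi-head structure, and normalization that a real Transformer layer would have, and the function names (`cross_attend`, `mutual_attention`) are hypothetical. It only shows the general pattern in which each modality's tokens attend to the other modality's tokens and are enhanced with a residual connection.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """Enhance `queries` (tokens of one modality) using `context`
    (tokens of the other modality) via scaled dot-product attention.

    Assumption: tokens are used directly as queries, keys, and values;
    a real layer would apply learned linear projections first.
    """
    dim = len(queries[0])
    enhanced = []
    for q in queries:
        # Similarity of this query token to every context token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in context]
        weights = softmax(scores)
        # Weighted sum of context tokens.
        attended = [sum(w * k[j] for w, k in zip(weights, context))
                    for j in range(dim)]
        # Residual connection: keep the original feature, add the
        # information gathered from the other modality.
        enhanced.append([qi + ai for qi, ai in zip(q, attended)])
    return enhanced


def mutual_attention(rgb_tokens, thermal_tokens):
    """Apply cross-attention in both directions (hypothetical helper)."""
    rgb_enhanced = cross_attend(rgb_tokens, thermal_tokens)
    thermal_enhanced = cross_attend(thermal_tokens, rgb_tokens)
    return rgb_enhanced, thermal_enhanced


if __name__ == "__main__":
    rgb = [[1.0, 0.0], [0.0, 1.0]]
    thermal = [[0.5, 0.5], [2.0, -1.0], [0.0, 0.0]]
    rgb_out, thermal_out = mutual_attention(rgb, thermal)
    print(len(rgb_out), len(thermal_out))
```

Each output keeps the token count and feature dimension of its own modality, so the enhanced features could in principle be passed on to whatever counting head follows.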