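Computer science
Transformer
RGB color model
Exploit
Graph
Artificial intelligence
Convolutional neural network
Data mining
Machine learning
Theoretical computer science
Quantum mechanics
Computer security
Voltage
Physics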
Authors
Yi Pan, Wujie Zhou, Meixin Fang, Fangfang Qiang
Identifier
DOI:10.1109/lgrs.2024.3362820
Abstract
Crowd counting has received significant attention in recent years due to its practical applications. In order to address the specific characteristics of RGB and thermal images, we have developed the graph enhancement and transformer aggregation network (GETANet) for generating representative density maps. Our approach incorporates several innovative modules to enhance accuracy. Firstly, we introduced a position-adaptive module that effectively counts individuals' positions and integrates features extracted from the main framework. Furthermore, we leveraged the advantages of graph convolutional networks (GCNs), which integrate spatial information and exploit relationships between nodes. Specifically, we designed a dual GCN module that further improves the model's performance by considering the spatial context and relationships among individuals in the crowd. To capture global image information and improve overall performance, we integrated a vision transformer into our model architecture. The vision transformer effectively captures global dependencies and enhances the model's ability to understand complex crowd scenes. Additionally, we designed a transformer information aggregation module that integrates information from multiple levels, resulting in a highly precise prediction map. Through comprehensive experiments on benchmark datasets such as RGBT-CC and DroneRGBT, our GETANet demonstrated its effectiveness in RGB-thermal crowd counting tasks. Moreover, GETANet showcased remarkable generalization results on the ShanghaiTech-RGBD dataset. Our code has been made publicly available on GitHub at https://github.com/panyi95/GETANet.