Multi-Modal Feature Pyramid Transformer for RGB-Infrared Object Detection

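Artificial intelligence, Computer science, Computer vision, RGB color model, Pyramid (geometry), Transformer, Feature (linguistics), Pattern recognition (psychology), Modal verb, Object detection, Feature extraction, Mode, Engineering, Mathematics, Philosophy, Sociology, Electrical engineering, Linguistics, Voltage, Chemistry, Polymer chemistry, Social science, Geometry
Authors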
Yaohui Zhu,Xiaoyu Sun,Miao Wang,Hua Huang
Source
Journal: IEEE Transactions on Intelligent Transportation Systems [Institute of Electrical and Electronics Engineers]
Volume (issue): 24 (9): 9984-9995 Cited by: 81
Identifier
DOI:10.1109/tits.2023.3266487
Abstract

RGB-Infrared multi-modal object detection uses diverse and complementary information, which gives it advantages in the intelligent transportation field. The main challenge of RGB-Infrared object detection is how to fuse the two modalities. Fusion is difficult for two reasons: 1) large visual differences between the modalities make it hard to learn effective complementary features, and 2) some RGB-Infrared image pairs are misaligned, which makes fusion harder still. To this end, building on the feature pyramid commonly used in object detection, we propose the Multi-modal Feature Pyramid Transformer (MFPT) to fuse the two modalities. The proposed MFPT learns semantic and modal complementary information to enhance the features of each modality via an intra-modal feature pyramid transformer and an inter-modal feature pyramid transformer. The intra-modal feature pyramid transformer enables features to interact across space and scales, improving the semantic representations of features in each modality. The inter-modal feature pyramid transformer conducts feature interaction between modalities, enabling each modality to learn complementary features from the other modality. The inter-modal feature pyramid transformer can also learn distance-independent dependencies between modalities, which are not sensitive to misaligned images. Furthermore, a local attention mechanism operating within windows is introduced into MFPT to efficiently correlate regions of different scales or different modalities. Experimental results on two RGB-Infrared detection datasets demonstrate that the proposed method is superior to state-of-the-art methods.
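
The inter-modal interaction described in the abstract can be illustrated with a minimal cross-attention sketch. This is not the authors' MFPT: it assumes identity projections instead of learned query/key/value weights, uses no positional encoding, no windows, and no feature pyramid, and the toy feature values and the names `cross_attention`, `rgb`, and `ir` are hypothetical. It only shows why attention between modalities can be insensitive to a spatial shift: each RGB token weights the infrared tokens by content similarity, not by position.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Scaled dot-product attention from one modality to the other.

    Assumption: identity projections (no learned weights), so the
    tokens of the other modality serve as both keys and values.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    enhanced = []
    for q in queries:
        # Similarity of this query token to every token of the other modality.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys_values]
        weights = softmax(scores)
        # Weighted sum of the other modality's features.
        fused = [
            sum(w * v[d] for w, v in zip(weights, keys_values))
            for d in range(dim)
        ]
        # Residual connection: add the complementary feature to the query.
        enhanced.append([qd + fd for qd, fd in zip(q, fused)])
    return enhanced


# Hypothetical toy features: 3 RGB tokens and 3 infrared tokens, 2 channels each.
rgb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ir = [[0.5, 0.2], [0.1, 0.9], [0.7, 0.7]]

enhanced = cross_attention(rgb, ir)

# Shift the infrared tokens by one position to mimic a spatial misalignment.
ir_shifted = ir[1:] + ir[:1]
enhanced_shifted = cross_attention(rgb, ir_shifted)
```

In this simplified setting, `enhanced` and `enhanced_shifted` agree up to floating-point rounding, because the attention output depends on the set of infrared tokens and not on their order. A real model with positional encodings or windowed attention would only approximate this behavior.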