RGBX-DiffusionDet: a framework for multi-modal RGB-X object detection using DiffusionDet

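Computer science; Artificial intelligence; Discriminative; Object detection; Feature (linguistics); Pattern recognition (psychology); Computer vision; Feature extraction; Sensor fusion; Block (permutation group theory); Convolutional neural network; Dimensionality reduction; Feature learning; Modular design; Channel (broadcasting); Decoding methods; RGB color model; Embedding; Regularization (linguistics); Minimum bounding box; Object (grammar); Segmentation; Feature vector; Feature model; Cognitive neuroscience of visual object recognition; Image fusion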
Authors
Eliraz Orfaig, Inna Stainvas, Igal Bilik
Source
Journal: Pattern Recognition [Elsevier BV]
Volume: 172: 112460-112460 Cited by: 3
Identifier
DOI:10.1016/j.patcog.2025.112460
Abstract

• Development of RGBX-DiffusionDet, a modular and extensible framework that demonstrates the feasibility of integrating auxiliary 2D data into DiffusionDet.
• Introduction of DCR-CBAM, a dynamic feature fusion approach.
• Introduction of DMLAB, a dynamic feature aggregation operation, designed to enhance the performance of the diffusion decoding process.
• Novel regularization losses that enforce channel saliency and spatial selectivity, enabling compact and discriminative feature embeddings.
• The first use of pixel-aligned RGB-P data for object detection, including the generation of bounding box annotations, to motivate future research in multi-modal data processing.

This work addresses the challenge of object detection using multimodal heterogeneous sensors by extending the recently proposed DiffusionDet framework, initially designed for RGB-only detection. We propose RGBX-DiffusionDet, a generalized diffusion-based object detection framework that enables seamless fusion of heterogeneous 2D modalities (denoted as “X”, e.g., depth, infrared, and polarimetric data) with RGB imagery. The proposed approach adopts a mid-level feature fusion strategy to address the heterogeneous nature of multimodal data, characterized by varying spatial resolutions, noise profiles, and semantic content. Instead of commonly used brute-force feature concatenation, we introduce two novel architectural components: (1) a dynamic channel reduction convolutional block attention module (DCR-CBAM), which enhances cross-modal fusion by emphasizing salient channel features while reducing the dimensionality of merged RGB-X features, and (2) a dynamic multi-level aggregation block (DMLAB), which addresses a limitation of the baseline DiffusionDet decoder by adaptively fusing spatial features to improve object localization. Additionally, we incorporate novel regularization losses that promote channel saliency and spatial selectivity, resulting in compact and discriminative feature embeddings.
Extensive experiments on RGB-depth (KITTI), a newly annotated RGB-polarimetric (RGB-P) dataset, and RGB-infrared (M3FD) benchmarks demonstrate consistent superiority of the proposed approach over RGB-only baselines, while maintaining decoding efficiency. We further show that RGBX-DiffusionDet exhibits improved robustness and generalization capability in visually-corrupted conditions, demonstrating its practical efficiency for robust multimodal object detection.
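
The abstract does not specify how DCR-CBAM is built, so the following is only a rough sketch of the general idea it names: concatenate RGB and X feature channels, reweight them with a generic CBAM-style channel attention, then reduce the channel count with a 1x1 convolution. It is not the authors' implementation. The function names, the weight matrices `w1`, `w2`, and `reduce_w`, and the use of plain nested lists are all assumptions made for illustration; CBAM's spatial attention branch and any "dynamic" behavior are omitted.

```python
import math


def channel_attention(features, w1, w2):
    """Generic CBAM-style channel weights for C feature maps of size H x W.

    w1 (hidden x C) and w2 (C x hidden) are hypothetical shared-MLP weights.
    """
    # Squeeze each channel to two scalars: spatial average and spatial maximum.
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in features]
    mx = [max(map(max, ch)) for ch in features]

    def mlp(v):
        # Shared two-layer perceptron with a ReLU in between.
        hidden = [max(0.0, sum(w * a for w, a in zip(row, v))) for row in w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    # Sum both branches and squash to (0, 1) with a sigmoid.
    return [1.0 / (1.0 + math.exp(-(a + m))) for a, m in zip(mlp(avg), mlp(mx))]


def fuse_and_reduce(rgb, x, w1, w2, reduce_w):
    """Concatenate RGB and X channels, reweight them, and mix down channels.

    reduce_w (C_out x C_in) is a hypothetical 1x1 convolution kernel.
    """
    merged = rgb + x  # channel-wise concatenation of the two modalities
    att = channel_attention(merged, w1, w2)
    height, width = len(merged[0]), len(merged[0][0])
    channels = range(len(merged))
    # Scale every channel by its attention weight.
    scaled = [
        [[att[c] * merged[c][i][j] for j in range(width)] for i in range(height)]
        for c in channels
    ]
    # 1x1 convolution: each output pixel is a weighted sum over input channels.
    return [
        [
            [sum(row[c] * scaled[c][i][j] for c in channels) for j in range(width)]
            for i in range(height)
        ]
        for row in reduce_w
    ]
```

In this sketch, feature maps with two channels per modality and a `reduce_w` of shape 2 x 4 would yield a fused map with two channels, half the merged width. A real model would use a tensor library and learn all three weight matrices jointly with the detector.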