Recalling Unknowns Without Losing Precision: An Effective Solution to Large Model-Guided Open World Object Detection

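Computer science, Computer vision, Artificial intelligence, Object detection, Object (grammar), Image processing, Pattern recognition (psychology), Algorithm, Image (mathematics)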
Authors
Yulin He, Wei Chen, Siqi Wang, Tianrui Liu, Meng Wang
Source
Journal: IEEE Transactions on Image Processing [Institute of Electrical and Electronics Engineers]
Volume: 34, pages 729-742  Citations: 16
Identifier
DOI: 10.1109/tip.2024.3459589
Abstract

Open World Object Detection (OWOD) aims to adapt object detection to an open-world environment, so as to detect unknown objects and learn knowledge incrementally. Existing OWOD methods typically leverage training sets with a relatively small number of known objects. Lacking generic object knowledge, they fail to comprehensively perceive objects beyond the scope of the training sets. Recent advances in large vision models (LVMs), trained on large-scale data, offer a promising opportunity to harness rich generic knowledge for OWOD. Motivated by the Segment Anything Model (SAM), a prominent LVM known for its ability to segment generic objects, we first demonstrate the feasibility of employing SAM for OWOD and establish the first SAM-Guided OWOD baseline solution. Subsequently, we identify and address two fundamental challenges in SAM-Guided OWOD and propose a SAM-Guided Robust Open-world Detector (SGROD), which significantly improves the recall of unknown objects without losing precision on known objects. Specifically, the two challenges are: 1) noisy labels caused by the class-agnostic nature of SAM; 2) precision degradation on known objects when more unknown objects are recalled. For the first problem, we propose a dynamic label assignment (DLA) method that adaptively selects confident labels from SAM during training, reducing the impact of noise. For the second problem, we introduce cross-layer learning (CLL) and SAM-based negative sampling (SNS), which enable SGROD to avoid precision loss by learning robust decision boundaries for objectness and classification. Experiments on public datasets show that SGROD not only improves the recall of unknown objects by a large margin (~20%), but also preserves highly competitive precision on known objects. The code is available at https://github.com/harrylin-hyl/SGROD.
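
To make the idea of selecting confident labels from class-agnostic proposals concrete, here is a minimal sketch. It is not the authors' DLA; the abstract does not specify the selection rule. The sketch assumes that each SAM-derived box comes with an objectness score from the detector being trained, that proposals overlapping annotated known objects are discarded, and that the confidence threshold is simply the mean score of the remaining proposals. The function names, the `known_iou` parameter, and the mean-based threshold are all hypothetical choices made for illustration.

```python
from statistics import mean


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_unknown_pseudo_labels(proposals, scores, known_boxes, known_iou=0.5):
    """Pick class-agnostic proposals to use as 'unknown' pseudo-labels.

    proposals   : boxes derived from SAM masks (assumed input)
    scores      : detector objectness score per proposal (assumed input)
    known_boxes : ground-truth boxes of known classes
    known_iou   : hypothetical overlap cutoff for "already a known object"
    """
    # Step 1: drop proposals that coincide with annotated known objects.
    candidates = [
        (box, score)
        for box, score in zip(proposals, scores)
        if all(iou(box, k) < known_iou for k in known_boxes)
    ]
    if not candidates:
        return []
    # Step 2: adaptive threshold (assumption: mean score of the candidates),
    # so the cutoff moves with the detector's current confidence.
    threshold = mean(score for _, score in candidates)
    return [box for box, score in candidates if score >= threshold]


proposals = [(0, 0, 10, 10), (20, 20, 30, 30), (40, 40, 50, 50), (1, 1, 10, 10)]
scores = [0.9, 0.8, 0.2, 0.95]
known = [(0, 0, 10, 10)]
print(select_unknown_pseudo_labels(proposals, scores, known))
```

In this toy input, the first and last proposals overlap the known box and are removed, and of the two remaining proposals only the higher-scoring one passes the adaptive threshold. Because the threshold is recomputed from the current scores, it changes as training progresses, which is the general behaviour a dynamic assignment scheme aims for; the rule actually used in SGROD is in the linked repository.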