Recalling Unknowns Without Losing Precision: An Effective Solution to Large Model-Guided Open World Object Detection

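Computer science; Computer vision; Artificial intelligence; Object detection; Object (grammar); Image processing; Pattern recognition (psychology); Algorithm; Image (mathematics)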

Authors

Yulin He, Wei Chen, Siqi Wang, Tianrui Liu, Meng Wang

Source

Journal: IEEE Transactions on Image Processing [Institute of Electrical and Electronics Engineers]
Date: 2024-09-18 Volume and pages: 34: 729-742 Citations: 10

Links

nih.gov, doi.org

Identifiers

DOI: 10.1109/tip.2024.3459589

Abstract

Open World Object Detection (OWOD) aims to adapt object detection to an open-world environment, so as to detect unknown objects and learn knowledge incrementally. Existing OWOD methods typically leverage training sets with a relatively small number of known objects. Due to the absence of generic object knowledge, they fail to comprehensively perceive objects beyond the scope of training sets. Recent advancements in large vision models (LVMs), trained on extensive large-scale data, offer a promising opportunity to harness rich generic knowledge for the fundamental advancement of OWOD. Motivated by Segment Anything Model (SAM), a prominent LVM lauded for its exceptional ability to segment generic objects, we first demonstrate the possibility to employ SAM for OWOD and establish the very first SAM-Guided OWOD baseline solution. Subsequently, we identify and address two fundamental challenges in SAM-Guided OWOD and propose a pioneering SAM-Guided Robust Open-world Detector (SGROD) method, which can significantly improve the recall of unknown objects without losing the precision on known objects. Specifically, the two challenges in SAM-Guided OWOD include: 1) Noisy labels caused by the class-agnostic nature of SAM; 2) Precision degradation on known objects when more unknown objects are recalled. For the first problem, we propose a dynamic label assignment (DLA) method that adaptively selects confident labels from SAM during training, evidently reducing the noise impact. For the second problem, we introduce cross-layer learning (CLL) and SAM-based negative sampling (SNS), which enable SGROD to avoid precision loss by learning robust decision boundaries of objectness and classification. Experiments on public datasets show that SGROD not only improves the recall of unknown objects by a large margin (~20%), but also preserves highly-competitive precision on known objects. The program codes are available at https://github.com/harrylin-hyl/SGROD.
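The idea of keeping only confident class-agnostic proposals as pseudo-labels for unknown objects can be illustrated with a small sketch. This is not the DLA method from the paper, whose details the abstract does not give. It is a generic filtering scheme written under stated assumptions: the function name `select_unknown_pseudo_labels`, the IoU cutoff, and the quantile-based threshold are all hypothetical choices made for illustration, and the proposals and objectness scores are plain Python lists rather than real SAM or detector outputs.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_unknown_pseudo_labels(
    proposals: List[Box],
    objectness: List[float],
    known_boxes: List[Box],
    iou_thresh: float = 0.5,   # assumption: overlap cutoff against known objects
    quantile: float = 0.5,     # assumption: fraction of low-scoring candidates to drop
) -> List[Box]:
    """Keep class-agnostic proposals that look like confident unknown objects.

    Step 1 drops proposals that overlap an annotated known object.
    Step 2 keeps candidates whose score reaches a threshold computed from
    the current candidates, so the cutoff adapts to each image.
    """
    candidates = [
        (box, score)
        for box, score in zip(proposals, objectness)
        if all(iou(box, known) < iou_thresh for known in known_boxes)
    ]
    if not candidates:
        return []
    scores = sorted(score for _, score in candidates)
    idx = min(len(scores) - 1, int(quantile * len(scores)))
    threshold = scores[idx]
    return [box for box, score in candidates if score >= threshold]


# Toy usage: four proposals, one of which coincides with a known object.
proposals = [(0, 0, 10, 10), (20, 20, 30, 30), (40, 40, 50, 50), (60, 60, 70, 70)]
objectness = [0.9, 0.8, 0.2, 0.6]
known_boxes = [(0, 0, 10, 10)]
kept = select_unknown_pseudo_labels(proposals, objectness, known_boxes)
```

In this toy input the first proposal is discarded because it matches the known box, and the lowest-scoring remaining proposal falls below the per-image threshold. A per-image threshold is only one way to make the selection adaptive; a training-time scheme could instead update the cutoff as the detector improves.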
