Computer science
Object detection
Remote sensing
Vocabulary
Computer vision
Artificial intelligence
Object (grammar)
Pattern recognition (psychology)
Geography
Linguistics
Philosophy
Authors
Jianlin Xie,Guanqun Wang,Tong Zhang,Sun Yikang,He Chen,Yin Zhuang,Jun Li
Identifier
DOI:10.1109/tgrs.2025.3564332
Abstract
Object detection is a crucial task in computer vision for remote sensing applications. However, the reliance of traditional methods on predefined and trained object categories limits their applicability in open-world scenarios. A key challenge in open-vocabulary object detection lies in accurately identifying unseen objects. Existing approaches often focus solely on detecting object locations and struggle to recognize the categories of previously unseen targets. To address this issue, we propose a novel benchmark in which models are trained on known base classes and evaluated on their ability to detect and recognize unseen or novel classes. To this end, we introduce LLaMA-Unidetector, a universal framework that incorporates textual information into a closed-set detector, enabling generalization to open-set scenarios. LLaMA-Unidetector leverages a decoupled learning strategy that separates localization from recognition. In the first stage, a class-agnostic detector identifies objects, distinguishing only between foreground and background. In the second stage, the detected foreground objects are passed to TerraOV-LLM, a multimodal large language model, for recognition, exploiting the strong generalization capabilities of large language models to infer the correct categories. We construct a Vision Question Answering (VQA) remote sensing dataset, TerraVQA, and conduct extensive experiments on the NWPU-VHR10, DOTA1.0, and DIOR datasets. LLaMA-Unidetector achieves impressive results on the zero-shot detection benchmarks, reaching 75.46% AP on NWPU-VHR10, 50.22% AP on DOTA1.0, and 51.38% AP on DIOR. Our source code is available at: https://github.com/ChloeeGrace/LLaMA-Unidetector.
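The decoupled two-stage strategy described in the abstract can be sketched in Python. This is a minimal illustration, not the paper's implementation: `toy_localizer`, `toy_recognizer`, and the objectness threshold are hypothetical stand-ins for the class-agnostic detector and the TerraOV-LLM recognizer.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)

@dataclass
class Detection:
    box: Box
    score: float          # class-agnostic objectness score from stage 1
    label: str = "object" # open-vocabulary category filled in by stage 2

def two_stage_open_vocab_detect(
    image,
    localize: Callable[[object], List[Detection]],
    recognize: Callable[[object, Box], str],
    objectness_thresh: float = 0.5,
) -> List[Detection]:
    # Stage 1: class-agnostic localization, keeping only confident foreground boxes.
    detections = [d for d in localize(image) if d.score >= objectness_thresh]
    # Stage 2: category recognition per box (e.g. querying a multimodal LLM).
    for d in detections:
        d.label = recognize(image, d.box)
    return detections

# Toy stand-ins for the real detector and multimodal recognizer.
def toy_localizer(image) -> List[Detection]:
    return [Detection((0, 0, 10, 10), 0.9), Detection((20, 20, 30, 30), 0.3)]

def toy_recognizer(image, box: Box) -> str:
    return "airplane"

results = two_stage_open_vocab_detect(None, toy_localizer, toy_recognizer)
```

Because recognition is deferred to the second stage, the localizer never needs to know the category vocabulary, which is what allows the framework to generalize to classes unseen during detector training.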