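Vocabulary
Computer science
Artificial intelligence
Leverage (statistics)
Natural language processing
Set (abstract data type)
Contextual image classification
Inference
Image (mathematics)
Task (project management)
Code (set theory)
Natural language
Machine learning
Linguistics
Programming language
Management
Economics
Philosophy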
Authors
Sarah I. Pratt,Ian Covert,Rosanne Liu,Ali Farhadi
Identifier
DOI:10.1109/iccv51070.2023.01438
Abstract
Open-vocabulary models are a promising new paradigm for image classification. Unlike traditional classification models, open-vocabulary models classify among any arbitrary set of categories specified with natural language during inference. This natural language, called "prompts", typically consists of a set of hand-written templates (e.g., "a photo of a {}") which are completed with each of the category names. This work introduces a simple method to generate higher accuracy prompts, without relying on any explicit knowledge of the task domain and with far fewer hand-constructed sentences. To achieve this, we combine open-vocabulary models with large language models (LLMs) to create Customized Prompts via Language models (CuPL, pronounced "couple"). In particular, we leverage the knowledge contained in LLMs in order to generate many descriptive sentences that contain important discriminating characteristics of the image categories. This allows the model to place a greater importance on these regions in the image when making predictions. We find that this straightforward and general approach improves accuracy on a range of zero-shot image classification benchmarks, including over one percentage point gain on ImageNet. Finally, this simple baseline requires no additional training and remains completely zero-shot. Code available at https://github.com/sarahpratt/CuPL.
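The prompt mechanics described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (their code is at the repository linked above). It assumes the common CLIP-style recipe in which each class's prompt embeddings are averaged and the image is assigned to the class with the highest cosine similarity. The function names (`fill_templates`, `class_embedding`, `classify`) are hypothetical, and the 3-dimensional vectors are made-up stand-ins for real text and image encoder outputs.

```python
import math

def fill_templates(templates, class_name):
    """Complete hand-written templates such as 'a photo of a {}.' with a class name."""
    return [t.format(class_name) for t in templates]

def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def class_embedding(prompt_embeddings):
    """Average the unit-normalized prompt embeddings of one class, then renormalize.

    Assumption: prompts are ensembled by averaging, whether they come from
    templates or from LLM-generated descriptions.
    """
    unit = [normalize(e) for e in prompt_embeddings]
    dim = len(unit[0])
    mean = [sum(e[i] for e in unit) / len(unit) for i in range(dim)]
    return normalize(mean)

def classify(image_embedding, class_to_prompt_embeddings):
    """Return the best class and the cosine similarity score of every class."""
    img = normalize(image_embedding)
    scores = {
        name: sum(a * b for a, b in zip(img, class_embedding(embs)))
        for name, embs in class_to_prompt_embeddings.items()
    }
    return max(scores, key=scores.get), scores

# Toy vectors standing in for encoder outputs (illustrative numbers only).
toy_prompt_embeddings = {
    "goldfish": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "tabby cat": [[0.1, 0.9, 0.1], [0.0, 0.8, 0.3]],
}
toy_image_embedding = [0.7, 0.3, 0.1]

prompts = fill_templates(["a photo of a {}.", "a drawing of a {}."], "goldfish")
predicted, scores = classify(toy_image_embedding, toy_prompt_embeddings)
```

In a real pipeline the toy vectors would be replaced by embeddings of the prompt strings and of the image from an open-vocabulary model, and the template list would be replaced or supplemented by descriptive sentences generated by an LLM for each category.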