Closed captioning
Computer science
Sequence (biology)
Task (project management)
Modal verb
Modality (human-computer interaction)
Set (abstract data type)
Artificial intelligence
Coding (set theory)
Image (mathematics)
Pattern
Simplicity (philosophy)
Natural language processing
Machine learning
Programming language
Economics
Sociology
Chemistry
Management
Polymer chemistry
Philosophy
Epistemology
Biology
Genetics
Social science
Authors
Peng Wang, Yang An, Rui Men, Junyang Lin, Shuai Bai, Zhikang Li, Jianxin Ma, Chang Zhou, Jingren Zhou, Hongxia Yang
Source
Venue: Cornell University - arXiv
Date: 2022-01-01
Citations: 163
Identifiers
DOI: 10.48550/arxiv.2202.03052
Abstract
In this work, we pursue a unified paradigm for multimodal pretraining to break the scaffolds of complex task/modality-specific customization. We propose OFA, a Task-Agnostic and Modality-Agnostic framework that supports Task Comprehensiveness. OFA unifies a diverse set of cross-modal and unimodal tasks, including image generation, visual grounding, image captioning, image classification, language modeling, etc., in a simple sequence-to-sequence learning framework. OFA follows instruction-based learning in both the pretraining and finetuning stages, requiring no extra task-specific layers for downstream tasks. In comparison with the recent state-of-the-art vision & language models that rely on extremely large cross-modal datasets, OFA is pretrained on only 20M publicly available image-text pairs. Despite its simplicity and relatively small-scale training data, OFA achieves new SOTAs in a series of cross-modal tasks while attaining highly competitive performances on uni-modal tasks. Our further analysis indicates that OFA can also effectively transfer to unseen tasks and unseen domains. Our code and models are publicly available at https://github.com/OFA-Sys/OFA.
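To make the unification described above concrete, the following is a minimal sketch of how heterogeneous tasks can be cast as instruction-plus-target text pairs for a single sequence-to-sequence model. The instruction strings, the <loc_*> coordinate tokens, and the run_unified_model placeholder are illustrative assumptions, not OFA's actual prompts or API; the released implementation is at https://github.com/OFA-Sys/OFA.

# Illustrative sketch of instruction-based seq2seq task unification.
# The templates and the toy model stub below are assumptions for illustration,
# not OFA's actual prompts or code.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Example:
    instruction: str          # the task is specified entirely in the input text
    image: Optional[str]      # path to an image, or None for text-only tasks
    target: str               # every task's output is a plain token sequence

# Different cross-modal and unimodal tasks become the same sequence-to-sequence
# problem once the task is phrased as an instruction and the answer as text.
examples = [
    Example("what does the image describe?", "dog.jpg",
            "a dog catching a frisbee in the park"),              # image captioning
    Example("which region does the text 'red car' describe?", "street.jpg",
            "<loc_12> <loc_40> <loc_200> <loc_310>"),             # visual grounding, box as tokens
    Example("what does the image describe? options: cat | dog | horse", "dog.jpg",
            "dog"),                                               # image classification
    Example("complete the sentence: the capital of France is", None,
            "Paris"),                                             # language modeling
]

def run_unified_model(instruction: str, image: Optional[str]) -> str:
    """Placeholder for a single encoder-decoder model: it would encode the
    instruction tokens (plus image patches when an image is given) and decode
    the answer as text, with no task-specific output heads."""
    raise NotImplementedError("stand-in for a pretrained seq2seq model")

for ex in examples:
    print(f"instruction={ex.instruction!r:<70} target={ex.target!r}")

Because every target is just a token sequence, supporting a new task amounts to adding new instruction/target templates rather than new output layers, which is how the abstract's claim of "no extra task-specific layers for downstream tasks" can hold.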