Authors
Yuxuan Sun,Yixuan Si,Chenglu Zhu,Xuan Gong,Kai Zhang,Pingyi Chen,Ye Zhang,Zhongyi Shui,Lin Tao,Lin Yang
Source
Journal: Cornell University - arXiv
Date: 2024-12-16
Identifier
DOI: 10.48550/arXiv.2412.12077
Abstract
The emergence of large multimodal models (LMMs) has brought significant advancements to pathology. Previous research has primarily focused on training patch-level and whole-slide image (WSI)-level models separately, limiting the integration of learned knowledge across patches and WSIs and resulting in redundant models. In this work, we introduce CPath-Omni, the first 15-billion-parameter LMM designed to unify patch- and WSI-level image analysis, consolidating a variety of tasks at both levels, including classification, visual question answering, captioning, and visual referring prompting. Extensive experiments demonstrate that CPath-Omni achieves state-of-the-art (SOTA) performance across seven diverse tasks on 39 out of 42 datasets, outperforming or matching task-specific models trained for individual tasks. Additionally, we develop a specialized pathology CLIP-based visual processor for CPath-Omni, CPath-CLIP, which for the first time integrates different vision models and incorporates a large language model as a text encoder to build a more powerful CLIP model; CPath-CLIP achieves SOTA performance on nine zero-shot and four few-shot datasets. Our findings highlight CPath-Omni's ability to unify diverse pathology tasks, demonstrating its potential to streamline and advance the field of foundation models in pathology.
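The CPath-CLIP description above follows the general CLIP recipe: several vision backbones whose features are fused on the image side, an LLM standing in for the usual text transformer, and a contrastive objective aligning the two modalities. Below is a minimal PyTorch sketch of that pattern only; every module name, dimension, and pooling choice is an illustrative assumption, not the paper's actual implementation.

    # Hypothetical sketch of a CLIP-style model that fuses several vision
    # encoders and uses an LLM as the text encoder. All names, dimensions,
    # and the last-token pooling are assumptions for illustration.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiEncoderCLIP(nn.Module):
        def __init__(self, vision_encoders, vision_dims,
                     llm_text_encoder, text_dim, embed_dim=768):
            super().__init__()
            # Multiple pretrained vision backbones; features concatenated.
            self.vision_encoders = nn.ModuleList(vision_encoders)
            self.vision_proj = nn.Linear(sum(vision_dims), embed_dim)
            # An LLM serving as the text encoder (returns hidden states).
            self.text_encoder = llm_text_encoder
            self.text_proj = nn.Linear(text_dim, embed_dim)
            # Learnable temperature, initialized to ln(1/0.07) as in CLIP.
            self.logit_scale = nn.Parameter(torch.tensor(2.659))

        def encode_image(self, images):
            feats = [enc(images) for enc in self.vision_encoders]
            fused = torch.cat(feats, dim=-1)
            return F.normalize(self.vision_proj(fused), dim=-1)

        def encode_text(self, token_ids, attention_mask):
            hidden = self.text_encoder(token_ids, attention_mask)  # (B, T, D)
            pooled = hidden[:, -1]  # last-token pooling; one common choice
            return F.normalize(self.text_proj(pooled), dim=-1)

        def forward(self, images, token_ids, attention_mask):
            img = self.encode_image(images)
            txt = self.encode_text(token_ids, attention_mask)
            logits = self.logit_scale.exp() * img @ txt.t()
            targets = torch.arange(len(img), device=img.device)
            # Symmetric InfoNCE loss over the in-batch similarity matrix.
            return (F.cross_entropy(logits, targets) +
                    F.cross_entropy(logits.t(), targets)) / 2

Training such a model reduces to feeding matched image-text batches and minimizing the returned loss; zero-shot classification then scores each image embedding against text embeddings of the class names, which is how the zero-shot evaluations mentioned in the abstract are typically run.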