Computer Science
Artificial Intelligence
Image (mathematics)
Natural Language Processing
Pathology
Computer Vision
Cognitive Science
Medicine
Psychology
Authors
Dawei Dai, Yuanhui Zhang, Qianlan Yang, Long Xu, Xiaojing Shen, Shuyin Xia, Guoyin Wang
Identifier
DOI:10.1007/s10462-025-11190-1
Abstract
Previous advances in pathology image understanding primarily involved developing models tailored to specific tasks. Recent studies have demonstrated that large vision-language models can enhance the performance of various downstream tasks in medical image understanding. In this study, we developed a domain-specific large vision-language model (PathologyVLM) for pathology image understanding. Specifically, (1) we first construct a human pathology image-text dataset by cleaning public medical image-text data for domain-specific alignment; (2) using the proposed image-text data, we train a pathology language-image pretraining (PLIP) model as a specialized visual encoder to extract features of pathology images, and we develop a scale-invariant connector to avoid the information loss caused by image scaling; (3) we adopt two-stage learning to train PathologyVLM: the first stage for domain alignment and the second for the end-to-end visual question answering (VQA) task. In experiments, we evaluate PathologyVLM on both supervised and zero-shot VQA datasets, and our model achieves the best overall performance among multimodal models of similar scale. Ablation experiments also confirm the effectiveness of our design. We posit that the PathologyVLM model and the datasets presented in this work can promote research in the field of computational pathology. All code is available at: https://github.com/ddw2AIGROUP2CQUPT/PA-LLaVA
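The abstract does not specify how the scale-invariant connector works; one plausible reading is that it pools a variable number of visual patch tokens into a fixed token count, so the language model sees the same sequence length regardless of image scale. The sketch below illustrates that idea with adaptive average pooling; the function name and dimensions are hypothetical, not taken from the paper.

```python
# Hypothetical sketch of a "scale-invariant" connector: adaptive average
# pooling maps a variable-length sequence of visual patch features to a
# fixed number of output tokens, so images of any scale yield the same
# token count for the language model.
def adaptive_avg_pool(tokens, out_len):
    """Average variable-length `tokens` (list of feature vectors)
    into `out_len` evenly spaced bins."""
    n = len(tokens)
    dim = len(tokens[0])
    pooled = []
    for i in range(out_len):
        # Bin boundaries scale with the input length.
        start = (i * n) // out_len
        end = max(start + 1, ((i + 1) * n) // out_len)
        bin_vecs = tokens[start:end]
        pooled.append(
            [sum(v[d] for v in bin_vecs) / len(bin_vecs) for d in range(dim)]
        )
    return pooled

# Images at different scales produce different patch counts
# (e.g. a 7x7 vs a 14x14 patch grid), but the connector output
# length stays fixed.
small = [[float(i)] * 4 for i in range(49)]   # 49 patch tokens
large = [[float(i)] * 4 for i in range(196)]  # 196 patch tokens
print(len(adaptive_avg_pool(small, 36)), len(adaptive_avg_pool(large, 36)))
# → 36 36
```

A real implementation would likely use a learned projection (e.g. an MLP) after pooling to match the language model's embedding dimension; this sketch only shows the scale-invariance property.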