Modal verb
Computer science
Image (mathematics)
Computer vision
Materials science
Polymer chemistry
Authors
Ling Chen,Xingjian Han,Siyuan Lin,Huafeng Mai,Huaijin Ran
Identifier
DOI:10.1109/bibm62325.2024.10822809
Abstract
The advent of multi-modal large language models (MLLMs) has ushered in a paradigm shift in clinical diagnostics and therapeutic approaches through advanced medical image interpretation. Despite this progress, the majority of extant investigations have focused primarily on two-dimensional medical imagery, overlooking the potential of volumetric data with its inherently richer spatial information. Our research endeavors to push the boundaries of three-dimensional medical image analysis through the novel application of MLLMs. To this end, we present MedTriVision, a meticulously curated dataset designed for a diverse array of volumetric medical tasks, encompassing image-text retrieval, report generation, visual question answering, spatial localization, and anatomical segmentation. Additionally, we introduce TriMedLM, an innovative multi-faceted multi-modal large language model specifically engineered for volumetric medical image analysis. To facilitate rigorous evaluation, we have developed TriMedLM-Bench, a pioneering three-dimensional multimodal medical assessment framework that enables automated performance appraisal across eight distinct tasks. Extensive empirical investigations demonstrate that our proposed methodology represents a robust and versatile paradigm for three-dimensional medical image analysis, consistently outperforming contemporary approaches in both efficacy and adaptability.