Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography

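Foundation (evidence); Computed tomography; Generalist and specialist species; Computer science; Artificial intelligence; Geography; Medicine; Radiology; Archaeology; Biology; Ecology; Habitat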
Authors
İbrahim Ethem Hamamcı, Sezgin Er, Furkan Almas, Ayse Gulnihan Simsek, Sevval Nil Esirgün, İrem Doğan, Muhammed Furkan Dasdelen, Omer Faruk Durugol, Bastian Wittmann, Tamaz Amiranashvili, Enis Simsar, Mehmet Simsar, Emine Bensu Erdemir, Abdullah Alanbay, Anjany Sekuboyina, Berkan Lafci, Christian Blüthgen, Mehmet Kemal Özdemir, Bjoern Menze
Source
Journal: Research Square - Research Square; Cited by: 7
Identifier
DOI: 10.21203/rs.3.rs-5271327/v1
Abstract

While computer vision has achieved tremendous success with multimodal encoding and direct textual interaction with images via chat-based large language models, similar advancements in medical imaging AI, particularly in 3D imaging, have been limited by the scarcity of comprehensive datasets. To address this gap, we introduce CT-RATE, the first dataset that pairs 3D medical images with corresponding textual reports. CT-RATE comprises 25,692 non-contrast 3D chest CT scans from 21,304 unique patients. Through various reconstructions, these scans are expanded to 50,188 volumes, totaling over 14.3 million 2D slices. Each scan is accompanied by its corresponding radiology report. Leveraging CT-RATE, we develop CT-CLIP, a CT-focused contrastive language-image pretraining framework designed for broad applications without the need for task-specific training. We demonstrate how CT-CLIP can be used in two tasks: multi-abnormality detection and case retrieval. In multi-abnormality detection, CT-CLIP outperforms state-of-the-art fully supervised models across all key metrics, effectively eliminating the need for manual annotation. In case retrieval, it efficiently retrieves relevant cases using either image or textual queries, thereby enhancing knowledge dissemination. By combining CT-CLIP's vision encoder with a pretrained large language model, we create CT-CHAT, a vision-language foundational chat model for 3D chest CT volumes. Fine-tuned on over 2.7 million question-answer pairs derived from the CT-RATE dataset, CT-CHAT surpasses other multimodal AI assistants, underscoring the necessity for specialized methods in 3D medical imaging. Collectively, the open-source release of CT-RATE, CT-CLIP, and CT-CHAT not only addresses critical challenges in 3D medical imaging but also lays the groundwork for future innovations in medical AI and improved patient care.
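
The case-retrieval idea in the abstract can be illustrated with a minimal sketch. It assumes that a contrastively pretrained model maps both CT volumes and text queries into one shared embedding space, so retrieval reduces to ranking stored cases by cosine similarity to the query embedding. The function names (`cosine_similarity`, `retrieve`), the case IDs, and the 3-dimensional toy vectors are hypothetical and are not taken from the CT-CLIP release; real embeddings would come from the model's image or text encoder.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, case_embeddings, top_k=2):
    """Return the IDs of the top_k stored cases most similar to the query."""
    scored = [
        (cosine_similarity(query_embedding, emb), case_id)
        for case_id, emb in case_embeddings.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [case_id for _, case_id in scored[:top_k]]


# Toy embeddings (made up for illustration only).
case_embeddings = {
    "case_a": [0.9, 0.1, 0.0],
    "case_b": [0.1, 0.9, 0.1],
    "case_c": [0.7, 0.3, 0.1],
}

# The query could be the embedding of an image or of a text description;
# in a shared embedding space the ranking step is the same for both.
query = [1.0, 0.0, 0.0]
top_cases = retrieve(query, case_embeddings, top_k=2)
print(top_cases)
```

Because the ranking step only sees vectors, the same function serves image-to-image and text-to-image retrieval under the stated assumption of a shared embedding space.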