Pattern
Computer science
Artificial intelligence
Benchmark (surveying)
Bridging (networking)
Foundation (evidence)
Focus (optics)
Machine learning
Modal verb
Optics
Physics
Sociology
History
Archaeology
Chemistry
Social science
Polymer chemistry
Geography
Computer network
Geodesy
Authors
Danli Shi, Weiyi Zhang, J. Yang, Siyu Huang, Xiaolan Chen, Pusheng Xu, Kai Jin, Lin Shan, Wei Jin, Mayinuer Yusufu, Shunming Liu, Q Zhang, Zongyuan Ge, Xun Xu, Mingguang He
Identifier
DOI:10.1038/s41746-025-01772-2
Abstract
Early detection of eye diseases is vital for preventing vision loss. Existing ophthalmic artificial intelligence models focus on single modalities, overlooking multi-view information and struggling with rare diseases due to long-tail distributions. We propose EyeCLIP, a multimodal visual-language foundation model trained on 2.77 million ophthalmology images from 11 modalities with partial clinical text. Our novel pretraining strategy combines self-supervised reconstruction, multimodal image contrastive learning, and image-text contrastive learning to capture shared representations across modalities. EyeCLIP demonstrates robust performance across 14 benchmark datasets, excelling in disease classification, visual question answering, and cross-modal retrieval. It also exhibits strong few-shot and zero-shot capabilities, enabling accurate predictions in real-world, long-tail scenarios. EyeCLIP offers significant potential for detecting both ocular and systemic diseases, and bridging gaps in real-world clinical applications.
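The abstract describes a pretraining strategy that includes an image-text contrastive objective for aligning ophthalmic images with clinical text. A minimal sketch of a CLIP-style symmetric contrastive loss follows; this is an illustration of the general technique, not the authors' implementation, and the function names and temperature value are assumptions:

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project embeddings onto the unit sphere so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def clip_style_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    Each image is pulled toward its own caption (the diagonal of the
    similarity matrix) and pushed away from every other caption in the batch.
    """
    img = l2_normalize(np.asarray(image_emb, dtype=float))
    txt = l2_normalize(np.asarray(text_emb, dtype=float))
    logits = img @ txt.T / temperature           # (B, B) similarity matrix
    idx = np.arange(logits.shape[0])             # matching pairs lie on the diagonal

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[idx, idx].mean()

    # average the image->text and text->image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

With perfectly aligned pairs the diagonal dominates and the loss is small; mismatched pairs drive it up, which is what lets such a model do the zero-shot classification and cross-modal retrieval the abstract reports.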