Computer science
Glaucoma
Artificial intelligence
Computer vision
Machine vision
Pattern recognition (psychology)
Medicine
Ophthalmology
Authors
Caisheng Liao, Yuki Todo, Zheng Tang
Identifier
DOI:10.1117/1.jei.34.2.023016
Abstract
Traditional deep learning models, such as convolutional neural networks (CNNs), have played a crucial role in assisting the early diagnosis of glaucoma. However, these models face limitations in capturing the long-range dependencies required for effective glaucoma detection. We propose a hybrid model that combines the Vision Transformer (ViT) with a self-supervised volume contrast (VoCo) learning framework. The ViT is used to extract global contextual features, whereas VoCo focuses on capturing generalized fine-grained representations. After extracting these multi-level features, the model employs a weighted fusion mechanism to select and integrate the most relevant features, ensuring robustness in the diagnostic process. The model was validated using the JustRAIGS 2024 dataset. The proposed ViT+VoCo model demonstrates exceptional performance on key metrics: sensitivity of 0.8355, specificity of 0.9594, accuracy of 0.9556, and area under the curve (AUC) of 0.9711. Specifically, the accuracy and AUC improved by 1.55% and 1.45%, respectively, compared with the baseline ViT, and it outperformed classic CNNs and state-of-the-art systems on most metrics. Despite certain challenges, we validate the effectiveness of integrating global features with fine-grained generalized representations for early glaucoma detection. The results underscore the potential of this hybrid approach and highlight its promise in clinical applications. Future work could explore multimodal learning, domain adaptation, and model interpretability to further enhance the model's clinical applicability and impact.
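The abstract describes a weighted fusion mechanism that selects and integrates the ViT's global features with VoCo's fine-grained representations. The paper does not give implementation details, so the following is only a minimal sketch of one common way such a fusion is done: two feature vectors combined by learnable weights normalized with a softmax. The function name, the use of scalar per-branch weights, and the softmax normalization are all assumptions, not the authors' method.

```python
import numpy as np

def weighted_fusion(global_feats, fine_feats, logits):
    """Fuse two same-sized feature vectors with softmax-normalized weights.

    global_feats : features from the global branch (e.g., a ViT) -- hypothetical
    fine_feats   : features from the fine-grained branch (e.g., VoCo) -- hypothetical
    logits       : two learnable scalars; softmax turns them into fusion weights
    """
    # Numerically stable softmax over the two branch logits
    w = np.exp(logits - np.max(logits))
    w = w / w.sum()
    # Convex combination of the two feature vectors
    return w[0] * global_feats + w[1] * fine_feats

# Toy usage: with equal logits the branches are weighted equally,
# so fusing a vector of ones with a vector of zeros yields 0.5 everywhere.
g = np.ones(4)
f = np.zeros(4)
fused = weighted_fusion(g, f, np.array([0.0, 0.0]))
```

In a trained model the `logits` (or a richer gating network) would be learned jointly with the backbones; this sketch only illustrates the shape of the computation, not the paper's actual fusion module.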