Computer Science
Ultrasound
Artificial Intelligence
Computer Vision
Pattern Recognition (Psychology)
Radiology
Medicine
Authors
Yan Li, Xiaodong Zhou, Y.P. Wang, Xuan Chang, Qing Li, Gang Han
Source
Journal: IEEE Access
Publisher: Institute of Electrical and Electronics Engineers
Date: 2025-01-01
Volume 13, pp. 107950-107960
Identifier
DOI: 10.1109/access.2025.3578462
Abstract
Ultrasound is a crucial non-invasive imaging modality in clinical diagnosis, yet its interpretation faces challenges of subjectivity and inefficiency. To address the limitations of traditional single-modal deep learning models in cross-modal alignment and structured text generation, this study proposes an intelligent analysis system based on a CLIP-GPT joint framework, integrating contrastive learning with generative pre-training for end-to-end image classification and diagnostic report generation. Utilizing an ultrasound dataset containing six types of liver lesions, we implement a multi-stage training strategy: first establishing visual-semantic cross-modal mapping through CLIP (ViT-B/32), followed by fine-tuning GPT-2 and GPT-3.5 to construct a structured report generator. Experimental results demonstrate that the proposed system achieves superior performance: classification accuracy reaches 96.4%, recall 95.1%, and F1-score 95.5%, significantly outperforming conventional CNN models (e.g., ResNet-50 at 89.6% accuracy). For report generation, the fine-tuned GPT-2 model achieves a BLEU-4 score of 32.5 and a ROUGE-L score of 41.2, indicating strong alignment with clinical reporting standards. Key innovations include: a cross-modal feature decoupling-recombination mechanism bridging semantic gaps, clinical guideline-driven hierarchical templates ensuring professional standardization, and dynamic attention strategies enhancing lesion discrimination. This study provides an interpretable multimodal solution for medical image analysis, offering significant clinical value for intelligent diagnostic systems.
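The following is a minimal, illustrative sketch of the two-stage pipeline the abstract describes, written against the Hugging Face transformers API. The lesion label strings, the linear classification head, and the report prompt template are assumptions made for illustration and are not taken from the paper; the GPT-3.5 fine-tuning branch is API-hosted and omitted here.

```python
# Hypothetical sketch of the CLIP-GPT pipeline: CLIP (ViT-B/32) image features
# feed a 6-way lesion classifier, and a fine-tuned GPT-2 generates a report
# conditioned on the predicted class. Not the authors' actual code.
import torch
import torch.nn as nn
from transformers import CLIPModel, CLIPProcessor, GPT2LMHeadModel, GPT2Tokenizer

# Six liver-lesion categories (hypothetical label names for illustration).
LESION_CLASSES = [
    "hepatic cyst", "hemangioma", "focal nodular hyperplasia",
    "hepatocellular carcinoma", "metastasis", "abscess",
]

# Stage 1: CLIP maps ultrasound images into the shared vision-language
# embedding space; a lightweight linear head (shown untrained here; in the
# paper's setup it would be fit on the six-class dataset) produces logits.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
classifier_head = nn.Linear(clip.config.projection_dim, len(LESION_CLASSES))

def classify(image):
    """Return the predicted lesion class index for one PIL image."""
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        image_emb = clip.get_image_features(**inputs)  # (1, 512) for ViT-B/32
        logits = classifier_head(image_emb)
    return logits.argmax(dim=-1).item()

# Stage 2: a (fine-tuned) GPT-2 generates a structured report from a prompt
# built around the predicted class. The prompt template is an assumption.
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

def generate_report(class_idx):
    prompt = f"Ultrasound finding: {LESION_CLASSES[class_idx]}. Report:"
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = gpt2.generate(ids, max_new_tokens=120, do_sample=False,
                        pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```

Under this setup, the reported BLEU-4 and ROUGE-L figures would be obtained by scoring generated reports against reference clinical reports with standard evaluation toolkits.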