Computer science
Modality
Artificial intelligence
Computer vision
Chemistry
Polymer chemistry
Authors
Wenfeng Zhang, Baoning Cai, Jianming Hu, Qibing Qin, Kezhen Xie
Identifiers
DOI: 10.1109/lsp.2024.3379005
Abstract
The radiology report generation task aims to generate diagnostic descriptions from radiology images, alleviating the onerous workload of radiologists and alerting them to abnormalities. However, the data bias problem poses a persistent challenge: abnormal regions usually occupy only a small portion of a radiology image, yet the report generation process should pay greater attention to them. Moreover, the available data volume is small relative to that used to train large language models, which complicates training. To address these issues effectively, we propose a Visual-textual Cross-modal Interaction Network (VCIN) to enhance the quality of generated reports. VCIN comprises two key modules: Abundant Clinical Information Embedding (ACIE), which gathers rich cross-modal interaction information to promote report generation for abnormal regions, and a BERT-based Decoder-only Generator (BDG), built on the BERT architecture to mitigate training difficulties. The superior performance of the proposed model is demonstrated through experimental results on two public benchmark datasets. The code is available at https://github.com/QinLab-WFU/VCIN.
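The abstract only outlines the architecture, so the sketch below is a minimal, hypothetical PyTorch rendering of the two ideas it names: a cross-modal interaction module that fuses visual patch features into report tokens (in the spirit of ACIE), and a decoder-only generator built from BERT-style blocks with a causal mask (in the spirit of BDG). Every class, dimension, and parameter name here is an illustrative assumption, not the authors' implementation; the real code is in the linked repository.

```python
# Minimal, hypothetical sketch (PyTorch) of the two modules named in the abstract.
# All names and hyperparameters are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn

class CrossModalInteraction(nn.Module):
    """Stand-in for ACIE: report tokens attend over visual patch features."""
    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
        # text: (B, T, dim) token embeddings; visual: (B, P, dim) patch features
        fused, _ = self.attn(query=text, key=visual, value=visual)
        return self.norm(text + fused)  # residual fusion of visual evidence

class DecoderOnlyGenerator(nn.Module):
    """Stand-in for BDG: BERT-style blocks run with a causal mask so each
    token attends only to earlier tokens, i.e. a decoder-only generator."""
    def __init__(self, vocab: int = 10000, dim: int = 512,
                 heads: int = 8, layers: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.fuse = CrossModalInteraction(dim, heads)
        block = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(block, layers)
        self.head = nn.Linear(dim, vocab)

    def forward(self, tokens: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
        x = self.fuse(self.embed(tokens), visual)      # inject visual context
        t = tokens.size(1)
        causal = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
        return self.head(self.blocks(x, mask=causal))  # (B, T, vocab) logits

# Toy forward pass: batch of 2, 49 visual patches, 20 report tokens.
model = DecoderOnlyGenerator()
logits = model(torch.randint(0, 10000, (2, 20)), torch.randn(2, 49, 512))
print(logits.shape)  # torch.Size([2, 20, 10000])
```

Running BERT-style encoder blocks under a causal mask is a standard way to obtain decoder-only generation; presumably the appeal of a BERT-based design, as the abstract suggests, is that such blocks can start from pretrained weights, easing training when the report dataset is small.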