计算机科学
计算机视觉
人工智能
计算机图形学(图像)
作者
Yan Yang,X. H. You,Ke Zhang,Zhenqi Fu,Xianyun Wang,Jiajun Ding,Jiamei Sun,Yu Zhou,Qingming Huang,Weidong Han,Jun Yu
标识
DOI:10.1109/tmi.2025.3554498
摘要
Chest X-ray report generation has attracted increasing research attention. However, most existing methods neglect the temporal information and typically generate reports conditioned on a fixed number of images. In this paper, we propose STREAM: Spatio-Temporal and REtrieval-Augmented Modelling for automatic chest X-ray report generation. It mimics clinical diagnosis by integrating current and historical studies to interpret the present condition (temporal), with each study containing images from multi-views (spatial). Concretely, our STREAM is built upon an encoder-decoder architecture, utilizing a large language model (LLM) as the decoder. Overall, spatio-temporal visual dynamics are packed as visual prompts and regional semantic entities are retrieved as textual prompts. First, a token packer is proposed to capture condensed spatio-temporal visual dynamics, enabling the flexible fusion of images from current and historical studies. Second, to augment the generation with existing knowledge and regional details, a progressive semantic retriever is proposed to retrieve semantic entities from a preconstructed knowledge bank as heuristic text prompts. The knowledge bank is constructed to encapsulate anatomical chest X-ray knowledge into structured entities, each linked to a specific chest region. Extensive experiments on public datasets have shown the state-of-the-art performance of our method. Related codes and the knowledge bank are available at https://github.com/yangyan22/STREAM.
科研通智能强力驱动
Strongly Powered by AbleSci AI