Authors
Xiang Li, Chong Fu, Qun Wang, Wenchao Zhang, Chen Ye, Junxin Chen, Chiu‐Wing Sham
Abstract
Transformers have recently gained significant attention in medical image segmentation due to their ability to capture long-range dependencies. However, the presence of excessive background noise in large regions of medical images introduces distractions and increases the computational burden on the fine-grained self-attention (SA) mechanism, which is a key component of the transformer model. Meanwhile, preserving fine-grained details is essential for accurately segmenting complex, blurred medical images with diverse shapes and sizes. Thus, we propose a novel Multi-scale Dynamic Sparse Attention (MDSA) module, which flexibly reduces computational costs while maintaining multi-scale fine-grained interactions with content awareness. Specifically, multi-scale aggregation is first applied to the feature maps to enrich the diversity of interaction information. Then, for each query, irrelevant key-value pairs are filtered out at a coarse-grained level. Finally, fine-grained SA is performed on the remaining key-value pairs. In addition, we design an enhanced downsampling merging (EDM) module and an enhanced upsampling fusion (EUF) module for building pyramid architectures. Using MDSA to construct the basic blocks, combined with EDMs and EUFs, we develop a UNet-like model named MDSA-UNet. Since MDSA-UNet dynamically processes only a small subset of relevant fine-grained features, it achieves strong segmentation performance with high computational efficiency. Extensive experiments on four datasets spanning three imaging modalities demonstrate that our MDSA-UNet, without pre-training, significantly outperforms other non-pretrained methods and even competes with pre-trained models, achieving Dice scores of 82.10% on DDTI, 80.20% on TN3K, 90.75% on ISIC2018, and 91.05% on ACDC. Meanwhile, our model maintains low complexity, with only 6.65 M parameters and 4.54 G FLOPs at a resolution of 224×224, ensuring both effectiveness and efficiency.
Code is available at https://github.com/NEU-LX/MDSA-UNet.
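To make the two-stage attention described in the abstract concrete, the sketch below illustrates the general idea of dynamic sparse attention: each query first filters the key-value pairs at a coarse level (here, a simple per-query top-k selection by similarity), then performs ordinary softmax attention only over the retained pairs. This is a minimal, framework-free illustration of the pattern, not the authors' MDSA module: the function name, the plain top-k filter, and the use of NumPy are all assumptions, and the real module additionally performs multi-scale aggregation and content-aware region-level routing, which are omitted here.

```python
import numpy as np

def dynamic_sparse_attention(q, k, v, top_k):
    """Per-query sparse attention (illustrative sketch).

    q: (Nq, d) queries; k: (Nk, d) keys; v: (Nk, dv) values.
    Each query attends only to its top_k most similar keys,
    mimicking coarse filtering followed by fine-grained SA.
    """
    # Scaled dot-product scores between every query and every key.
    scores = q @ k.T / np.sqrt(q.shape[-1])                   # (Nq, Nk)
    # Coarse-grained step: keep the indices of the top_k keys per query.
    idx = np.argpartition(-scores, top_k - 1, axis=-1)[:, :top_k]
    sel = np.take_along_axis(scores, idx, axis=-1)            # (Nq, top_k)
    # Fine-grained step: softmax attention over the retained pairs only.
    w = np.exp(sel - sel.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    # Gather the matching values and form the weighted sum per query.
    return np.einsum('qt,qtd->qd', w, v[idx])                 # (Nq, dv)
```

When `top_k` equals the number of keys, the filter keeps everything and the result reduces to dense softmax attention; shrinking `top_k` trades a small amount of context for a proportional drop in attention cost, which is the efficiency argument made above.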