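Computer science
Upsampling
Encoder
Artificial intelligence
Computer vision
Image segmentation
Segmentation
Security token
Pattern recognition (psychology)
Image (mathematics)
Computer network
Operating system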
Authors
ChaoYang Zhang, Shibao Sun, Hu Wenmao, Zhao Pengcheng
Identifier
DOI:10.1016/j.compbiomed.2023.107858
Abstract
U-shaped architectures and Transformers have achieved exceptional performance in medical image segmentation and natural language processing, respectively. Combining them has also produced remarkable results, but the combination still suffers from substantial loss of image features during downsampling and from difficulty recovering spatial information during upsampling. In this paper, we propose a novel encoder-decoder architecture for medical image segmentation with a flexibly adjustable hybrid encoder and a decoder with two expanding paths. The hybrid encoder combines the feature double reuse (FDR) block with the encoder of the Vision Transformer (ViT); it extracts both local and global pixel localization information and effectively alleviates image feature loss. We also retain the original class-token sequence of the Vision Transformer and develop an additional expanding path for it. The class-token sequence and the abstract image features are processed by two independent expanding paths under a deep-supervision strategy, which better recovers image spatial information and accelerates model convergence. To further mitigate feature loss and improve spatial information recovery, we introduce successive residual connections throughout the network. We evaluated the model on COVID-19 lung segmentation and infection area segmentation tasks. Compared with other models, the mIoU increased by 1.5 points and 3.9 points, demonstrating a performance improvement.
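The abstract reports its results as mIoU (mean intersection over union) but does not define the metric. As a minimal sketch, assuming the common definition (per-class IoU averaged over the classes that appear in the prediction or the ground truth), it can be computed as follows. The function name `mean_iou`, the flattened-mask input format, and the toy data are illustrative assumptions, not the authors' evaluation code.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in pred or target.

    pred, target: flat sequences of integer class labels, one per pixel
    (an assumed input format chosen for this illustration).
    """
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both masks.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in at least one mask.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both masks
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Hypothetical toy masks: 0 = background, 1 = infection area.
pred = [0, 0, 1, 1, 1, 0]
target = [0, 1, 1, 1, 0, 0]
score = mean_iou(pred, target, num_classes=2)
print(f"mIoU = {score:.2f}")  # both classes have IoU 2/4, so mIoU = 0.50
```

Under this definition, a gain of 1.5 points corresponds to an increase of 0.015 when mIoU is expressed on a 0 to 1 scale.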