Image segmentation
Computer vision
Artificial intelligence
Segmentation
Zigzag
Image (mathematics)
Computer science
Scale-space segmentation
Mathematics
Geometry
Authors
Tianxiang Chen,X. R. Zhou,Zhentao Tan,Yue Wu,Ziyang Wang,Zi Ye,Tao Gong,Qi Chu,Nenghai Yu,Le Lü
Identifiers
DOI:10.1109/tmi.2025.3561797
Abstract
Medical image segmentation has made significant strides with the development of basic models. Specifically, models that combine CNNs with transformers can successfully extract both local and global features. However, these models inherit the transformer's quadratic computational complexity, limiting their efficiency. Inspired by the recent Receptance Weighted Key Value (RWKV) model, which achieves linear complexity for long-distance modeling, we explore its potential for medical image segmentation. Directly applying vision-RWKV yields sub-optimal results due to insufficient local feature exploration and disrupted spatial continuity, so we propose a novel nested structure, Zigzag RWKV-in-RWKV (Zig-RiR), to address these issues. It consists of Outer and Inner RWKV blocks that capture both global and local features without disrupting spatial continuity. We treat local patches as "visual sentences" and use the Outer Zig-RWKV to explore global information. We then decompose each sentence into sub-patches ("visual words") and use the Inner Zig-RWKV to further explore local information among words, at negligible computational cost. We also introduce a Zigzag-WKV attention mechanism to ensure spatial continuity during token scanning. By aggregating visual word and sentence features, Zig-RiR effectively explores both global and local information while preserving spatial continuity. Experiments on four medical image segmentation datasets of both 2D and 3D modalities demonstrate the superior accuracy and efficiency of our method, which runs 14.4× faster than the state-of-the-art method and reduces GPU memory usage by 89.5% when testing on 1024 × 1024 high-resolution medical images. Our code is available at https://github.com/txchen-USTC/Zig-RiR.
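The spatial-continuity idea behind the Zigzag-WKV scan can be illustrated with a small sketch. A plain raster scan jumps from the end of one row to the start of the next, so consecutive tokens in the 1-D sequence can be far apart in the image; reversing every other row (a boustrophedon order) keeps neighbors adjacent. The helper `zigzag_order` below is a hypothetical illustration of this ordering, not the actual Zig-RiR implementation (which is in the linked repository):

```python
import numpy as np

def zigzag_order(h: int, w: int) -> np.ndarray:
    """Return a zigzag (boustrophedon) scan order for an h x w token grid.

    Tokens are numbered row-major; every other row is reversed so that
    any two consecutive tokens in the 1-D scan are spatially adjacent.
    """
    idx = np.arange(h * w).reshape(h, w)
    idx[1::2] = idx[1::2, ::-1]  # reverse odd-numbered rows
    return idx.reshape(-1)

# For a 2x3 grid, a raster scan visits 0,1,2,3,4,5 (token 2 -> 3 is a
# jump across the image), while the zigzag scan visits 0,1,2,5,4,3.
print(zigzag_order(2, 3))  # [0 1 2 5 4 3]
```

Under this ordering, each step of a recurrent scan like WKV only ever moves to a directly neighboring pixel or patch, which is the property the abstract refers to as preserving spatial continuity during token scanning.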