Computer science
Computer vision
Artificial intelligence
Transformer
Medical imaging
Electrical engineering
Engineering
Voltage
Authors
Guoli Wang, Qikui Zhu, Chaoda Song, Benzheng Wei, Shuo Li
Identifier
DOI:10.1109/jbhi.2025.3541982
Abstract
Vision Transformers (ViTs) suffer from high parameter complexity because they rely on Multi-Layer Perceptrons (MLPs) for nonlinear representation. This issue is particularly challenging in medical image analysis, where labeled data is limited, leading to inadequate feature representation. Existing methods have attempted to optimize either the patch embedding stage or the non-embedding stage of ViTs, but they have struggled to balance effective modeling, parameter complexity, and data availability. Recently, the Kolmogorov-Arnold Network (KAN) was introduced as an alternative to MLPs, offering a potential solution to the large-parameter issue in ViTs. However, KAN cannot be directly integrated into ViTs due to challenges such as handling 2D structured data and the curse of dimensionality. To address this, we propose MedKAFormer, the first ViT model to incorporate the Kolmogorov-Arnold (KA) theorem for medical image representation. It includes a Dynamic Kolmogorov-Arnold Convolution (DKAC) layer for flexible nonlinear modeling in the patch embedding stage, and introduces a Nonlinear Sparse Token Mixer (NSTM) and a Nonlinear Dynamic Filter (NDF) in the non-embedding stage. These components provide comprehensive nonlinear representation while reducing overfitting. MedKAFormer reduces parameter complexity by 85.61% compared to ViT-Base and achieves competitive results on 14 medical datasets spanning various imaging modalities and anatomical structures.
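The abstract contrasts KAN layers with MLPs: in a KAN, each edge carries its own learnable univariate function, and each output is a sum of such functions over the inputs, rather than a linear weight matrix followed by a fixed activation. The sketch below is a minimal, hypothetical illustration of that Kolmogorov-Arnold form only; the polynomial/trigonometric basis and hand-picked coefficients are simplifications (the KAN literature uses B-spline bases), and this is not the paper's DKAC, NSTM, or NDF implementation.

```python
import math

def kan_edge(x, coeffs):
    """Learnable univariate function on one edge: a linear combination
    of simple basis functions (here x, x^2, sin x) standing in for the
    B-spline basis used in actual KANs."""
    basis = [x, x * x, math.sin(x)]
    return sum(c * b for c, b in zip(coeffs, basis))

def kan_layer(inputs, edge_coeffs):
    """One KAN layer: output o = sum over inputs i of phi_{o,i}(x_i).
    Every edge has its own nonlinearity; there is no shared weight
    matrix plus fixed activation as in an MLP layer."""
    return [
        sum(kan_edge(x, edge_coeffs[o][i]) for i, x in enumerate(inputs))
        for o in range(len(edge_coeffs))
    ]

# Hypothetical 2-input, 1-output layer: out = x0 + x1^2
coeffs = [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]]
out = kan_layer([2.0, 3.0], coeffs)  # 2.0 + 9.0 = 11.0
```

In a trained KAN the per-edge coefficients are learned; the point of the sketch is only the structural difference from an MLP that the abstract leverages to cut parameter count.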