Shared-Specific Feature Learning With Bottleneck Fusion Transformer for Multi-Modal Whole Slide Image Analysis

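Computer science; Bottleneck; Modal verb; Artificial intelligence; Feature learning; Feature (linguistics); Information bottleneck method; Node (physics); Data mining; Pattern recognition (psychology); Machine learning; Mutual information; Engineering; Polymer chemistry; Chemistry; Embedded system; Philosophy; Structural engineering; Linguistics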

Authors

Zhihua Wang, Lequan Yu, Xin Ding, Xuehong Liao, Liansheng Wang

Source

Journal: IEEE Transactions on Medical Imaging [Institute of Electrical and Electronics Engineers]
Date: 2023-11-01 Volume (Issue): 42 (11): 3374-3383 Citations: 2

Links

nih.gov, doi.org

Identifiers

DOI: 10.1109/tmi.2023.3287256

Abstract

The fusion of multi-modal medical data is essential for helping medical experts make treatment decisions in precision medicine. For example, combining whole slide histopathological images (WSIs) with tabular clinical data can more accurately predict the lymph node metastasis (LNM) of papillary thyroid carcinoma before surgery, avoiding unnecessary lymph node resection. However, a huge WSI carries far more high-dimensional information than low-dimensional tabular clinical data, which makes information alignment challenging in multi-modal WSI analysis tasks. This paper presents a novel transformer-guided multi-modal multi-instance learning framework to predict lymph node metastasis from both WSIs and tabular clinical data. We first propose an effective multi-instance grouping scheme, named siamese attention-based feature grouping (SAG), to group high-dimensional WSIs into representative low-dimensional feature embeddings for fusion. We then design a novel bottleneck shared-specific feature transfer module (BSFT) to explore the shared and specific features between different modalities, where a few learnable bottleneck tokens are used for knowledge transfer between modalities. Moreover, a modal adaptation and orthogonal projection scheme is incorporated to further encourage BSFT to learn shared and specific features from multi-modal data. Finally, the shared and specific features are dynamically aggregated via an attention mechanism for slide-level prediction. Experimental results on our collected lymph node metastasis dataset demonstrate the efficiency of our proposed components, and our framework achieves the best performance with an AUC (area under the curve) of 97.34%, outperforming the state-of-the-art methods by over 1.27%.
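
Two ideas named in the abstract, attention-based aggregation of feature vectors and an orthogonality constraint between shared and specific features, can be illustrated with a small generic sketch. This is not the authors' implementation: the function names (`attention_aggregate`, `orthogonality_penalty`), the dot-product scoring against a fixed `query` vector, and the squared-cosine penalty are all assumptions chosen for illustration, and the abstract does not specify how the paper computes either quantity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attention_aggregate(features, query):
    """Pool feature vectors with weights softmax(feature . query).

    `query` stands in for a learned attention parameter (hypothetical here).
    Returns the pooled vector and the attention weights.
    """
    weights = softmax([dot(f, query) for f in features])
    dim = len(features[0])
    pooled = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return pooled, weights


def orthogonality_penalty(shared, specific):
    """Squared cosine similarity: 0 for orthogonal vectors, 1 for parallel ones.

    An assumed, generic way to push two feature vectors apart; a zero vector
    yields a penalty of 0.
    """
    norm = math.sqrt(dot(shared, shared) * dot(specific, specific))
    if norm == 0.0:
        return 0.0
    return (dot(shared, specific) / norm) ** 2
```

With an all-zero `query`, every feature receives the same weight, so `attention_aggregate` reduces to mean pooling; a non-zero `query` shifts weight toward the features that align with it. Minimizing `orthogonality_penalty` during training would discourage the shared and specific representations from encoding the same direction.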
