Computer science
Transformer
Receptive field
Artificial intelligence
Pattern recognition (psychology)
Speech recognition
Computer vision
Engineering
Electrical engineering
Voltage
Authors
Tian Zhang, Cheng Lian, Bingrong Xu, Yixin Su, Zhigang Zeng
Identifier
DOI: 10.1016/j.knosys.2024.112175
Abstract
Advances in medical data collection technology have driven growing demand for modeling cardiac physiological signals. However, current research focuses primarily on unimodal signals, leaving a gap in the study of more comprehensive multimodal signals. Directly applying modality-specific late fusion or modality-mixing early fusion fails to adequately capture crossmodal information. This paper proposes an optional multimodal CNN-enhanced Transformer fusion network based on multiscale receptive fields. It introduces switching modal experts for stage-wise representation: the first stage extracts modality-specific features and balances intermodal relationships, while the second stage captures crossmodal interactions in a shared latent space, promoting deep modality fusion. Owing to the flexibility of the switching modal experts, the model can be applied not only to multimodal data but also to unimodal data. Additionally, to address the performance disparity between Transformers and Convolutional Neural Networks (CNNs), we draw on the strengths of CNNs to construct a CNN-enhanced Transformer. Specifically, we improve patch embedding by introducing multiscale receptive fields, and we integrate convolution and residual connections into the feed-forward network (FFN) to help it learn complex nonlinear features and aggregate local features. Experimental results demonstrate that our model achieves outstanding performance in both unimodal and multimodal modes across different datasets, surpassing a range of CNN, Transformer, and CNN-Transformer hybrid networks.
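To make the CNN-enhanced Transformer concrete, the following is a minimal PyTorch sketch of the two components named in the abstract: a patch embedding with multiscale receptive fields (parallel 1-D convolutions of different kernel sizes) and an FFN augmented with a depthwise convolution and a residual connection. This is an illustrative reconstruction under stated assumptions, not the authors' implementation; all module names, kernel sizes, and dimensions are invented for the example.

# Minimal sketch (not the authors' code): a CNN-enhanced Transformer block
# with multiscale patch embedding and a convolution-augmented FFN.
# Module names and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn


class MultiScalePatchEmbed(nn.Module):
    """Embed a 1-D physiological signal with parallel convolutions of
    different kernel sizes, giving each patch a multiscale receptive field."""

    def __init__(self, in_ch: int, dim: int, kernel_sizes=(3, 7, 15), stride: int = 4):
        super().__init__()
        assert dim % len(kernel_sizes) == 0
        branch_dim = dim // len(kernel_sizes)
        self.branches = nn.ModuleList(
            nn.Conv1d(in_ch, branch_dim, k, stride=stride, padding=k // 2)
            for k in kernel_sizes
        )

    def forward(self, x):          # x: (batch, in_ch, length)
        feats = [branch(x) for branch in self.branches]
        return torch.cat(feats, dim=1).transpose(1, 2)  # (batch, tokens, dim)


class ConvFFN(nn.Module):
    """Feed-forward network augmented with a depthwise convolution and a
    residual connection, so it can also aggregate local features."""

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        self.dwconv = nn.Conv1d(hidden, hidden, 3, padding=1, groups=hidden)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x):          # x: (batch, tokens, dim)
        h = self.fc1(x)
        # Depthwise conv over the token axis adds local aggregation;
        # the residual keeps the plain FFN path intact.
        h = h + self.dwconv(h.transpose(1, 2)).transpose(1, 2)
        return self.fc2(self.act(h))


class CNNEnhancedBlock(nn.Module):
    """One Transformer encoder block using the ConvFFN above."""

    def __init__(self, dim: int = 96, heads: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = ConvFFN(dim, hidden=4 * dim)

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.ffn(self.norm2(x))


if __name__ == "__main__":
    signal = torch.randn(2, 1, 1024)           # e.g. a single-lead ECG segment
    tokens = MultiScalePatchEmbed(1, 96)(signal)
    out = CNNEnhancedBlock(96)(tokens)
    print(out.shape)                            # torch.Size([2, 256, 96])

In a multimodal setting, one such embedding-plus-encoder stack per modality would correspond to the first (modality-specific) stage, with a shared stack over the concatenated token streams playing the role of the second (crossmodal) stage; the switching-expert routing itself is not reproduced here.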