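Modal verb
Representation (politics)
Computer science
Artificial intelligence
Gating
Feature (linguistics)
Property (philosophy)
Feature learning
Pattern recognition (psychology)
Machine learning
Biology
Politics
Epistemology
Philosophy
Chemistry
Polymer chemistry
Physiology
Law
Linguistics
Political science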
Authors
Duoyi Zhang,Richi Nayak,Md Abul Bashar
Source
Journal: Neural Networks
[Elsevier BV]
Date: 2024-07-17
Volume/Issue: 179: 106553-106553
Citations: 15
Identifier
DOI:10.1016/j.neunet.2024.106553
Abstract
Multi-modal representation learning has received significant attention across diverse research domains due to its ability to model a scenario comprehensively. Learning the cross-modal interactions is essential to combining multi-modal data into a joint representation. However, conventional cross-attention mechanisms can produce noisy and non-meaningful values in the absence of useful cross-modal interactions among input features, thereby introducing uncertainty into the feature representation. These factors have the potential to degrade the performance of downstream tasks. This paper introduces a novel Pre-gating and Contextual Attention Gate (PCAG) module for multi-modal learning comprising two gating mechanisms that operate at distinct information processing levels within the deep learning model. The first gate filters out interactions that lack informativeness for the downstream task, while the second gate reduces the uncertainty introduced by the cross-attention module. Experimental results on eight multi-modal classification tasks spanning various domains show that the multi-modal fusion model with PCAG outperforms state-of-the-art multi-modal fusion models. Additionally, we elucidate how PCAG effectively processes cross-modality interactions.
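The abstract describes two gates placed around a cross-attention step but gives no equations. The following is only a minimal NumPy sketch of that general idea, not the authors' PCAG formulation: the sigmoid form of both gates, the residual combination, and every name (`gated_cross_attention`, `init_params`, `w_q`, `w_k`, `w_v`, `b_pre`, `w_g`, `b_g`) are assumptions introduced for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(d, rng):
    """Hypothetical parameter set; in a real model these would be learned."""
    scale = 1.0 / np.sqrt(d)
    return {
        "w_q": rng.normal(size=(d, d)) * scale,
        "w_k": rng.normal(size=(d, d)) * scale,
        "w_v": rng.normal(size=(d, d)) * scale,
        "b_pre": 0.0,                                # bias of the first gate
        "w_g": rng.normal(size=(2 * d, d)) * scale,  # weights of the second gate
        "b_g": np.zeros(d),
    }


def gated_cross_attention(x_a, x_b, params):
    """x_a: (n_a, d) features of modality A; x_b: (n_b, d) features of modality B."""
    d = x_a.shape[-1]
    q = x_a @ params["w_q"]
    k = x_b @ params["w_k"]
    v = x_b @ params["w_v"]
    scores = q @ k.T / np.sqrt(d)                    # (n_a, n_b) pairwise interactions

    # First gate (assumed form): a value in (0, 1) per cross-modal pair that
    # down-weights weak interactions on top of the usual softmax weights.
    pre_gate = sigmoid(scores + params["b_pre"])
    attn = softmax(scores, axis=-1) * pre_gate
    attended = attn @ v                              # (n_a, d) cross-modal context

    # Second gate (assumed form): a per-feature gate, computed from the original
    # features and the attended context, that limits how much of the
    # cross-attention output is mixed back into the representation.
    gate_in = np.concatenate([x_a, attended], axis=-1)
    ctx_gate = sigmoid(gate_in @ params["w_g"] + params["b_g"])
    return x_a + ctx_gate * attended


rng = np.random.default_rng(0)
params = init_params(4, rng)
text_feats = rng.normal(size=(3, 4))    # toy stand-in for one modality
image_feats = rng.normal(size=(5, 4))   # toy stand-in for another modality
fused = gated_cross_attention(text_feats, image_feats, params)
```

In this sketch, a strongly negative `b_g` closes the second gate, so the output falls back to the unimodal features `x_a`. That illustrates how a gate can suppress cross-attention output when no useful cross-modal interaction exists.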