Computer science
Transformer
Artificial intelligence
Encoder
Internet
Machine learning
Speech recognition
Engineering
Voltage
Operating system
Electrical engineering
World Wide Web
Authors
Kazuki Miyazawa, Yuta Kyuragi, Takayuki Nagai
Source
Journal: IEEE Access
[Institute of Electrical and Electronics Engineers]
Date: 2022-01-01
Volume/Pages: 10: 29821-29833
Citations: 17
Identifier
DOI: 10.1109/access.2022.3159346
Abstract
Transformer-based models have garnered attention because of their success in natural language processing and in several other fields, such as image recognition and automatic speech recognition. Beyond models trained on unimodal information, many transformer-based models have been proposed for multimodal information. A common problem in multimodal learning is the insufficiency of multimodal training data. In this study, to address this problem, a simple and effective method is proposed that uses 1) unimodal pre-trained transformer models as encoders for each modal input and 2) a set of transformer layers to fuse their output representations. The proposed method is evaluated through several experiments on two common benchmarks: the CMU Multimodal Opinion Sentiment Intensity (CMU-MOSI) dataset and the Multimodal Internet Movie Database (MM-IMDb). The proposed model achieves state-of-the-art performance on both benchmarks and is robust to reductions in the amount of training data.
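The fusion scheme the abstract describes, separate pre-trained encoders per modality whose output token sequences are merged by additional transformer layers, can be sketched in miniature. The following is a toy NumPy illustration, not the authors' implementation: the random arrays stand in for frozen unimodal encoder outputs (e.g. text-token and audio-frame representations), the shared hidden size `d` and all weight matrices are hypothetical, and a single self-attention layer stands in for the fusion transformer stack.

```python
import numpy as np

rng = np.random.default_rng(0)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over a token sequence x."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

d = 8  # shared hidden size (hypothetical)

# Stand-ins for the outputs of unimodal pre-trained encoders;
# in the paper these come from actual pre-trained transformer models.
text_repr = rng.standard_normal((5, d))   # 5 text tokens
audio_repr = rng.standard_normal((7, d))  # 7 audio frames

# Fusion: concatenate the modality sequences so that a transformer-style
# self-attention layer can mix information across modalities.
fused_in = np.concatenate([text_repr, audio_repr], axis=0)  # shape (12, d)
wq, wk, wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
fused = self_attention(fused_in, wq, wk, wv)
print(fused.shape)  # (12, 8)
```

Because every fused token attends over both modalities' tokens, cross-modal interactions are learned only in the fusion layers, which is what lets the unimodal encoders be pre-trained separately on abundant unimodal data.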