Keywords
Computer science
Artificial intelligence
Pattern
Image fusion
Stacking
Image (mathematics)
Fusion
Internet
Pattern recognition (psychology)
Machine learning
World Wide Web
Linguistics
Social science
Philosophy
Physics
Nuclear magnetic resonance
Sociology
Authors
Ignazio Gallo,Gianmarco Ria,Nicola Landro,Riccardo La Grassa
Identifier
DOI:10.1109/ivcnz51579.2020.9290622
Abstract
The modern digital world is becoming increasingly multimodal. On the internet, images are often paired with text, so classification problems involving these two modalities are very common. In this paper, we examine multimodal classification using textual information and visual representations of the same concept. We investigate two basic methods for multimodal fusion and adapt them with stacking techniques to better handle this type of problem. We use UPMC Food-101, a difficult and noisy multimodal dataset that is representative of this category of multimodal problems. Our results show that the proposed early fusion technique, combined with a stacking-based approach, exceeds the state of the art on this dataset.
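The abstract mentions early fusion of textual and visual representations combined with stacking. As a rough illustration (not the authors' actual pipeline), early fusion can be sketched as concatenating per-sample feature vectors from the two modalities before classification; the feature dimensions and names below are illustrative assumptions.

```python
import numpy as np

def early_fusion(text_feats: np.ndarray, image_feats: np.ndarray) -> np.ndarray:
    """Concatenate text and image feature vectors along the feature axis.

    This is a minimal sketch of early fusion: both inputs must have one
    row per sample; the fused representation is then fed to a single
    downstream classifier (not shown here).
    """
    assert text_feats.shape[0] == image_feats.shape[0], "one row per sample"
    return np.concatenate([text_feats, image_feats], axis=1)

# Hypothetical features: e.g. averaged word embeddings for the text and
# penultimate-layer CNN activations for the image (dimensions are made up).
rng = np.random.default_rng(0)
text_feats = rng.normal(size=(4, 300))
image_feats = rng.normal(size=(4, 512))
fused = early_fusion(text_feats, image_feats)
print(fused.shape)  # (4, 812)
```

In a stacking variant, the fused (or per-modality) features would first feed several base classifiers, whose predictions become the input features of a meta-classifier.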