Computer science
Artificial intelligence
Graph
Social network (sociolinguistics)
Cluster analysis
Social media
Machine learning
Pattern recognition (psychology)
Attention network
Authors
Yuhao Zhang, Kehui Song, Xiangrui Cai, Yierxiati Tuergong, Ling Yuan, Ying Zhang
Identifier
DOI:10.1007/978-3-030-87571-8_3
Abstract
Social networks have become a popular way for Internet users to express their thoughts and exchange real-time information. The growing volume of topic-oriented resources in social networks has attracted increasing attention, driving the development of topic detection. Topic detection on pure text originates from text mining and document clustering, and aims to automatically identify topics from massive data in an unsupervised manner. With the development of the mobile Internet, user-generated content in social networks usually contains multimodal data such as images and videos. Multimodal topic detection poses a new challenge of fusing and aligning heterogeneous features from different modalities, which has received limited attention in existing research. To address this problem, we adopt a Graph Fusion Network (GFN) based encoder and a multilayer perceptron (MLP) decoder to hierarchically fuse information from images and texts. The proposed method regards multimodal features as vertices and models the interactions between modalities with edges, layer by layer. The fused representations therefore contain rich semantic information and explicit multimodal dynamics, which benefit multimodal topic detection. Experimental results on a real-world multimodal topic detection dataset demonstrate that our model performs favorably against all baseline methods.
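The abstract describes the fusion mechanism only at a high level. Below is a minimal, illustrative PyTorch sketch of the general idea of graph-based multimodal fusion followed by an MLP decoder: modality features are treated as vertices and attention-style pairwise scores act as edge weights for layer-by-layer message passing. All module names, dimensions, the pooling step, and the layer count are assumptions for illustration; the paper's actual GFN architecture may differ.

```python
# Hypothetical sketch of graph-based multimodal fusion; not the paper's code.
import torch
import torch.nn as nn

class GraphFusionLayer(nn.Module):
    """One fusion layer: modality features are vertices; pairwise
    attention scores serve as edge weights for message passing."""
    def __init__(self, dim):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)

    def forward(self, vertices):  # vertices: (batch, num_modalities, dim)
        q, k, v = self.query(vertices), self.key(vertices), self.value(vertices)
        # Edge weights between every pair of modality vertices.
        edges = torch.softmax(q @ k.transpose(-2, -1) / vertices.size(-1) ** 0.5, dim=-1)
        # Aggregate neighbor information along the weighted edges;
        # the residual keeps each modality's own semantics.
        return vertices + edges @ v

class GFNEncoderDecoder(nn.Module):
    def __init__(self, dim=256, num_layers=3, num_topics=50):
        super().__init__()
        self.layers = nn.ModuleList(GraphFusionLayer(dim) for _ in range(num_layers))
        # MLP decoder maps the fused representation to topic logits.
        self.decoder = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, num_topics))

    def forward(self, text_feat, image_feat):  # each: (batch, dim)
        vertices = torch.stack([text_feat, image_feat], dim=1)
        for layer in self.layers:          # hierarchical, layer-by-layer fusion
            vertices = layer(vertices)
        fused = vertices.mean(dim=1)       # pool the modality vertices
        return self.decoder(fused)         # topic logits

# Usage: logits = GFNEncoderDecoder()(torch.randn(8, 256), torch.randn(8, 256))
```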