Mode
Computer science
Modality (human–computer interaction)
Modal verb
Recommender system
Graph
Artificial intelligence
Machine learning
Feature learning
Representation (politics)
Natural language processing
Human–computer interaction
Multimedia
Theoretical computer science
Polymer chemistry
Chemistry
Social science
Sociology
Politics
Political science
Law
Authors
Zixuan Yi, Xi Wang, Iadh Ounis, Craig Macdonald
Identifier
DOI:10.1145/3477495.3532027
Abstract
Recently, micro-videos have become increasingly popular on social media platforms such as TikTok and Instagram. Engagement on these platforms is facilitated by multi-modal recommendation systems. Indeed, such multimedia content can involve diverse modalities, often represented as visual, acoustic, and textual features to the recommender model. Existing works in micro-video recommendation tend to unify the multi-modal channels, thereby treating each modality with equal importance. However, we argue that these approaches are not sufficient to encode item representations with multiple modalities, since these methods cannot fully disentangle the users' tastes for different modalities. To tackle this problem, we propose a novel learning method named Multi-Modal Graph Contrastive Learning (MMGCL), which aims to explicitly enhance multi-modal representation learning in a self-supervised manner. In particular, we devise two augmentation techniques to generate multiple views of a user/item: modality edge dropout and modality masking. Furthermore, we introduce a novel negative sampling technique that allows the model to learn the correlation between modalities and ensures the effective contribution of each modality. Extensive experiments conducted on two micro-video datasets demonstrate the superiority of our proposed MMGCL method over existing state-of-the-art approaches in terms of both recommendation performance and training convergence speed.
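The abstract names two view-generation augmentations, modality edge dropout and modality masking, without giving implementation details here. The sketch below illustrates what such augmentations could look like on a user-item graph with per-modality item features; the function names, data layout, and rates (`modality_edge_dropout`, `modality_masking`, `drop_rate`, `mask_rate`) are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

def modality_edge_dropout(edges, edge_modality, target_modality, drop_rate=0.2):
    """Drop a random fraction of the edges carrying one modality channel.

    edges:          (E, 2) array of (user, item) index pairs.
    edge_modality:  (E,) array naming the modality channel of each edge.
    Returns a subsampled edge list, i.e. one augmented "view" of the graph.
    """
    is_target = edge_modality == target_modality
    keep = rng.random(len(edges)) >= drop_rate
    # Edges of other modalities are always kept; target-modality edges
    # survive with probability 1 - drop_rate.
    return edges[~is_target | keep]

def modality_masking(features_by_modality, mask_rate=0.3):
    """Zero out a modality's feature vectors for a random subset of items.

    features_by_modality: dict mapping a modality name to a
    (num_items, dim) feature matrix. Masking a channel forces the
    encoder to rely on the remaining modalities for those items.
    """
    masked = {}
    for name, feats in features_by_modality.items():
        keep = rng.random(feats.shape[0]) >= mask_rate
        masked[name] = feats * keep[:, None]
    return masked

# Toy example: 5 items with visual/acoustic/textual features and a
# small user-item edge list labelled by modality channel.
edges = np.array([[0, 0], [0, 1], [1, 2], [2, 3], [3, 4]])
edge_modality = np.array(["visual", "textual", "visual", "acoustic", "visual"])
features = {m: rng.normal(size=(5, 8)) for m in ("visual", "acoustic", "textual")}

view_a = modality_edge_dropout(edges, edge_modality, "visual")
view_b = modality_masking(features)
```

In a standard contrastive setup, two such views of the same user/item would be encoded and pulled together in representation space, while views of different users/items are pushed apart; how MMGCL combines these views with its negative sampling scheme is specified in the full paper, not in this sketch.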