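Computer science
Artificial intelligence
Pattern
Machine learning
Leverage (statistics)
Modality (human-computer interaction)
Task (project management)
Supervised learning
Inference
Multimodal learning
Pattern recognition (psychology)
Artificial neural network
Social science
Sociology
Economy
Management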
Authors
Aiham Taleb, Christoph Lippert, Tassilo Klein, Moin Nabi
Identifier
DOI:10.1007/978-3-030-78191-0_51
Abstract
Self-supervised learning approaches leverage unlabeled samples to acquire generic knowledge about different concepts, allowing for annotation-efficient downstream task learning. In this paper, we propose a novel self-supervised method that leverages multiple imaging modalities. We introduce the multimodal puzzle task, which facilitates representation learning from multiple image modalities. The learned modality-agnostic representations are obtained by confusing image modalities at the data level. Together with the Sinkhorn operator, with which we formulate puzzle solving as permutation matrix inference instead of classification, they allow for efficient solving of multimodal puzzles with varying levels of complexity. In addition, we propose using generation techniques to augment multimodal data for self-supervised pretraining, rather than for downstream tasks directly. This aims to circumvent quality issues associated with synthetic images, while improving data efficiency and the representations learned by self-supervised methods. Our experimental results show that solving our multimodal puzzles yields better semantic representations than treating each modality independently. Our results also highlight the benefits of exploiting synthetic images for self-supervised pretraining. We showcase our approach on three segmentation tasks, where it outperforms many existing solutions and is competitive with the state of the art.
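The two ideas named in the abstract, mixing modalities inside one puzzle and using the Sinkhorn operator to infer a permutation matrix, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `make_multimodal_puzzle` and `sinkhorn`, the tile labels, the temperature parameter, and the hand-written score matrix are all hypothetical, and "confusing modalities at the data level" is read here as drawing each tile from a randomly chosen, spatially aligned modality. The Sinkhorn step is the standard alternating row and column normalization, which pushes a positive matrix toward a doubly stochastic one that serves as a soft permutation.

```python
import math
import random


def make_multimodal_puzzle(modalities, rng):
    """Build a shuffled puzzle whose tiles are drawn from random modalities.

    Assumption: `modalities` holds one tile list per modality, spatially
    aligned, so modalities[k][i] is tile position i seen through modality k.
    Returns (shuffled_tiles, permutation), where shuffled_tiles[j] came from
    original position permutation[j].
    """
    n_tiles = len(modalities[0])
    # For every position, pick the tile from a randomly chosen modality.
    mixed = [rng.choice(modalities)[i] for i in range(n_tiles)]
    permutation = list(range(n_tiles))
    rng.shuffle(permutation)
    return [mixed[p] for p in permutation], permutation


def sinkhorn(scores, n_iters=20, temperature=1.0):
    """Alternate row and column normalization of exp(scores / temperature).

    The result approaches a doubly stochastic matrix (a soft permutation).
    """
    n = len(scores)
    # Subtract the global maximum before exponentiating for stability.
    peak = max(max(row) for row in scores)
    m = [[math.exp((s - peak) / temperature) for s in row] for row in scores]
    for _ in range(n_iters):
        # Row normalization: every row sums to 1.
        m = [[v / sum(row) for v in row] for row in m]
        # Column normalization: every column sums to 1.
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


rng = random.Random(0)
# Hypothetical tile labels standing in for image patches of two modalities.
t1_tiles = ["T1-a", "T1-b", "T1-c", "T1-d"]
t2_tiles = ["T2-a", "T2-b", "T2-c", "T2-d"]
tiles, perm = make_multimodal_puzzle([t1_tiles, t2_tiles], rng)

# Hand-written scores standing in for a network's output (assumption).
scores = [
    [2.0, 0.1, 0.3, 0.0],
    [0.2, 1.5, 0.1, 0.4],
    [0.0, 0.3, 1.8, 0.2],
    [0.1, 0.0, 0.4, 1.6],
]
soft_perm = sinkhorn(scores, n_iters=50)
```

In a training setup of this kind, the soft permutation would typically be compared against the ground-truth permutation (`perm` above) with a differentiable loss, which is what distinguishes permutation inference from classifying over a fixed set of permutations.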