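Autoencoder
Canonical correlation
Artificial intelligence
Modal verb
Representation (politics)
Computer science
Modality (human-computer interaction)
Pattern recognition (psychology)
Deep learning
Feature learning
Mode
Minification
Correlation
Machine learning
Mathematics
Social science
Chemistry
Geometry
Sociology
Politics
Political science
Polymer chemistry
Law
Programming language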
Authors
Fangxiang Feng, Xiaojie Wang, Ruifan Li
Identifier
DOI:10.1145/2647868.2654902
Abstract
This paper considers the problem of cross-modal retrieval, e.g., using a text query to search for images and vice versa. A novel model based on a correspondence autoencoder (Corr-AE) is proposed to solve this problem. The model is constructed by correlating the hidden representations of two uni-modal autoencoders. A novel objective, which minimizes a linear combination of the representation learning error for each modality and the correlation learning error between the hidden representations of the two modalities, is used to train the model as a whole. Minimizing the correlation learning error forces the model to learn hidden representations that contain only the information common to the different modalities, while minimizing the representation learning error makes the hidden representations good enough to reconstruct the input of each modality. A parameter $\alpha$ balances the representation learning error against the correlation learning error. Based on two different multi-modal autoencoders, Corr-AE is extended to two other correspondence models, called Corr-Cross-AE and Corr-Full-AE. The proposed models are evaluated on three publicly available data sets from real scenes. The three correspondence autoencoders are shown to perform significantly better than three models based on canonical correlation analysis and two popular multi-modal deep models on cross-modal retrieval tasks.
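The objective described in the abstract can be illustrated with a minimal NumPy sketch. Everything beyond "a linear combination balanced by $\alpha$" is an assumption made for illustration: the single-layer sigmoid encoders, the linear decoders, the squared-error losses, the weighting `(1 - alpha) * reconstruction + alpha * correlation`, and all names (`init_params`, `corr_ae_loss`, `W_img`, and so on). It is not the authors' implementation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(rng, d_img, d_txt, d_hidden, scale=0.1):
    """Hypothetical parameters for two single-layer autoencoders."""
    return {
        "W_img": rng.normal(scale=scale, size=(d_img, d_hidden)),
        "b_img": np.zeros(d_hidden),
        "V_img": rng.normal(scale=scale, size=(d_hidden, d_img)),
        "c_img": np.zeros(d_img),
        "W_txt": rng.normal(scale=scale, size=(d_txt, d_hidden)),
        "b_txt": np.zeros(d_hidden),
        "V_txt": rng.normal(scale=scale, size=(d_hidden, d_txt)),
        "c_txt": np.zeros(d_txt),
    }


def corr_ae_loss(x_img, x_txt, params, alpha):
    """Return (total, reconstruction, correlation) losses averaged over a batch.

    Assumed form: total = (1 - alpha) * reconstruction + alpha * correlation.
    """
    # Encode each modality into a hidden representation of the same size.
    h_img = sigmoid(x_img @ params["W_img"] + params["b_img"])
    h_txt = sigmoid(x_txt @ params["W_txt"] + params["b_txt"])

    # Decode each hidden representation back to its own modality.
    r_img = h_img @ params["V_img"] + params["c_img"]
    r_txt = h_txt @ params["V_txt"] + params["c_txt"]

    # Representation learning error: squared reconstruction error, both modalities.
    recon = np.sum((x_img - r_img) ** 2, axis=1) + np.sum((x_txt - r_txt) ** 2, axis=1)
    # Correlation learning error: squared distance between paired hidden codes.
    corr = np.sum((h_img - h_txt) ** 2, axis=1)

    recon_mean = float(np.mean(recon))
    corr_mean = float(np.mean(corr))
    total = (1.0 - alpha) * recon_mean + alpha * corr_mean
    return total, recon_mean, corr_mean


# Usage on random paired data (5 image/text pairs, illustrative sizes only).
rng = np.random.default_rng(0)
params = init_params(rng, d_img=8, d_txt=6, d_hidden=4)
x_img = rng.normal(size=(5, 8))
x_txt = rng.normal(size=(5, 6))
total, recon, corr = corr_ae_loss(x_img, x_txt, params, alpha=0.2)
print(f"total={total:.4f} reconstruction={recon:.4f} correlation={corr:.4f}")
```

Under this assumed weighting, `alpha=0` reduces the objective to two independent autoencoders, while `alpha=1` only pulls the paired hidden codes together and ignores reconstruction.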