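Matrix decomposition
Mathematics
Matrix (chemical analysis)
Dot product
Factorization
Dimension (graph theory)
Euclidean distance
Measure (data warehouse)
Matrix completion
Non-negative matrix factorization
Rank (graph theory)
Computer science
Pattern recognition (psychology)
Artificial intelligence
Algorithm
Combinatorics
Data mining
Eigenvector
Composite material
Gaussian distribution
Physics
Materials science
Quantum mechanics
Geometry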
Authors
Anoop Praturu,Tatyana O. Sharpee
Abstract
Matrix factorization is a central paradigm in matrix completion and collaborative filtering. Low-rank factorizations have been extremely successful at reconstructing and generalizing high-dimensional data in a wide variety of machine learning problems, from drug-target discovery to music recommendation. Virtually all proposed matrix factorization techniques use the dot product between latent factor vectors to reconstruct the original matrix. We propose a reformulation of the widely used logistic matrix factorization in which the distance between latent factors, rather than their dot product, measures their similarity. We show that this measure of similarity, which can draw nonlinear decision boundaries and respects the triangle inequality between points, has more expressive power and modeling capacity. Implemented in both Euclidean and hyperbolic space, the distance-based model outperforms previous formulations of logistic matrix factorization on three biological test problems with disparate structure and statistics. In particular, we show that a distance-based factorization (1) generalizes better to test data, (2) achieves optimal performance at a lower factor-space dimension, and (3) clusters data better in the latent factor space.
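To make the contrast in the abstract concrete, the following is a minimal sketch of the two scoring rules, not the authors' implementation. The abstract does not give the exact link function, so the form sigmoid(bias - distance) is an assumption, as are the function names, the bias parameter, and the choice of the Poincare ball as the hyperbolic model.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dot_product_score(u, v, bias=0.0):
    # Conventional logistic matrix factorization:
    # interaction probability from the dot product of the latent factors.
    return sigmoid(sum(a * b for a, b in zip(u, v)) + bias)


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hyperbolic_distance(u, v):
    # Distance in the Poincare ball model (an assumed choice of model);
    # requires both points to lie strictly inside the unit ball.
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    norm_u = sum(a * a for a in u)
    norm_v = sum(b * b for b in v)
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - norm_u) * (1.0 - norm_v)))


def distance_score(u, v, bias=0.0, dist=euclidean_distance):
    # Assumed distance-based variant: nearby latent factors get a high
    # interaction probability, distant ones a low probability.
    return sigmoid(bias - dist(u, v))


if __name__ == "__main__":
    drug = [0.1, 0.2]
    near_target = [0.15, 0.25]
    far_target = [0.8, -0.5]
    print(distance_score(drug, near_target, bias=1.0))
    print(distance_score(drug, far_target, bias=1.0))
    print(distance_score(drug, far_target, bias=1.0, dist=hyperbolic_distance))
```

Because the score depends only on a true metric, two items that are both close to a third item cannot be arbitrarily far from each other, which is the triangle-inequality property the abstract refers to; a dot-product score carries no such constraint.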