Keywords
Principal component analysis, Dimensionality reduction, Dimension (graph theory), Sparse PCA, Algorithm, Uncorrelated, Computer science, Curse of dimensionality, Preprocessor, Artificial intelligence, Feature (linguistics), Overhead (engineering), Pattern recognition (psychology), Mathematics, Statistics, Philosophy, Pure mathematics, Linguistics, Operating system
Authors
Arpita Gang, Waheed U. Bajwa
Identifier
DOI: 10.1109/tsp.2022.3229635
Abstract
Principal Component Analysis (PCA) is a fundamental data preprocessing tool in machine learning. While PCA is often thought of as a dimensionality reduction method, its purpose is actually two-fold: dimension reduction and uncorrelated feature learning. Furthermore, the enormous dimensions and sample sizes of modern datasets have rendered centralized PCA solutions unusable. In that vein, this paper revisits the problem of PCA when data samples are distributed across nodes in an arbitrarily connected network. While a few solutions for distributed PCA exist, they either overlook the uncorrelated-feature-learning aspect of PCA, tend to have high communication overhead that makes them inefficient, and/or lack 'exact' or 'global' convergence guarantees. To overcome these issues, this paper proposes a distributed PCA algorithm termed FAST-PCA (Fast and exAct diSTributed PCA). The proposed algorithm is efficient in terms of communication and is proven to converge linearly and exactly to the principal components, leading to dimension reduction as well as uncorrelated features. These claims are further supported by experimental results.
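The abstract does not spell out the algorithm, so the following is a minimal, illustrative sketch of the general idea behind consensus-based distributed PCA, not the authors' FAST-PCA method: each node keeps a shard of the data, repeatedly applies its local covariance to its current estimate, averages the result with its neighbors through a doubly stochastic mixing matrix, and re-orthonormalizes so the learned features stay uncorrelated. All names here (distributed_pca_sketch, W, K, the ring topology) are illustrative assumptions.

```python
import numpy as np

def distributed_pca_sketch(local_data, W, K, iters=300):
    """Toy consensus-based orthogonal iteration for distributed PCA.

    local_data -- list of (n_i, d) arrays, the data shard held at each node
    W          -- (N, N) doubly stochastic mixing matrix matching the graph
    K          -- number of principal components to estimate
    """
    N = len(local_data)
    d = local_data[0].shape[1]
    total = sum(Y.shape[0] for Y in local_data)
    # Each node forms a scaled local sample covariance; their sum equals the
    # global covariance, which no single node ever materializes.
    covs = [Y.T @ Y / total for Y in local_data]
    rng = np.random.default_rng(0)
    X0 = rng.standard_normal((d, K))
    X = [X0 for _ in range(N)]  # identical initialization at every node
    for _ in range(iters):
        # 1) local covariance step, 2) one neighbor-averaging round,
        # 3) local QR to keep columns orthonormal (uncorrelated features).
        prods = [covs[i] @ X[i] for i in range(N)]
        mixed = [sum(W[i, j] * prods[j] for j in range(N)) for i in range(N)]
        X = [np.linalg.qr(M)[0] for M in mixed]
    return X  # per-node estimates of the top-K principal subspace

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic zero-mean data with a decaying spectrum, split over 5 nodes.
    # (Real PCA would first subtract the sample mean.)
    A = rng.standard_normal((500, 20)) @ np.diag(np.linspace(3.0, 0.1, 20))
    shards = np.array_split(A, 5)
    # Metropolis weights for a 5-node ring (doubly stochastic).
    W = np.zeros((5, 5))
    for i in range(5):
        W[i, i] = 1 / 3
        W[i, (i + 1) % 5] = 1 / 3
        W[i, (i - 1) % 5] = 1 / 3
    est = distributed_pca_sketch(shards, W, K=3)
    # Compare node 0's subspace against the centralized top-3 subspace.
    U = np.linalg.svd(A, full_matrices=False)[2][:3].T  # (d, 3)
    cosines = np.linalg.svd(U.T @ est[0])[1]  # cosines of principal angles
    print("principal-angle cosines:", np.round(cosines, 3))
```

This toy version communicates a full d-by-K matrix per iteration and makes no claim to the linear, exact convergence guarantees the paper proves for FAST-PCA; it only illustrates why per-node orthonormalization yields uncorrelated features alongside dimension reduction.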