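Canonical correlation
Computer science
Eigenvector
Fourier transform
Algorithm
Projection (relational algebra)
Scale (ratio)
Discriminant
Artificial intelligence
Mathematics
Quantum mechanics
Physics
Mathematical analysis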
Authors
Xiang-Jun Shen, Zhaorui Xu, Liangjun Wang, Zechao Li
Identifier
DOI:10.1145/3503161.3547988
Abstract
Canonical correlation analysis (CCA) is a linear correlation analysis technique widely used in the statistics and machine learning communities. However, the high complexity of computing eigenvectors places a heavy burden on memory and computation time, making CCA nearly impractical in large-scale settings. In this paper, we attempt to overcome this issue by representing the data in the Fourier domain. Thanks to the pattern repeatability of the data, the projection-seeking step of CCA can be translated into choosing a few discriminative Fourier bases using only element-wise products and sums, without time-consuming eigenvector computation. Another merit of this scheme is that, in contrast to existing methods, the eigenvalues can be approximated asymptotically. Specifically, the eigenvalues can be estimated progressively, and the accuracy improves monotonically as the number of data samples increases. This makes it possible to obtain satisfactory accuracy from only part of the data samples. Together, these properties make the proposed method extremely fast and memory efficient. Experimental results on several large-scale datasets, such as MNIST 8M, X-RAY MICROBEAM SPEECH, and TWITTER USERS Data, demonstrate the superiority of the proposed algorithm over state-of-the-art large-scale CCA methods: it achieves almost the same accuracy while training 1,000 times faster.
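The idea of replacing eigenvector computation with a selection among Fourier bases can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify in detail. It assumes, for illustration only, that both views have the same feature dimension and that each Fourier basis can be scored independently by a per-frequency correlation. The function name `fourier_cca_sketch` and its parameters are hypothetical.

```python
import numpy as np


def fourier_cca_sketch(X, Y, k):
    """Score Fourier bases by cross-view correlation and keep the top k.

    Illustrative assumptions (not taken from the paper):
      - X and Y are (n_samples, d) arrays with the same feature dimension d.
      - Each Fourier basis is scored on its own, so no eigen-decomposition
        is performed.
    """
    # Center each feature across samples.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Project every sample onto the Fourier bases (DFT along the feature axis).
    Fx = np.fft.fft(Xc, axis=1)
    Fy = np.fft.fft(Yc, axis=1)

    # Only element-wise products and sums over samples are needed from here.
    cross = np.sum(Fx * np.conj(Fy), axis=0)
    var_x = np.sum(np.abs(Fx) ** 2, axis=0)
    var_y = np.sum(np.abs(Fy) ** 2, axis=0)

    # Per-frequency correlation magnitude; eps guards against division by zero.
    eps = 1e-12
    corr = np.abs(cross) / np.sqrt(var_x * var_y + eps)

    # Indices of the k most correlated Fourier bases, best first.
    top = np.argsort(corr)[::-1][:k]
    return top, corr[top]
```

Because the sums over samples are simple accumulations, the scores in this sketch could also be updated batch by batch, which is one way to read the abstract's remark that estimates can be refined progressively from partial data.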