Canonical correlation
Partial least squares regression
Overfitting
Spurious relationship
Dimensionality reduction
Sample size determination
Computer science
Multivariate statistics
Human Connectome Project
Partial correlation
Curse of dimensionality
Artificial intelligence
Correlation
Hyperparameter
Data mining
Machine learning
Statistics
Mathematics
Psychology
Artificial neural network
Geometry
Neuroscience
Functional connectivity
Authors
Agoston Mihalik, James Chapman, Rick A. Adams, Nils R. Winter, Fabio S. Ferreira, John Shawe-Taylor, Janaina Mourão-Miranda
Identifier
DOI:10.1016/j.bpsc.2022.07.012
Abstract
Canonical correlation analysis (CCA) and partial least squares (PLS) are powerful multivariate methods for capturing associations across 2 modalities of data (e.g., brain and behavior). However, when the sample size is similar to or smaller than the number of variables in the data, standard CCA and PLS models may overfit, i.e., find spurious associations that generalize poorly to new data. Dimensionality reduction and regularized extensions of CCA and PLS have been proposed to address this problem, yet most studies using these approaches have some limitations. This work gives a theoretical and practical introduction to the most common CCA/PLS models and their regularized variants. We examine the limitations of standard CCA and PLS when the sample size is similar to or smaller than the number of variables. We discuss how dimensionality reduction and regularization techniques address this problem and explain their main advantages and disadvantages. We highlight crucial aspects of the CCA/PLS analysis framework, including optimizing the hyperparameters of the model and testing the identified associations for statistical significance. We apply the described CCA/PLS models to simulated data and real data from the Human Connectome Project and Alzheimer's Disease Neuroimaging Initiative (both with n > 500). We use both low- and high-dimensionality versions of these data (i.e., ratios between sample size and variables in the range of ∼1-10 and ∼0.1-0.01, respectively) to demonstrate the impact of data dimensionality on the models. Finally, we summarize the key lessons of the tutorial.