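Dimensionality reduction
Principal component analysis
Diffusion map
Nonlinear dimensionality reduction
Computer science
Curse of dimensionality
Projection (relational algebra)
Reduction (mathematics)
Context (archaeology)
Dimensional reduction
Artificial intelligence
Biological system
Pattern recognition (psychology)
Algorithm
Mathematics
Biology
Mathematical physics
Geometry
Paleontology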
Authors
Francesco Trozzi, Xinlei Wang, Peng Tao
Identifier
DOI:10.1021/acs.jpcb.1c02081
Abstract
Proteins are the molecular machines of life. The multitude of possible conformations that proteins can adopt determines their free-energy landscapes. However, the inherently high dimensionality of a protein free-energy landscape poses a challenge to deciphering how proteins perform their functions. For this reason, dimensionality reduction is an active field of research for molecular biologists. The uniform manifold approximation and projection (UMAP) is a dimensionality reduction method based on a fuzzy topological analysis of data. In the present study, the performance of UMAP is compared with that of other popular dimensionality reduction methods such as t-distributed stochastic neighbor embedding (t-SNE), principal component analysis (PCA), and time-structure independent components analysis (tICA) in the context of analyzing molecular dynamics simulations of the circadian clock protein VIVID. A good dimensionality reduction method should accurately represent the data structure on the projected components. The comparison of the raw high-dimensional data with the projections obtained using different dimensionality reduction methods based on various metrics showed that UMAP has superior performance when compared with linear reduction methods (PCA and tICA) and has competitive performance and scalable computational cost.