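Phylogenetic tree
Cluster analysis
Multiple sequence alignment
Dimensionality reduction
Computer science
Sequence analysis
Sequence alignment
Computational biology
Principal component analysis
Biology
Gene
Genetics
Artificial intelligence
Peptide sequence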
Authors
Zhe Zhang,Miaomiao Zhu,Qi Xie,Robert P. Larkin,Xue-Ping Shi,Bo Zheng
Abstract
Protein phylogenetic analysis examines the evolutionary relationships among related protein sequences and can help researchers infer protein functions and developmental trajectories. In the big data era, existing protein phylogenetic methods, including distance-matrix and character-based methods, face challenges in both running time and application scope. Here, we developed an R package, CProtMEDIAS, for protein phylogenetic analysis. In contrast to existing phylogenetic analysis methods, CProtMEDIAS uses dimensionality reduction algorithms to digitize multiple sequence alignments and quickly conduct phylogenetic analysis on a large number of amino acid sequences from similarly distant protein families and species. We used CProtMEDIAS to perform dimensionality reduction, clustering, pseudotime, specific-residue and evolutionary trajectory analyses of the plant homeobox superfamily. We found that CProtMEDIAS delivers consistent clustering, fast running times and elegant presentation, and thus provides powerful new tools and methods for protein clustering and evolutionary analysis.
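To make the idea of "digitizing" a multiple sequence alignment concrete, the following is a minimal sketch written in Python rather than R. It is not the CProtMEDIAS implementation or API: the one-hot encoding, the function names `one_hot_encode` and `first_principal_component`, and the toy alignment are all assumptions made for illustration. It encodes each aligned sequence as a numeric vector and projects the vectors onto their first principal component, so that similar sequences receive similar scores.

```python
import math
import random

# Assumed alphabet for this sketch: 20 standard residues plus the gap symbol.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY-"


def one_hot_encode(alignment):
    """Turn each aligned sequence into a flat 0/1 vector (columns x symbols)."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    width = len(AMINO_ACIDS)
    vectors = []
    for seq in alignment:
        vec = [0.0] * (len(seq) * width)
        for pos, residue in enumerate(seq):
            # Unknown symbols (e.g. X) are treated as gaps in this sketch.
            vec[pos * width + index.get(residue, index["-"])] = 1.0
        vectors.append(vec)
    return vectors


def first_principal_component(vectors, iterations=200, seed=0):
    """Score each row on the first principal component (power iteration)."""
    n, d = len(vectors), len(vectors[0])
    means = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - means[j] for j in range(d)] for v in vectors]
    rng = random.Random(seed)  # seeded so the result is reproducible
    w = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iterations):
        # One step of w <- X^T (X w), followed by normalization.
        proj = [sum(r[j] * w[j] for j in range(d)) for r in centered]
        w_new = [sum(centered[i][j] * proj[i] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w_new))
        if norm == 0.0:  # all sequences identical: no variance to explain
            break
        w = [x / norm for x in w_new]
    return [sum(r[j] * w[j] for j in range(d)) for r in centered]


# Hypothetical toy alignment: two pairs of closely related sequences.
alignment = ["MKTAYIA", "MKTAYLA", "MRSGWIV", "MRSGWLV"]
scores = first_principal_component(one_hot_encode(alignment))
for seq, score in zip(alignment, scores):
    print(seq, round(score, 3))
```

In this toy case the first two sequences fall on one side of the axis and the last two on the other. The sign of the axis is arbitrary, as it is for any principal component. A real analysis would read the alignment from a file and would typically keep several components before clustering.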