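Gene
k-nearest neighbors algorithm
Machine learning
Homology (biology)
Artificial intelligence
Computational biology
Function (biology)
Algorithm
Gene prediction
Genome
Biology
Computer science
Genetics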
Authors
Yuannong Ye,Dingfa Liang,Zhu Zeng
Source
Journal: Lecture Notes in Electrical Engineering
Date: 2021-11-11
Pages: 487-493
Citations: 2
Identifier
DOI:10.1007/978-981-16-6554-7_54
Abstract
Essential genes are indispensable for the survival of an organism, so identifying and studying them is of great significance. We use a machine learning method, K-Nearest Neighbor, to develop a predictor of essential bacterial genes. Homologous features of bacterial genomes, comprising sequence homology and functional homology, are extracted to identify essential genes. Based on these features, we use the K-Nearest Neighbor algorithm to determine gene function. We tune the minimum matching parameter (K) of the essential gene prediction model to build an optimal Escherichia coli-specific model. The corresponding optimal K is then extended to essential gene prediction models for other bacteria. After cross-validation, the highest accuracy is 0.89, obtained when K is between 5 and 7. The features we extracted can therefore increase the accuracy of bacterial essential gene prediction. Under this premise, we found that the prediction accuracy of the K-Nearest Neighbor model did not differ significantly across different evolutionary distances between the organisms in the database and the investigated species. This means the machine learning model can be extended to more distant species, where it will have better predictive performance for essential genes than the usual sequence-based methods.
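The tuning procedure described in the abstract (choose K by cross-validated accuracy) can be illustrated with a minimal sketch. This is not the authors' pipeline: the two features are synthetic stand-ins for sequence-homology and functional-homology scores, and the names `knn_predict`, `cross_val_accuracy`, and `make_toy_genes` are hypothetical helpers introduced only for illustration.

```python
import random
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query`.

    `train` is a list of (feature_vector, label) pairs.
    """
    neighbors = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def cross_val_accuracy(data, k, folds=5):
    """Mean accuracy over `folds` interleaved held-out splits."""
    scores = []
    for i in range(folds):
        test = data[i::folds]
        train = [row for j, row in enumerate(data) if j % folds != i]
        correct = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / folds


def make_toy_genes(n=200, seed=0):
    """Synthetic data (assumption): label 1 = essential, 0 = non-essential.

    Each gene gets two made-up scores standing in for sequence homology
    and functional homology; essential genes score higher on average.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        essential = rng.random() < 0.5
        center = 0.7 if essential else 0.3
        features = (rng.gauss(center, 0.15), rng.gauss(center, 0.15))
        data.append((features, int(essential)))
    return data


data = make_toy_genes()
# Odd values of K avoid tied votes in a two-class problem.
results = {k: cross_val_accuracy(data, k) for k in (1, 3, 5, 7, 9)}
best_k = max(results, key=results.get)
print(results, best_k)
```

In the setting the abstract describes, the K selected on one reference organism would then be reused when building models for other bacteria; the toy data here says nothing about which K is best on real genomes.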