特征选择
小桶
基因
计算生物学
基因本体论
分类器(UML)
相互信息
降维
生物
随机森林
数据挖掘
计算机科学
人工智能
模式识别(心理学)
基因表达
遗传学
作者
Yanbao Sun,Qi Zhang,Qi Yang,Ming Yao,Fang Xu,Wenyu Chen
标识
DOI:10.3389/fpubh.2022.901602
摘要
Since the first report of SARS-CoV-2 virus in Wuhan, China in December 2019, a global outbreak of Corona Virus Disease 2019 (COVID-19) pandemic has been aroused. In the prevention of this disease, accurate diagnosis of COVID-19 is the center of the problem. However, due to the limitation of detection technology, the test results are impossible to be totally free from pseudo-positive or -negative. Improving the precision of the test results asks for the identification of more biomarkers for COVID-19. On the basis of the expression data of COVID-19 positive and negative samples, we first screened the feature genes through ReliefF, minimal-redundancy-maximum-relevancy, and Boruta_MCFS methods. Thereafter, 36 optimal feature genes were selected through incremental feature selection method based on the random forest classifier, and the enriched biological functions and signaling pathways were revealed by Gene Ontology and Kyoto Encyclopedia of Genes and Genomes. Also, protein-protein interaction network analysis was performed on these feature genes, and the enriched biological functions and signaling pathways of main submodules were analyzed. In addition, whether these 36 feature genes could effectively distinguish positive samples from the negative ones was verified by dimensionality reduction analysis. According to the results, we inferred that the 36 feature genes selected via Boruta_MCFS could be deemed as biomarkers in COVID-19.
科研通智能强力驱动
Strongly Powered by AbleSci AI