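Logistic regression
RNA-seq
Mutation
Computer science
Regression
Gene
Cancer
Computational biology
Artificial intelligence
Biology
Genetics
Statistics
Machine learning
Mathematics
Gene expression
Transcriptome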
Authors
Juntao Li,Fuzhen Cao,Hongmei Zhang
Identifier
DOI:10.1016/j.bspc.2024.106025
Abstract
RNA-seq is often used for early, accurate diagnosis of liver cancer and for screening related genes, which significantly improves patients’ survival rates. Popular diagnostic methods based on machine learning often ignore genes with insignificant differential expression in RNA-seq and fail to characterize the overlapping group effect that arises when a few genes participate in multiple biological pathways. This paper addresses both problems by developing an adaptive logistic regression method that integrates gene mutation and RNA-seq data (ALRIGMR). A new data integration strategy is proposed to highlight genes with high mutation rates and insignificant differential expression. The local maximal quasi-clique merger (lmQCM) is used for overlapping grouping and is shown to be superior to weighted gene co-expression network analysis (WGCNA). A new criterion for evaluating gene significance is proposed that relies on both differential expression and mutational information. ALRIGMR achieved a diagnostic accuracy of 88.4% on the external validation set, which is 23.0%, 53.8%, 26.9%, 15.3%, 11.5%, 7.6%, 3.8%, and 7.6% higher than the accuracies of eight comparison methods. Five genes with insignificant differential expression, TP53, TTN, MUC16, ABCA13, and RYR2, were screened and confirmed to be closely associated with liver cancer.
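
The abstract does not say whether the eight reported margins are absolute differences (percentage points) or relative improvements. As a rough reading aid, the sketch below assumes they are percentage points and recovers the accuracy each comparison method would then have. The variable names are illustrative, and the implied baseline values are derived under that assumption rather than taken from the paper.

```python
# Accuracy reported for ALRIGMR on the external validation set (from the abstract).
alrigmr_accuracy = 88.4

# Reported margins over the eight comparison methods, in the order listed in the abstract.
margins = [23.0, 53.8, 26.9, 15.3, 11.5, 7.6, 3.8, 7.6]

# ASSUMPTION: each margin is an absolute difference in percentage points,
# so the implied baseline accuracy is simply 88.4 minus the margin.
implied_baselines = [round(alrigmr_accuracy - m, 1) for m in margins]

for margin, baseline in zip(margins, implied_baselines):
    print(f"margin {margin:4.1f} points -> implied baseline accuracy {baseline:4.1f}%")
```

If the margins were instead relative improvements, the implied baselines would differ, so these figures should be checked against the full paper before being quoted.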