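Random forest
Computer science
Encoder
Machine learning
Classifier (UML)
Artificial intelligence
Feature learning
Autoencoder
Feature vector
Disease
Artificial neural network
Graph
Data mining
Pattern recognition (psychology)
Theoretical computer science
Medicine
Operating system
Pathology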
Authors
Qing-Wen Wu, Junfeng Xia, Jiancheng Ni, Chun-Hou Zheng
Abstract
Predicting disease-related long non-coding RNAs (lncRNAs) helps uncover new biomarkers for the prevention, diagnosis and treatment of complex human diseases. In this paper, we propose GAERF, a machine learning-based classification approach that identifies disease-related lncRNAs using a graph auto-encoder (GAE) and a random forest (RF). First, we combine the relationships among lncRNAs, miRNAs and diseases into a heterogeneous network. Then, the GAE learns low-dimensional representation vectors of the nodes from this network, which reduces the dimensionality and heterogeneity of the biological data. Taking these feature vectors as input, we train an RF classifier to predict new lncRNA-disease associations (LDAs). Experimental results show that the proposed representation characterizes lncRNA-disease pairs accurately. Owing to its ensemble learning method, GAERF achieves superior performance and significantly outperforms other methods. Case studies further demonstrate that GAERF is an effective method for predicting LDAs.
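The GAE-then-RF pipeline summarized in the abstract can be sketched roughly as follows. This is a minimal, hypothetical illustration using NumPy and scikit-learn, not the authors' GAERF code. The toy random association matrices, the node ordering, the two-layer GCN encoder trained by plain gradient descent, the layer sizes and learning rate, and the use of concatenated lncRNA and disease embeddings as RF input are all assumptions made for illustration; the helper names (`build_hetero_adjacency`, `train_gae`, `glorot`) are likewise hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)


def build_hetero_adjacency(ld, lm, md):
    """Combine lncRNA-disease (ld), lncRNA-miRNA (lm) and miRNA-disease (md)
    0/1 matrices into one symmetric adjacency matrix.
    Node order (assumed): lncRNAs, then miRNAs, then diseases."""
    nl, nd = ld.shape
    nm = lm.shape[1]
    a = np.zeros((nl + nm + nd, nl + nm + nd))
    a[:nl, nl:nl + nm] = lm
    a[:nl, nl + nm:] = ld
    a[nl:nl + nm, nl + nm:] = md
    return a + a.T  # mirror the upper blocks: the network is undirected


def normalize(a):
    """GCN propagation matrix D^-1/2 (A + I) D^-1/2."""
    a_hat = a + np.eye(a.shape[0])
    d = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d[:, None] * d[None, :]


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def glorot(fan_in, fan_out):
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, (fan_in, fan_out))


def train_gae(a, x, hidden=32, dim=16, lr=0.5, epochs=200):
    """Two-layer GCN encoder Z = P relu(P X W0) W1 with inner-product decoder
    sigmoid(Z Z^T), fit by plain gradient descent on mean binary cross-entropy
    against A + I. (Reference GAE code also up-weights positive edges; omitted.)"""
    n = a.shape[0]
    p = normalize(a)
    target = a + np.eye(n)
    px = p @ x
    w0, w1 = glorot(x.shape[1], hidden), glorot(hidden, dim)

    def forward():
        h_pre = px @ w0
        ph = p @ np.maximum(h_pre, 0.0)
        return h_pre, ph, ph @ w1

    for _ in range(epochs):
        h_pre, ph, z = forward()
        g = (sigmoid(z @ z.T) - target) / (n * n)  # dLoss/dlogits
        dz = (g + g.T) @ z                          # logits = Z Z^T
        dw1 = ph.T @ dz
        dh = (p.T @ dz @ w1.T) * (h_pre > 0)        # back through P and ReLU
        dw0 = px.T @ dh
        w0 -= lr * dw0
        w1 -= lr * dw1
    return forward()[2]


# Toy placeholder data (random, NOT real lncRNA/miRNA/disease associations).
nl, nm, nd = 20, 15, 10
ld = (rng.random((nl, nd)) < 0.15).astype(float)
lm = (rng.random((nl, nm)) < 0.15).astype(float)
md = (rng.random((nm, nd)) < 0.15).astype(float)

a = build_hetero_adjacency(ld, lm, md)
z = train_gae(a, np.eye(a.shape[0]))  # featureless nodes: identity features
z_lnc, z_dis = z[:nl], z[nl + nm:]

# Pair features: concatenated lncRNA and disease embeddings (an assumption).
pairs = [(i, j) for i in range(nl) for j in range(nd)]
features = np.array([np.concatenate([z_lnc[i], z_dis[j]]) for i, j in pairs])
labels = np.array([ld[i, j] for i, j in pairs])

# For brevity the RF is scored on its own training pairs; a real evaluation
# would hide held-out LDAs from the network before running the GAE.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(features, labels)
scores = clf.predict_proba(features)[:, 1]  # association score per pair
```

Higher values in `scores` rank a lncRNA-disease pair as a more likely association. Concatenation keeps the pair features order-aware; an element-wise (Hadamard) product of the two embeddings is a common alternative.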