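Random forest
Computer science
Cluster analysis
Artificial intelligence
Hierarchical clustering
Feature selection
Decision tree
Feature (linguistics)
Data mining
Machine learning
Pattern recognition (psychology)
Linguistics
Philosophy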
Authors
Xia-an Bi,Xi Hu,Hao Wu,Yan Wang
Identifier
DOI:10.1109/jbhi.2020.2973324
Abstract
Alzheimer's disease (AD) has become a severe medical challenge. Advances in technology have produced high-dimensional data of different modalities, including functional magnetic resonance imaging (fMRI) and single nucleotide polymorphism (SNP) data. Understanding the complex association patterns among these heterogeneous and complementary data benefits the diagnosis and prevention of AD. In this paper, we apply an appropriate correlation analysis method to detect the relationships between brain regions and genes, and propose “brain region-gene pairs” as the multimodal features of each sample. In addition, we put forward a novel data analysis method, cluster evolutionary random forest (CERF), which is suited to “brain region-gene pairs”. The idea of clustering evolution is introduced to improve the generalization performance of the random forest, which is constructed by randomly selecting samples and sample features. Through hierarchical clustering of the decision trees in the random forest, trees with higher similarity are grouped into one cluster, and the best-performing trees are retained to enhance the diversity among decision trees. Furthermore, based on CERF, we integrate feature construction, feature selection and sample classification to find the optimal combination of methods, and design a comprehensive diagnostic framework for AD. The framework is validated on samples with both fMRI and SNP data from ADNI. The results show that the framework can effectively identify AD patients and discover brain regions and genes significantly associated with AD. These findings are conducive to the clinical treatment and prevention of AD.
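The tree-pruning idea described in the abstract (cluster similar decision trees, then keep the best performer of each cluster) can be illustrated with a minimal sketch. This is not the authors' CERF implementation: the abstract does not state the similarity measure, the linkage rule, or the selection criterion, so the sketch assumes prediction disagreement as the distance, average linkage, and validation accuracy as the performance score. The function names `disagreement`, `cluster_trees`, and `select_representatives` are hypothetical, and each tree is represented only by its predictions on a held-out set.

```python
from itertools import combinations


def disagreement(p, q):
    """Fraction of samples on which two prediction vectors differ (assumed distance)."""
    return sum(a != b for a, b in zip(p, q)) / len(p)


def cluster_trees(preds, n_clusters):
    """Average-linkage agglomerative clustering of trees by prediction disagreement.

    preds: one prediction vector per tree, all evaluated on the same samples.
    Returns a list of clusters, each a list of tree indices.
    """
    clusters = [[i] for i in range(len(preds))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average pairwise distance between the members of the two clusters.
            total = sum(disagreement(preds[i], preds[j])
                        for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return clusters


def select_representatives(preds, y_true, n_clusters):
    """Keep the most accurate tree from each cluster (assumed selection rule)."""
    def accuracy(i):
        return sum(p == y for p, y in zip(preds[i], y_true)) / len(y_true)

    clusters = cluster_trees(preds, n_clusters)
    return sorted(max(cluster, key=accuracy) for cluster in clusters)


# Toy example: four hypothetical trees evaluated on six validation samples.
y_true = [1, 1, 0, 0, 1, 0]
preds = [
    [1, 1, 0, 0, 1, 1],  # tree 0
    [1, 1, 0, 0, 0, 1],  # tree 1, behaves like tree 0
    [0, 0, 1, 0, 1, 0],  # tree 2
    [0, 1, 1, 0, 1, 0],  # tree 3, behaves like tree 2
]
kept = select_representatives(preds, y_true, n_clusters=2)
print(kept)
```

In this toy input, trees 0 and 1 form one cluster and trees 2 and 3 the other, so one tree survives from each group. Retaining a single representative per cluster removes near-duplicate trees, which is one way to read the abstract's claim that the procedure increases diversity among the remaining trees.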