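Machine learning
Computer science
Identification (biology)
Biomarker discovery
Artificial intelligence
Feature (linguistics)
Biomarker
Random forest
Omics
Feature selection
Data mining
Bioinformatics
Proteomics
Biology
Gene
Plant
Philosophy
Biochemistry
Linguistics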
Authors
Yifan Dai, Di Wu, Ian M. Carroll, Fei Zou, Baiming Zou
Source
Journal: Bioinformatics
[Oxford University Press]
Date: 2025-04-25
Volume (Issue): 41 (5)
Identifier
DOI: 10.1093/bioinformatics/btaf266
Abstract
Motivation: Omics features, often measured by high-throughput technologies, combined with clinical features, significantly impact the understanding of many complex human diseases. Integrating key omics biomarkers with clinical risk factors is essential for elucidating disease mechanisms, advancing early diagnosis, and enhancing precision medicine. However, the high dimensionality and intricate associations between disease outcomes and omics profiles present substantial analytical challenges.

Results: We propose a high-dimensional feature importance test (HiFIT) framework to address these challenges. Specifically, we develop an ensemble data-driven biomarker identification tool, Hybrid Feature Screening (HFS), to construct a candidate feature set for downstream machine learning models. The pre-screened candidate features from HFS are further refined using a computationally efficient permutation-based feature importance test employing machine learning methods to flexibly model the potential complex associations between disease outcomes and molecular biomarkers. Through extensive numerical simulation studies and practical applications to microbiome-associated weight changes following bariatric surgery, as well as the examination of gene-expression-associated kidney pan-cancer survival data, we demonstrate HiFIT’s superior performance in both outcome prediction and feature importance identification.

Availability and implementation: An R package implementing the HiFIT algorithm is available on GitHub (https://github.com/BZou-lab/HiFIT).
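
The two-stage idea described in the abstract (pre-screen a high-dimensional feature set, then rank the survivors by permutation-based importance) can be illustrated with a minimal sketch. This is not the HiFIT R package or the authors' HFS procedure: the screening step below is plain marginal Pearson correlation, the model is ordinary least squares standing in for a flexible machine learning model, and the output is an importance score rather than a formal test with p-values. The function names `screen_features` and `permutation_importance` and all data are hypothetical, chosen only for illustration.

```python
import numpy as np


def screen_features(X, y, k):
    """Keep the k features with the largest absolute Pearson correlation with y.

    Assumption: a simple marginal screen, standing in for the paper's HFS step.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant columns get correlation 0 instead of a division by zero.
    corr = np.abs(Xc.T @ yc) / np.where(denom > 0, denom, np.inf)
    return np.sort(np.argsort(corr)[::-1][:k])


def permutation_importance(X_train, y_train, X_test, y_test, n_repeats=20, seed=0):
    """Mean increase in held-out MSE when one feature column is shuffled.

    Assumption: an ordinary least-squares fit replaces the flexible
    machine learning model mentioned in the abstract.
    """
    rng = np.random.default_rng(seed)
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    beta, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

    def mse(X):
        A = np.column_stack([np.ones(len(X)), X])
        return np.mean((y_test - A @ beta) ** 2)

    baseline = mse(X_test)
    importance = np.zeros(X_test.shape[1])
    for j in range(X_test.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X_test.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(mse(X_perm) - baseline)
        importance[j] = np.mean(increases)
    return importance


# Synthetic data: 200 features, only features 0 and 1 drive the outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 200))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=300)

# Screen on the training split only, so the held-out split stays untouched.
X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]
kept = screen_features(X_train, y_train, k=10)
scores = permutation_importance(X_train[:, kept], y_train, X_test[:, kept], y_test)

for feature, score in sorted(zip(kept.tolist(), scores.tolist()), key=lambda t: -t[1]):
    print(f"feature {feature}: importance {score:.3f}")
```

Screening before fitting keeps the second stage cheap, since permutation importance costs one model evaluation per feature per repeat. A linear screen like this one can miss features whose effect is purely nonlinear, which is the kind of gap the abstract's hybrid screening is described as addressing.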