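Computer science
Generalizability theory
Subtyping
Replication
Biomedicine
Machine learning
Artificial intelligence
Matching (statistics)
Data mining
Bioinformatics
Medicine
Biology
Mathematics
Statistics
Pathology
Programming language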
Authors
Zhaoxiang Cai,Emma L. Boys,Zainab Noor,Adel Aref,Dylan Xavier,Natasha Lucas,Steven G. Williams,Jennifer Koh,Rebecca C. Poulos,Yangxiu Wu,Michael Dausmann,Karen L. MacKenzie,Adriana Aguilar‐Mahecha,Carolina Armengol,María Manuela Barranco,Mark Basik,Elise D. Bowman,Roderick Clifton‐Bligh,E. Connolly,Wendy A. Cooper
Identifier
DOI:10.1158/2159-8290.cd-24-1488
Abstract
Artificial intelligence applications in biomedicine face major challenges from data privacy requirements. To address this issue for clinically annotated tissue proteomic data, we developed a Federated Deep Learning (FDL) approach (ProCanFDL), training local models on simulated sites containing data from a pan-cancer cohort (n=1,260) and 29 cohorts held behind private firewalls (n=6,265), representing 19,930 replicate data-independent acquisition mass spectrometry (DIA-MS) runs. Local parameter updates were aggregated to build the global model, achieving a 43% performance gain on the hold-out test set (n=625) in 14 cancer subtyping tasks compared to local models, and matching centralized model performance. The approach’s generalizability was demonstrated by retraining the global model with data from two external DIA-MS cohorts (n=55) and eight acquired by tandem mass tag (TMT) proteomics (n=832). ProCanFDL presents a solution for internationally collaborative machine learning initiatives using proteomic data, e.g., for discovering predictive biomarkers or treatment targets, while maintaining data privacy.
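The aggregation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which aggregation rule ProCanFDL uses, so the sample-size-weighted average below (in the style of federated averaging) is an assumption chosen for illustration. The function name `aggregate_updates`, the toy parameter shapes, and the site sizes are all hypothetical.

```python
import numpy as np


def aggregate_updates(local_params, sample_counts):
    """Combine per-site parameters into one global set.

    Assumption: a sample-size-weighted average (federated-averaging style).
    The abstract only says that local updates are "aggregated".
    """
    weights = np.asarray(sample_counts, dtype=float)
    weights /= weights.sum()  # normalise so the weights sum to 1
    n_layers = len(local_params[0])
    # Average layer by layer; only parameters are shared, never raw data.
    return [
        sum(w * site[k] for w, site in zip(weights, local_params))
        for k in range(n_layers)
    ]


# Toy example: three hypothetical sites, each holding one weight matrix
# and one bias vector of identical shape.
site_params = [
    [np.full((2, 2), 1.0), np.zeros(2)],
    [np.full((2, 2), 2.0), np.ones(2)],
    [np.full((2, 2), 4.0), np.full(2, 2.0)],
]
site_sizes = [100, 100, 200]  # hypothetical number of samples per site

global_params = aggregate_updates(site_params, site_sizes)
```

In this sketch the largest site contributes half of the total weight, so its parameters pull the global values furthest. A real deployment would repeat this step over several training rounds and send the global parameters back to each site.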