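Generalizability theory
Segmentation
Renal cell carcinoma
Medicine
Artificial intelligence
Wilcoxon signed-rank test
Computer science
Magnetic resonance imaging
Radiology
Nuclear medicine
Internal medicine
Mathematics
Statistics
Mann-Whitney U test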
Authors
Dat-Thanh Nguyen, Maliha Imami, Linmei Zhao, Jing Wu, Ali Borhani, Alireza Mohseni, Mihir Khunte, Zhusi Zhong, Victoria Shi, Shujie Yao, Yuli Wang, Nicolas Loizou, Alvin C. Silva, Paul J. Zhang, Zishu Zhang, Zhicheng Jiao, Ihab R. Kamel, Weihua Liao, Harrison X. Bai
Abstract
Background: Deep learning (DL) models for accurate renal tumor characterization may benefit from multi-center datasets for improved generalizability; however, data-sharing constraints necessitate privacy-preserving solutions such as federated learning (FL).
Purpose: To assess the performance and reliability of FL for renal tumor segmentation and classification in multi-institutional MRI datasets.
Study Type: Retrospective multi-center study.
Population: A total of 987 patients (403 female) from six hospitals were included in the analysis. Of these, 73% (723/987) had malignant renal tumors, primarily clear cell carcinoma (n = 509). Patients were split into training (n = 785), validation (n = 104), and test (n = 99) sets, stratified across three simulated institutions.
Field Strength/Sequence: MRI was performed at 1.5 T and 3 T using T2-weighted imaging (T2WI) and contrast-enhanced T1-weighted imaging (CE-T1WI) sequences.
Assessment: Both the FL and non-FL approaches used nnU-Net for tumor segmentation and ResNet for tumor classification. The FL approach trained models across three simulated institutional clients with central weight aggregation, while the non-FL approach used centralized training on the full dataset.
Statistical Tests: Segmentation was evaluated using Dice coefficients, and classification of malignant versus benign lesions was assessed using accuracy, sensitivity, specificity, and area under the curve (AUC). FL and non-FL performance was compared using the Wilcoxon test for segmentation Dice and DeLong's test for AUC (significance level: p < 0.05).
Results: No significant difference was observed between the FL and non-FL models in segmentation (Dice: 0.43 vs. 0.45, p = 0.202) or classification (AUC: 0.69 vs. 0.64, p = 0.959) on the test set. For classification, no significant difference was observed between the models in accuracy (p = 0.912), sensitivity (p = 0.862), or specificity (p = 0.847) on the test set.
Data Conclusion: FL demonstrated performance comparable to non-FL approaches in renal tumor segmentation and classification, supporting its potential as a privacy-preserving alternative for multi-institutional DL models.
Evidence Level: 4.
Technical Efficacy: Stage 2.
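Illustrative sketch (not from the paper): the abstract mentions central weight aggregation across three simulated clients and evaluation by Dice coefficient, but does not state the aggregation rule or implementation. The Python below is a minimal sketch assuming sample-size-weighted averaging in the style of FedAvg. The function names `federated_average` and `dice_coefficient`, the dictionary-of-lists weight format, and the convention of returning 1.0 for two empty masks are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical weight format: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def federated_average(client_weights: Sequence[Weights],
                      client_sizes: Sequence[int]) -> Weights:
    """Average client parameters, weighting each client by its sample count.

    This is a FedAvg-style rule, assumed here for illustration; the
    abstract does not say which aggregation rule the study used.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Weights = {}
    for name, reference in client_weights[0].items():
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(len(reference))
        ]
    return averaged


def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice = 2|A and B| / (|A| + |B|) for flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    denominator = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention (an assumption): two empty masks count as a perfect match.
    return 1.0 if denominator == 0 else 2.0 * intersection / denominator


if __name__ == "__main__":
    # Toy values only; these are not data from the study.
    clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
    print(federated_average(clients, [1, 3]))
    print(dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0]))
```

In a real FL setup, each client would train locally for some number of steps before sending its weights to the server, and the averaged weights would be broadcast back for the next round; the sketch covers only the aggregation step and the metric.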