A clustering‐based approach to address correlated features in predicting genitourinary toxicity from MRI‐guided prostate SBRT

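Keywords: Feature selection, Artificial intelligence, Computer science, Cluster analysis, Dimensionality reduction, Regression, Pattern recognition (psychology), Statistics, Mathematics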
Authors
Pouyan Rezapoor, Jonathan Pham, Beth K. Neilsen, Hengjie Liu, Minsong Cao, Yingli Yang, Ke Sheng, Ting Martin, James Lamb, Michael L. Steinberg, Amar U. Kishan, Zachary Taylor, Dan Ruan
Source
Journal: Medical Physics [Wiley]
Identifier
DOI: 10.1002/mp.17834
Abstract

Background: It is common in outcome analysis to work with a large set of candidate prognostic features. However, high‐dimensional input combined with a relatively small sample size leads to a risk of overfitting, poor generalizability, and correlation bias.

Purpose: This study addresses the mitigation of correlation bias in predicting genitourinary (GU) toxicity in prostate cancer patients who underwent MRI‐guided stereotactic body radiation therapy (SBRT).

Methods: Typical dimension reduction or feature selection methods rely on sparsity‐inducing regularization or on information criteria. However, when (subsets of) input features are heavily correlated, the weights assigned to those features can be diluted to the point that they no longer contribute effectively to the prediction, leading to suboptimal feature discovery and prediction. We propose to perform advanced hierarchical clustering and then apply regression modeling to the cluster centroids. This approach addresses the challenges posed by high dimensionality and ill‐conditioning, and improves the accuracy and reliability of the resulting prediction models. The proposed method was evaluated with typical regression models that include intrinsic feature reduction, namely Least Absolute Shrinkage and Selection Operator (LASSO) regularized logistic regression (LR), support vector machine (SVM), and decision trees (DT).

Results: Extensive experiments show that introducing cluster‐based feature compaction and representation improves all regression models under fair hyperparameter tuning conditions. Although LASSO‐LR and LR with clustered features performed similarly during training and validation, with LASSO‐LR slightly better, the cluster‐based feature method achieved significantly better performance on the test set (0.91 AUC and 0.86 accuracy), demonstrating its advantage in stability and robustness. The overall best test performance was achieved by combining feature clustering into five representatives with SVM. An additional correlation study identified the individual features most closely representing the cluster centroids as rectum exposure volume at 2 Gy, trigone exposure at 2 Gy and 41 Gy, urethra exposure at 42 Gy, and rectal wall exposure at 42 Gy. This indicates the importance of hot spot control of the urethra, trigone, and rectal wall for toxicity control.

Conclusions: These findings underscore the superiority of the clustering method in mitigating correlation bias and enhancing predictive model accuracy. The current model also achieves state‐of‐the‐art (SOTA) performance in predicting GU toxicity after MRI‐guided prostate SBRT. Correlating dose features with the feature cluster centroids reveals the importance of hot spot control on the urethra, trigone, and rectal wall to reduce toxicity risk.
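The approach described above (cluster correlated features, represent each cluster by its centroid, then fit a classifier on the centroids) can be illustrated with a minimal Python sketch, assuming scikit-learn and SciPy. The abstract does not state the distance metric, linkage rule, or how centroids are formed, so the choices below (distance 1 − |Pearson r|, average linkage, mean of sign‐aligned standardized features) and the class name `ClusterCentroidFeatures` are hypothetical, not the authors' implementation. Synthetic data stand in for the dose features, and the `representatives` method loosely mirrors the correlation study by picking, for each cluster, the original feature most correlated with its centroid.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


class ClusterCentroidFeatures:
    """Hypothetical helper: replace correlated feature groups by cluster centroids."""

    def __init__(self, n_clusters=5):
        self.n_clusters = n_clusters

    def fit(self, X):
        self.scaler_ = StandardScaler().fit(X)
        corr = np.corrcoef(self.scaler_.transform(X), rowvar=False)
        # Assumed distance: 1 - |r|, so strongly (anti-)correlated features are close.
        dist = 1.0 - np.abs(corr)
        dist = (dist + dist.T) / 2.0
        np.fill_diagonal(dist, 0.0)
        Z = linkage(squareform(dist, checks=False), method="average")
        self.labels_ = fcluster(Z, t=self.n_clusters, criterion="maxclust")
        self.clusters_ = np.unique(self.labels_)
        # Flip members anti-correlated with the cluster's first member so they
        # do not cancel out when averaged.
        self.signs_ = np.ones(X.shape[1])
        for k in self.clusters_:
            members = np.flatnonzero(self.labels_ == k)
            self.signs_[members] = np.where(corr[members[0], members] < 0, -1.0, 1.0)
        return self

    def transform(self, X):
        # One column per cluster: mean of its sign-aligned standardized members.
        Xs = self.scaler_.transform(X) * self.signs_
        return np.column_stack(
            [Xs[:, self.labels_ == k].mean(axis=1) for k in self.clusters_]
        )

    def representatives(self, X):
        # For each cluster, the original feature index most correlated with its centroid.
        Xs = self.scaler_.transform(X)
        C = self.transform(X)
        reps = []
        for j, k in enumerate(self.clusters_):
            members = np.flatnonzero(self.labels_ == k)
            r = [abs(np.corrcoef(Xs[:, m], C[:, j])[0, 1]) for m in members]
            reps.append(int(members[int(np.argmax(r))]))
        return reps


# Synthetic stand-in for dose features: 5 latent factors, 4 noisy copies each.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 5))
X = np.repeat(latent, 4, axis=1) + 0.3 * rng.normal(size=(200, 20))
y = (latent[:, 0] + latent[:, 1] + 0.5 * rng.normal(size=200) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
compactor = ClusterCentroidFeatures(n_clusters=5).fit(X_tr)  # clusters from training data only
svm = SVC(kernel="linear").fit(compactor.transform(X_tr), y_tr)
auc = roc_auc_score(y_te, svm.decision_function(compactor.transform(X_te)))
reps = compactor.representatives(X_tr)
print("cluster labels:", compactor.labels_)
print("test AUC:", round(auc, 3), "representative features:", reps)
```

Fitting the clustering on the training split only keeps test data out of feature construction. To mirror the LASSO‐LR arm instead of SVM, the `SVC` could be swapped for `LogisticRegression(penalty="l1", solver="liblinear")` from `sklearn.linear_model`; with centroids as inputs, the L1 penalty no longer has to split weight among near‐duplicate features.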
