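Feature selection
Artificial intelligence
Computer science
Cluster analysis
Dimensionality reduction
Regression
Pattern recognition (psychology)
Statistics
Mathematics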
Authors
Pouyan Rezapoor, Jonathan Pham, Beth K. Neilsen, Hengjie Liu, Minsong Cao, Yingli Yang, Ke Sheng, Ting Martin, James Lamb, Michael L. Steinberg, Amar U. Kishan, Zachary Taylor, Dan Ruan
Abstract
Background: It is common in outcome analysis to work with a large set of candidate prognostic features. However, such high-dimensional input combined with a relatively small sample size leads to a risk of overfitting, low generalizability, and correlation bias.

Purpose: This study addresses correlation bias mitigation in the context of predicting genitourinary (GU) toxicity in prostate cancer patients who underwent MRI-guided stereotactic body radiation therapy (SBRT).

Methods: Typical dimension reduction or feature selection methods rely on sparsity-inducing regularization or information criteria. However, when (subsets of) input features are heavily correlated, the weights assigned to the correlated features can be diluted to the extent that the corresponding features are no longer effective in the prediction, leading to suboptimal feature discovery and prediction. We propose to perform advanced hierarchical clustering and then apply regression modeling to the cluster centroids. This approach addresses the challenges posed by high dimensionality and ill-conditioning, and improves the accuracy and reliability of the resulting prediction models. The performance of the proposed method was evaluated on typical regression models with intrinsic feature reduction, namely Least Absolute Shrinkage and Selection Operator (LASSO) regularized logistic regression (LR), support vector machine (SVM), and decision trees (DT).

Results: Extensive experiments show that introducing cluster-based feature compaction and representation improves all regression models under fair hyperparameter tuning conditions. Although LASSO-LR and LR with clustered features performed similarly during training and validation, with LASSO-LR being slightly better, the cluster-based feature method achieved significantly better performance on the test set, reaching 0.91 AUC and 0.86 accuracy, which demonstrates its advantage in stability and robustness.
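The Methods paragraph describes compacting correlated features by hierarchical clustering and then fitting a model on the cluster centroids. Below is a minimal, dependency-free Python sketch of that pipeline under stated assumptions: the abstract does not specify the linkage, distance, or centroid definition, so average linkage on the distance 1 - |r|, sign-aligned means of z-scored members, and a plain L2-penalized logistic regression (standing in for the LASSO-LR, SVM, and DT models used in the study) are illustrative choices. All function names and the synthetic data are hypothetical.

```python
# Illustrative sketch only: linkage, distance, centroid, and model choices
# are assumptions, not the study's exact method.
import math
import random

def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def zscore(col):
    n = len(col)
    m = sum(col) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in col) / n)
    return [(x - m) / sd for x in col]

def cluster_features(columns, n_clusters):
    """Average-linkage agglomerative clustering of features, distance = 1 - |r|."""
    p = len(columns)
    dist = [[1 - abs(pearson(columns[i], columns[j])) for j in range(p)] for i in range(p)]
    clusters = [[i] for i in range(p)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return clusters

def centroids(columns, clusters):
    """One compact feature per cluster: mean of sign-aligned z-scored members."""
    z = [zscore(c) for c in columns]
    n = len(z[0])
    out = []
    for members in clusters:
        ref = z[members[0]]
        acc = [0.0] * n
        for m in members:
            # Flip anti-correlated members so they do not cancel in the mean.
            sign = 1.0 if pearson(ref, z[m]) >= 0 else -1.0
            for k in range(n):
                acc[k] += sign * z[m][k] / len(members)
        out.append(acc)
    return out

def fit_logistic(features, y, lr=0.5, epochs=300, l2=0.01):
    """L2-penalized logistic regression by full-batch gradient descent."""
    p, n = len(features), len(y)
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * p, 0.0
        for k in range(n):
            s = b + sum(w[j] * features[j][k] for j in range(p))
            prob = 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, s))))  # clamp avoids overflow
            err = prob - y[k]
            for j in range(p):
                gw[j] += err * features[j][k]
            gb += err
        w = [w[j] - lr * (gw[j] / n + l2 * w[j]) for j in range(p)]
        b -= lr * gb / n
    return w, b

# Synthetic demo: 3 latent factors, 3 noisy copies each (feature 2 is anti-correlated).
random.seed(0)
n = 200
latent = [[random.gauss(0, 1) for _ in range(n)] for _ in range(3)]
columns = []
for j in range(9):
    sign = -1.0 if j == 2 else 1.0
    columns.append([sign * latent[j // 3][k] + random.gauss(0, 0.3) for k in range(n)])
y = [1 if latent[0][k] + random.gauss(0, 0.3) > 0 else 0 for k in range(n)]

clusters = cluster_features(columns, n_clusters=3)
X_compact = centroids(columns, clusters)
weights, bias = fit_logistic(X_compact, y)
print(clusters, [round(w, 2) for w in weights])
```

Because the distance uses |r|, anti-correlated features can share a cluster; sign-aligning members before averaging keeps them from cancelling out in the centroid, and the downstream model then sees one well-conditioned input per correlated group.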
The overall best test performance was achieved by combining feature clustering into five representatives with SVM. An additional correlation study identified the individual features most closely representing the cluster centroids as the rectum exposure volume at 2 Gy, trigone exposure at 2 Gy and 41 Gy, urethra exposure at 42 Gy, and rectal wall exposure at 42 Gy. This indicates the importance of hot spot control for the urethra, trigone, and rectal wall in limiting toxicity.

Conclusions: These findings underscore the superiority of the clustering method in mitigating correlation bias and enhancing predictive model accuracy. The current model also achieves state-of-the-art (SOTA) performance in predicting GU toxicity in MRI-guided prostate SBRT. Correlating dose features with the feature cluster centroids reveals the importance of hot spot control on the urethra, trigone, and rectal wall to reduce toxicity risk.
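The correlation study that maps each cluster centroid back to a single dose feature can be sketched as picking, within a cluster, the member with the largest absolute Pearson correlation to the centroid. This selection rule, the helper names, and the feature names below are hypothetical; the data are synthetic (three noisy measurements of one latent quantity), so the least noisy member is expected to be chosen.

```python
# Hypothetical sketch: choose the cluster member that best represents its centroid.
import math
import random

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def zscore(col):
    n = len(col)
    m = sum(col) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in col) / n)
    return [(x - m) / sd for x in col]

def cluster_centroid(columns):
    """Mean of z-scored member columns, each sign-aligned to the first member."""
    z = [zscore(c) for c in columns]
    n = len(z[0])
    out = [0.0] * n
    for col in z:
        sign = 1.0 if pearson(z[0], col) >= 0 else -1.0
        for k in range(n):
            out[k] += sign * col[k] / len(z)
    return out

def representative_feature(names, columns):
    """Return (name, |r|) of the member most correlated with the cluster centroid."""
    c = cluster_centroid(columns)
    best_r, best_name = max((abs(pearson(col, c)), nm) for nm, col in zip(names, columns))
    return best_name, best_r

# Synthetic cluster: one latent quantity measured with increasing noise.
random.seed(1)
n = 1000
latent = [random.gauss(0, 1) for _ in range(n)]
noise_sd = {"feature_low_noise": 0.1, "feature_mid_noise": 0.7, "feature_high_noise": 1.2}
names = list(noise_sd)
columns = [[v + random.gauss(0, noise_sd[nm]) for v in latent] for nm in names]

name, r = representative_feature(names, columns)
print(name, round(r, 3))
```

Reporting a named representative per cluster keeps the compact model interpretable: each centroid can be read back as a concrete dose metric rather than an abstract average.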