Abstract
Introduction: Telomerase activity is upregulated in 85-90% of cancers, making it a critical therapeutic target. However, in the three decades since its discovery, the development of small-molecule telomerase inhibitors has been hindered by the enzyme's structural complexity and by limited resources. To bridge this gap, we present TeloPred, a pioneering structure-activity machine learning classification model that predicts whether small molecules are telomerase inhibitors, advancing anti-cancer drug discovery.
Methodology: TeloPred was developed using a curated dataset of telomerase inhibitors with IC50 values from ChEMBL. After preprocessing, key molecular properties (e.g., molecular weight, TPSA) were calculated using RDKit, and compounds were classified as active or inactive based on a pIC50 cutoff of 5.2. For each of the 12 molecular fingerprints generated with PaDEL software, informative features were selected using variance thresholding and Recursive Feature Elimination (RFE). The dataset was split (80% training, 20% testing), and six machine learning algorithms (including Random Forest, SVC, XGBoost, and AdaBoost) were trained, fine-tuned, and evaluated using metrics such as accuracy, F1 score, and AUC-ROC. The best model underwent external validation with a decoy set, SHAP analysis for interpretability, and screening of a natural compound library. TeloPred will soon be available on a public webserver for global use.
Results: Data preprocessing reduced 388 compounds to 281. Exploratory analysis showed distinct clustering of active and inactive compounds based on properties such as molecular weight, aromaticity, TPSA, and hydrogen-bond donors and acceptors. Among the tested models, the support vector classifier performed best, achieving 87.2% accuracy on the test set and 89% on the training set, with few false positives and false negatives. External validation yielded an enrichment factor of 21, indicating strong predictive power.
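Two quantities reported above, the pIC50-based activity label and the enrichment factor, can be illustrated with a minimal sketch. It assumes that the cutoff is a dimensionless pIC50 of 5.2 (equivalent to an IC50 of roughly 6.3 µM), that compounds at or above the cutoff count as active, and that the enrichment factor follows the common definition (hit rate in the selected subset divided by hit rate in the whole set). The abstract does not specify these details, and the function names and example numbers below are hypothetical rather than taken from the study.

```python
import math


def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in molar)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 M, so -log10(ic50_nm * 1e-9) = 9 - log10(ic50_nm)
    return 9.0 - math.log10(ic50_nm)


def label_activity(ic50_nm: float, cutoff: float = 5.2) -> str:
    """Label a compound by pIC50; treating '>= cutoff' as active is an assumption."""
    return "active" if pic50_from_ic50_nm(ic50_nm) >= cutoff else "inactive"


def enrichment_factor(hits_selected: int, n_selected: int,
                      actives_total: int, n_total: int) -> float:
    """Hit rate among selected compounds divided by hit rate in the full set."""
    if min(n_selected, actives_total, n_total) <= 0:
        raise ValueError("counts must be positive")
    return (hits_selected / n_selected) / (actives_total / n_total)


if __name__ == "__main__":
    # Hypothetical IC50 values in nM (1 µM and 10 µM), not data from the study.
    for ic50 in (1_000.0, 10_000.0):
        print(ic50, round(pic50_from_ic50_nm(ic50), 2), label_activity(ic50))

    # Hypothetical screen: 10 actives hidden among 1000 compounds,
    # with 5 of them recovered in the top 50 ranked predictions.
    print(enrichment_factor(hits_selected=5, n_selected=50,
                            actives_total=10, n_total=1000))
```

Under this definition, an enrichment factor of 21 would mean that actives appear in the model's selected subset about 21 times more often than they would under random selection.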
SHAP analysis revealed aromatic groups, amide linkages, and carbonyl groups as critical for telomerase inhibition. Screening a natural compound library narrowed the search space by 83% and identified 10 leads with high predicted probabilities and QED scores, which are under experimental validation.
Conclusion: TeloPred is the first ML classification model designed to effectively distinguish telomerase inhibitors, enabling efficient virtual screening of large libraries. It reduces attrition rates relative to traditional high-throughput screening (HTS) methods, offering a powerful tool to accelerate telomerase-targeted drug discovery for cancer treatment. Its availability as a publicly accessible resource further amplifies its impact on global research efforts to drive precision medicine.
Citation Format: Divpreet Kaur, Daman Saluja, Madhu Chopra. TeloPred: A machine learning classification webserver for prediction of small molecules as telomerase inhibitors for anti-cancer drug development [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2025; Part 1 (Regular Abstracts); 2025 Apr 25-30; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2025;85(8_Suppl_1):Abstract nr 3661.