Authors
Saeed Ahmed, Nalini Schaduangrat, Ittipat Meewan, Watshara Shoombuatong
Abstract
Epigenetics encompasses dynamic and reversible modifications that regulate gene activity without altering the underlying DNA sequence. Epigenetic processes, including non-coding RNA interactions and DNA methylation, regulate gene expression patterns in response to cellular signaling, environmental stimuli, and developmental cues. The balance of histone acetylation is maintained by the opposing activities of histone acetyltransferases (HATs) and histone deacetylases (HDACs). Aberrant HDAC upregulation, often observed in cancer cells, disrupts this balance, which is why HDAC inhibitors (HDACis) are used in cancer treatment. However, most synthetic HDACis are not selective for particular HDAC classes or individual isoforms, highlighting the need for highly selective HDAC inhibitors. Machine learning (ML)-driven methods are now recognized as rapid and cost-efficient tools in drug discovery and development, and they can identify inhibitors from SMILES notation alone, without requiring the 3D ligand structure. Here, we present DeepHDAC3i, a novel and interpretable deep learning-based framework for the accurate in silico identification of HDAC3 inhibitors (HDAC3i) using only SMILES notation. First, we employed five molecular encoding methods, namely CDKExt, KR, KRC, PubChem, and RDKit, to extract biological and structural information from HDAC3i, and fused these molecular representations into multi-view features. Second, elastic net was employed to determine the optimal feature subset and enhance prediction performance. Third, a one-dimensional convolutional neural network (1D-CNN) trained on the optimal feature set was chosen to construct the final model. Finally, our framework leveraged the SHapley Additive exPlanations (SHAP) algorithm to reveal the features most important for identifying HDAC3i.
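The feature-fusion and elastic-net selection steps can be illustrated with a minimal, standard-library Python sketch. It assumes each encoder's output is already available as a binary fingerprint vector per molecule (the real CDKExt, KR, KRC, PubChem, and RDKit encoders are not computed here), treats the 0/1 activity label as a regression target for a linear elastic net solved by cyclic coordinate descent, and keeps every feature whose coefficient is non-zero. The function names (`fuse_views`, `elastic_net`, `select_features`), the penalty settings (`alpha`, `l1_ratio`), and the toy data are hypothetical; the authors' actual solver, penalty values, and selection rule may differ.

```python
# Minimal sketch (assumptions): each "view" is a precomputed binary fingerprint
# per molecule; labels are 0/1 and are treated as regression targets for a
# linear elastic net. Function names and penalty values are hypothetical.


def fuse_views(views):
    """Concatenate per-molecule vectors from several named views.

    views maps a view name to a list of per-molecule feature lists; all views
    must list the molecules in the same order. Returns (rows, feature names).
    """
    names = []
    for view, rows in views.items():
        names += [f"{view}_{j}" for j in range(len(rows[0]))]
    n_mol = len(next(iter(views.values())))
    fused = [sum((views[v][i] for v in views), []) for i in range(n_mol)]
    return fused, names


def soft_threshold(rho, lam):
    """Shrink rho towards zero by lam (the proximal step of the L1 penalty)."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Cyclic coordinate descent for the objective
    (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1 - l1_ratio)*||w||^2,
    fitted on column-centred X and centred y so no intercept is needed."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residual y - Xw; equals centred y while w = 0
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            z = sum(Xc[i][j] ** 2 for i in range(n)) / n
            denom = z + alpha * (1 - l1_ratio)
            if denom == 0:
                continue  # constant column under a pure L1 penalty: weight stays 0
            # Correlation of feature j with the partial residual that excludes j.
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / denom
            delta = new_w - w[j]
            if delta:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta  # keep the residual in sync
                w[j] = new_w
    return w


def select_features(views, y, alpha=0.1, l1_ratio=0.5, tol=1e-8):
    """Fuse the views, fit the elastic net and keep features with non-zero weight."""
    X, names = fuse_views(views)
    w = elastic_net(X, y, alpha, l1_ratio)
    return [name for name, wj in zip(names, w) if abs(wj) > tol]


# Toy example: four molecules, three hypothetical fingerprint views.
views = {
    "KR": [[1, 0], [1, 0], [0, 0], [0, 0]],
    "PubChem": [[1], [1], [1], [1]],
    "RDKit": [[1], [0], [1], [0]],
}
labels = [1, 1, 0, 0]
print(select_features(views, labels))  # ['KR_0']
```

In the toy data, `KR_0` tracks the label exactly, the constant columns carry no information, and `RDKit_0` is uncorrelated with the label, so only `KR_0` survives the L1 penalty. The L2 term (weighted by `1 - l1_ratio`) is what distinguishes elastic net from the lasso: it tends to spread weight across correlated fingerprint bits instead of arbitrarily keeping just one of them.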
On the independent test dataset, DeepHDAC3i achieved an accuracy of 0.965, a Matthews correlation coefficient (MCC) of 0.930, and an area under the ROC curve (AUC) of 0.985, which were significantly higher than those of several conventional machine learning and deep learning models. Compared with existing methods, DeepHDAC3i achieved the best performance, with improvements of approximately 4.80%, 4.70%, 6.50%, and 9.50% in accuracy, F1, AUC, and MCC, respectively. Taken together, these results indicate that DeepHDAC3i outperforms the compared models and can be a useful tool for precisely identifying HDAC3i.
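The reported metrics follow from a binary confusion matrix and a ranking score. The sketch below is a plain-Python illustration of the standard definitions of accuracy, F1, MCC, and AUC (the latter computed as the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half). The toy labels, scores, and the 0.5 decision threshold are hypothetical and unrelated to the DeepHDAC3i test set.

```python
import math


def confusion(y_true, y_pred):
    """Return (TP, TN, FP, FN) for 0/1 labels and 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def f1(y_true, y_pred):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; 0.0 when any marginal count is zero."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy predictions, thresholded at 0.5.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(round(accuracy(y_true, y_pred), 3), round(f1(y_true, y_pred), 3),
      round(mcc(y_true, y_pred), 3), round(auc(y_true, scores), 3))
# prints: 0.667 0.667 0.333 0.889
```

MCC uses all four cells of the confusion matrix, which is why it reacts more sharply than accuracy to the same errors (0.333 versus 0.667 here); AUC, by contrast, is threshold-free and depends only on how the scores rank positives against negatives.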