Computer science
Artificial intelligence
Amyloid (mycology)
Machine learning
Chemistry
Inorganic chemistry
Authors
Lan Huang,Qian Jiang,Youling L. Xiong,Guangzhao Zhang,Dan Shao
Identifier
DOI:10.1109/jbhi.2025.3567518
Abstract
Identifying aggregation-prone proteins and peptides is essential for understanding amyloid aggregation and its related pathogenic mechanisms. Recognizing potential amyloid hexapeptides can also support peptide-based drug design and reduce experimental costs. In this study, we propose TSLAmy, a computational model that predicts amyloid hexapeptides using a two-stage learning framework: the first stage extracts features from the hexapeptides, and the second stage predicts their aggregation potential. Before training, to ensure balanced dataset partitioning, we applied a clustering-based method in which two autoencoders were trained on all possible hexapeptides, one using their sequence features and the other using their physicochemical features. The resulting clusters were used to stratify the data into training and testing datasets. In the first stage, a feature extraction module obtained the physicochemical features, while an ESM-2 module extracted the sequence features of each hexapeptide. In the second stage, an aggregation prediction module predicted the aggregation potential of each hexapeptide. In our experiments, TSLAmy reached an accuracy of 0.8493 (0.8447-0.8539), outperforming other state-of-the-art methods. Furthermore, we predicted the aggregation potential of all 64,000,000 (20^6) possible hexapeptides and analyzed the amino acids that form aggregation-prone hexapeptides. We anticipate that TSLAmy can offer new insights into the identification of aggregation-prone peptides and contribute to peptide drug development.
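The enumeration and cluster-stratified partitioning described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 20% test fraction, and in particular `toy_cluster` (a hypothetical stand-in for the autoencoder-derived cluster assignments) are assumptions made only for illustration.

```python
import itertools
import random
from collections import defaultdict

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def count_peptides(length=6):
    """Number of possible peptides of the given length (20**6 for hexapeptides)."""
    return len(AMINO_ACIDS) ** length


def iter_peptides(length=6):
    """Lazily yield every peptide of the given length as a string."""
    for residues in itertools.product(AMINO_ACIDS, repeat=length):
        yield "".join(residues)


def toy_cluster(peptide):
    """Hypothetical cluster label: the count of hydrophobic residues.

    Assumption for illustration only; the paper derives clusters from
    two autoencoders, which are not reproduced here.
    """
    return sum(residue in "AVILMFWY" for residue in peptide)


def stratified_split(peptides, cluster_of, test_fraction=0.2, seed=0):
    """Split peptides so each cluster contributes the same share to the test set."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for peptide in peptides:
        by_cluster[cluster_of(peptide)].append(peptide)

    train, test = [], []
    for cluster_id in sorted(by_cluster):
        members = by_cluster[cluster_id]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


if __name__ == "__main__":
    print(count_peptides())  # size of the full hexapeptide space
    # Use dipeptides here so the demonstration stays small and fast.
    train, test = stratified_split(iter_peptides(2), toy_cluster)
    print(len(train), len(test))
```

Sampling within each cluster, rather than across the whole dataset, keeps the training and testing sets similar in composition, which is the stated purpose of the clustering step.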