AcidProNet: Acidophilic Protein Classification via DCGAN-GP-Based Data Augmentation and Parameter-Shared Mixture-of-Experts Transformer

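Computer science, Scalability, Machine learning, Identification (biology), Artificial intelligence, Embedding, Key (lock), Transformer, Theory (learning stability), Data mining, Generative grammar, Exploitation, Protein stability, Biological data, Protein engineering, Training set, Drug discovery, Source code, Data type, Protein-protein interaction, Protein structure, Structural bioinformatics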

Authors

Jiaxing Song, Aoyun Geng, Qianmao Wen, Junlin Xu, Yajie Meng, Feifei Cui, Quan Zou, Leyi Wei, Zilong Zhang

Source

Journal: Journal of Chemical Information and Modeling [American Chemical Society]
Date: 2025-12-08 Volume (Issue): 65 (24): 13145-13162

Identifier

DOI: 10.1021/acs.jcim.5c02196

Abstract

With the continued exploration of biological resources in extreme environments, functional proteins such as acidophilic proteins have attracted increasing attention. These proteins can maintain structural stability and biological functionality under highly acidic conditions (pH < 3), demonstrating significant application potential. However, the current identification of acidophilic proteins still relies on labor-intensive, time-consuming, and costly wet-lab experiments. Existing shallow machine learning methods (e.g., SVM, RF) are limited by their constrained model capacity (fewer parameters and shallow architectures), which restricts their ability to capture complex sequence-function relationships in acidophilic proteins. To address this issue, we propose an integrated computational framework, AcidProNet, which incorporates three key components: a CNN-based generative adversarial module, DCGAN-GP, for data augmentation; a sparsely and discretely activated, parameter-sharing Mixture-of-Experts Transformer, PS-MoE, designed to effectively optimize encoded features; and the utilization of a pretrained protein language model, ESM C, for extracting biologically meaningful protein embeddings. On an independent test set, our method outperforms existing models. Further expert-grouped and ablation experiments confirm its advantages in model stability and representational power. This study is the first to integrate embedding generation, data augmentation, and expert modeling within a unified framework, providing an efficient and scalable approach for functional protein prediction and laying a methodological foundation for advances in protein engineering applications. Additionally, we have developed a website for direct identification of acidophilic proteins, which can be accessed at http://www.bioai-lab.com/AcidNetPro.
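
To make the phrase "sparsely and discretely activated, parameter-sharing Mixture-of-Experts" more concrete, the following is a minimal sketch of top-1 expert routing in plain Python. It is not the authors' PS-MoE implementation. The abstract does not say how parameters are shared, so the single input projection reused by all experts (`w_shared`) is purely an illustrative assumption, as are all function names, dimensions, and random weights.

```python
import math
import random


def matvec(vec, mat):
    """Multiply a row vector (length d_in) by a matrix given as d_in rows of d_out values."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top1_moe_token(x, w_gate, w_shared, w_experts):
    """Route one token embedding to a single expert (sparse, discrete activation).

    ASSUMPTION (not from the paper): every expert reuses the same input
    projection `w_shared` and owns only its output projection.
    """
    probs = softmax(matvec(x, w_gate))                       # gate distribution over experts
    chosen = max(range(len(probs)), key=probs.__getitem__)   # discrete top-1 choice
    hidden = [max(h, 0.0) for h in matvec(x, w_shared)]      # shared projection + ReLU
    expert_out = matvec(hidden, w_experts[chosen])           # only the chosen expert runs
    return [probs[chosen] * y for y in expert_out], chosen   # scale by the gate weight


def rand_matrix(rng, rows, cols):
    """Hypothetical random weights, standing in for trained parameters."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_model, d_hidden, n_experts = 8, 16, 4          # illustrative sizes only
w_gate = rand_matrix(rng, d_model, n_experts)
w_shared = rand_matrix(rng, d_model, d_hidden)
w_experts = [rand_matrix(rng, d_hidden, d_model) for _ in range(n_experts)]

# Stand-ins for per-residue protein embeddings (e.g. from a language model).
tokens = rand_matrix(rng, 6, d_model)
results = [top1_moe_token(t, w_gate, w_shared, w_experts) for t in tokens]
for out, chosen in results:
    print("expert", chosen, "output head", [round(v, 3) for v in out[:3]])
```

Because only one expert's output projection is evaluated per token, the per-token compute stays roughly constant as experts are added, which is the usual motivation for sparse routing. A trained model would also need a load-balancing term and a differentiable treatment of the gate, both omitted here.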
