Interpretability
Computer science
Feature (linguistics)
Representation (politics)
Artificial intelligence
Language model
Pattern recognition (psychology)
Machine learning
Philosophy
Linguistics
Politics
Political science
Law
Authors
Yuhuan Liu, Haitian Zhong, Junyu Zhai, Xueying Wang, Tianchi Lu
Identifier
DOI:10.1101/2024.12.25.630296
Abstract
Protein phosphorylation, a key post-translational modification (PTM), provides essential insight into protein properties, making its prediction highly significant. Using the emerging capabilities of large language models (LLMs), we apply LoRA fine-tuning to ESM2, a powerful protein large language model, to efficiently extract features with minimal computational resources, optimizing task-specific text alignment. Additionally, we integrate the conformer architecture with the Feature Coupling Unit (FCU) to enhance local and global feature exchange, further improving prediction accuracy. Our model achieves state-of-the-art (SOTA) performance, obtaining AUC scores of 79.5%, 76.3%, and 71.4% at the S, T, and Y sites of the general datasets. Building on the feature extraction capabilities of LLMs, we conduct a series of analyses on protein representations, covering their structure, sequence, and various chemical properties (such as hydrophobicity (GRAVY), surface charge, and isoelectric point). We propose a test method called Linear Regression Tomography (LRT), a top-down method that uses representations to probe the model's feature extraction capabilities, offering a pathway to improved interpretability.
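The abstract does not include the authors' fine-tuning code; as a minimal sketch of the LoRA idea it relies on (a frozen pretrained weight W adapted by a trainable low-rank update, W_eff = W + (α/r)·B·A), with made-up dimensions and plain NumPy standing in for the actual ESM2 layers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration only (not ESM2's real sizes).
d_in, d_out, r, alpha = 64, 64, 8, 16
W = rng.standard_normal((d_out, d_in))    # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01 # trainable, rank-r factor
B = np.zeros((d_out, r))                  # trainable; zero init => no-op at start

def lora_forward(x, W, A, B, alpha, r):
    """Forward pass: frozen path plus scaled low-rank adapter path."""
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((4, d_in))
y_frozen = x @ W.T
y_lora = lora_forward(x, W, A, B, alpha, r)
assert np.allclose(y_frozen, y_lora)  # with B = 0 the adapter changes nothing

# Trainable parameters drop from d_in*d_out to r*(d_in + d_out).
full_params = d_in * d_out            # 4096 here
lora_params = r * (d_in + d_out)      # 1024 here
```

Only A and B would receive gradients during fine-tuning, which is why LoRA lets a large protein LM be adapted "with minimal computational resources" as the abstract claims.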
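The abstract defines LRT only as a top-down linear-regression probe of representations; the following is a hypothetical illustration of that probing idea on synthetic embeddings. The Kyte-Doolittle hydropathy scale and the GRAVY definition (mean hydropathy over a sequence) are real; the embeddings are fabricated so that the property is linearly encoded, which is the situation such a probe is meant to detect.

```python
import numpy as np

# Kyte-Doolittle hydropathy values per amino acid (real scale).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

def gravy(seq):
    """Grand average of hydropathy (GRAVY) of a protein sequence."""
    return sum(KD[a] for a in seq) / len(seq)

rng = np.random.default_rng(0)
aas = np.array(list(KD))

# Synthetic stand-ins for model representations: assume each sequence gets a
# d-dim embedding in which GRAVY lies along one latent direction, plus noise.
n, d = 200, 32
seqs = ["".join(rng.choice(aas, size=50)) for _ in range(n)]
y = np.array([gravy(s) for s in seqs])
direction = rng.standard_normal(d)
X = np.outer(y, direction) + 0.1 * rng.standard_normal((n, d))

# LRT-style probe: fit a linear map from representations to the property
# and measure how much of the property the representations explain (R^2).
X1 = np.hstack([X, np.ones((n, 1))])       # append a bias column
w, *_ = np.linalg.lstsq(X1, y, rcond=None)
pred = X1 @ w
r2 = 1 - ((y - pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()
# r2 near 1.0 here, since the property was built to be linearly decodable
```

A high R² indicates the property is linearly readable from the embeddings; applied layer by layer to real ESM2 representations, this is the kind of top-down evidence of feature extraction the abstract attributes to LRT.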