Authors
Shizhen Qiu, Jian Chen, Tao Wu, Li Li, Gang Wang, Haitao Wu, Xianmin Song, Xuesong Liu, Haopeng Wang
Abstract
Our previous work showed that positively charged patches (PCPs) on the surface of the CAR antigen-binding domain facilitate CAR clustering, thereby triggering CAR tonic signaling. To quantify these PCPs, which are indicative of CAR tonic signaling, we previously developed a bioinformatic method to determine the PCP score [1]. This method first constructs three-dimensional (3D) homology models of the CAR's single-chain variable fragment (scFv) using the SWISS-MODEL homology modeler. The BindUP web server is then used to determine the total number of residues within the three largest patches containing continuous positively charged residues on the surface of the CAR scFv. However, this PCP score calculation method has several limitations: (1) it relies on two external servers; (2) each calculation takes a few days, which significantly hinders efficiency; (3) it lacks batch calculation capability; and (4) it provides no optimization strategy for fine-tuning PCP scores. Given these constraints, we aimed to develop an artificial intelligence (AI)-based PCP score calculator and optimizer to overcome these bottlenecks.
Our AI-based PCP score calculator integrates protein databases, structural biology, and advanced deep learning models (Fig. 1a). We established a comprehensive protein structure database of over 170,000 entries by extracting 3D structural information from the Protein Data Bank (PDB) and AlphaFold predictions, followed by stringent quality control. We then developed an in-house algorithm for calculating PCP scores from this 3D structural information (Supplementary Information), generating a dataset of approximately 170,000 protein sequences with their associated PCP scores. For model training and evaluation, 70% of the data were allocated to the training dataset and the remaining 30% to the test dataset.
The ESM2 model, developed by the Meta Fundamental AI Research (FAIR) Protein Team, was fine-tuned for PCP prediction [4,5]. ESM2 is a transformer-based language model that uses an attention mechanism to learn interaction patterns between pairs of amino acids in the input sequence. Pre-trained on over 60 million protein sequences from the UniProt Reference Clusters (UniRef) database, ESM2 adapts well to downstream protein structure-related tasks [5]. The ESM2-8M model was fine-tuned on the training dataset. After its parameters were updated, the ESM2 model became the PCP-AI prediction model, referred to as the CAR-Tonic Signal Tuner (abbreviated as CAR-Toner; http://cartfitness.slst.shanghaitech.edu.cn/CAR-fitness/). This model offers three key functions: PCP calculation for individual proteins, streamlined batch processing, and an integrated optimization strategy for refining PCP scores (Fig. 1b).
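As an illustration of the patch-counting idea behind the PCP score (the total number of residues in the three largest patches of continuous positively charged surface residues), the following is a minimal Python sketch. It is not the in-house algorithm, which is described only in the Supplementary Information, nor the BindUP procedure. The function name `pcp_score`, the 8 Å distance cutoff, the restriction to lysine and arginine, and the precomputed surface flag are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist

# Assumption: only lysine (K) and arginine (R) count as positively charged.
POSITIVE = {"K", "R"}


def pcp_score(residues, cutoff=8.0, top_n=3):
    """Toy PCP score: residues in the top_n largest positive surface patches.

    residues: list of (one_letter_code, (x, y, z), is_surface) tuples.
    cutoff:   hypothetical distance (in angstroms) below which two residues
              are treated as belonging to the same patch.
    """
    # Keep only positively charged residues flagged as surface-exposed.
    nodes = [i for i, (aa, _, surface) in enumerate(residues)
             if surface and aa in POSITIVE]

    # Link residues whose representative coordinates lie within the cutoff.
    adjacency = {i: set() for i in nodes}
    for a, b in combinations(nodes, 2):
        if dist(residues[a][1], residues[b][1]) <= cutoff:
            adjacency[a].add(b)
            adjacency[b].add(a)

    # Each connected component of this graph is one patch.
    seen, sizes = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            current = stack.pop()
            size += 1
            for neighbour in adjacency[current] - seen:
                seen.add(neighbour)
                stack.append(neighbour)
        sizes.append(size)

    # Sum the residue counts of the top_n largest patches.
    return sum(sorted(sizes, reverse=True)[:top_n])


# Made-up coordinates: patches of size 3, 2, 1 and 1, plus one buried lysine.
toy = [
    ("K", (0, 0, 0), True), ("R", (5, 0, 0), True), ("K", (10, 0, 0), True),
    ("A", (12, 0, 0), True),
    ("K", (30, 0, 0), True), ("R", (34, 0, 0), True),
    ("K", (60, 0, 0), True),
    ("R", (90, 0, 0), True),
    ("K", (91, 0, 0), False),
]
print(pcp_score(toy))  # 6
```

In the toy input, the three largest patches contain 3, 2, and 1 residues, so the score is 6. The fourth single-residue patch and the buried lysine do not contribute. A real calculation would need solvent-accessibility values and coordinates taken from a 3D structure rather than hand-set flags.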