Pairing
Antibody
Computer science
Chemistry
Medicine
Physics
Immunology
Superconductivity
Quantum mechanics
Authors
Sarah M. Burbach, Bryan Briney
Source
Journal: Patterns
[Elsevier]
Date: 2024-04-01
Volume/issue: 100967-100967
Identifier
DOI:10.1016/j.patter.2024.100967
Abstract
Existing antibody language models are limited by their use of unpaired antibody sequence data. A recently published dataset of ∼1.6 × 10⁶ natively paired human antibody sequences offers a unique opportunity to evaluate how antibody language models are improved by training with native pairs. We trained three baseline antibody language models (BALM), using natively paired (BALM-paired), randomly paired (BALM-shuffled), or unpaired (BALM-unpaired) sequences from this dataset. To address the paucity of paired sequences, we additionally fine-tuned ESM (evolutionary scale modeling)-2 with natively paired antibody sequences (ft-ESM). We provide evidence that training with native pairs allows the model to learn immunologically relevant features that span the light and heavy chains, which cannot be simulated by training with random pairs. We additionally show that training with native pairs improves model performance on a variety of metrics, including the ability of the model to classify antibodies by pathogen specificity.
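The three training regimes compared in the abstract differ only in how heavy- and light-chain sequences are assembled into training examples. A minimal sketch of that dataset construction is below; the function name, the `</s>` separator token, and the mode labels are illustrative assumptions for this sketch, not code from the paper.

```python
import random

def make_training_examples(pairs, mode, sep="</s>", seed=0):
    """Build training sequences from (heavy, light) antibody chain pairs.

    mode="paired":   keep native heavy/light pairings (as in BALM-paired)
    mode="shuffled": randomly re-pair heavy and light chains (BALM-shuffled)
    mode="unpaired": treat every chain as an independent sequence (BALM-unpaired)

    The `sep` token marking the chain boundary is an assumption of this sketch.
    """
    heavies = [h for h, _ in pairs]
    lights = [l for _, l in pairs]
    if mode == "paired":
        # Native pairing: each heavy chain stays with its own light chain.
        return [h + sep + l for h, l in pairs]
    if mode == "shuffled":
        # Random pairing: same chains, but light chains are permuted,
        # destroying any cross-chain signal present in native pairs.
        rng = random.Random(seed)
        shuffled = lights[:]
        rng.shuffle(shuffled)
        return [h + sep + l for h, l in zip(heavies, shuffled)]
    if mode == "unpaired":
        # Unpaired: each chain is its own training example.
        return heavies + lights
    raise ValueError(f"unknown mode: {mode}")
```

This makes the comparison in the abstract concrete: BALM-shuffled sees exactly the same chains as BALM-paired, so any performance gap between the two can be attributed to the native heavy/light pairing rather than to the sequences themselves.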