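Identification (biology)
Computer science
Gene
Computational biology
Antibiotic resistance
Natural language processing
Artificial intelligence
Antibiotics
Genetics
Biology
Botany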
Authors
Shafayat Ahmed,Muhit Islam Emon,Nazifa Ahmed Moumi,Liqing Zhang
Identifier
DOI:10.1109/bibm55620.2022.9995492
Abstract
Antibiotic resistance is a silent pandemic that causes 700,000 human deaths worldwide every year. Antibiotic resistance genes (ARGs) confer antibiotic resistance on the bacteria that carry them. Predicting ARGs is an important computational task. Traditionally, ARGs are predicted using alignment-based methods; however, the false-negative rate of most alignment-based tools is very high. Protein language models (LMs) trained on huge corpora of protein sequences capture distant relationships among protein sequences, and these features can be used to identify and classify ARGs. We present a self-supervised model on the largest available ARG database, built with the help of the pre-trained language model ProtAlbert. We used the raw protein LM embeddings from unlabeled data in our ARG classification task and found that they outperform state-of-the-art prediction algorithms. The features extracted from the pre-trained language model improved the supervised model's accuracy by a large margin.
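The general pattern the abstract describes, turning each protein sequence into a fixed-length feature vector and then training a supervised classifier on those vectors, can be illustrated with a minimal sketch. This is not the authors' implementation. The `embed` function below is a stand-in that uses amino-acid frequencies in place of ProtAlbert embeddings, the nearest-centroid classifier is an assumed substitute for whatever supervised model the paper uses, and the sequences and labels (`class_a`, `class_b`) are invented toy values.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def embed(sequence):
    """Stand-in embedding (assumption): amino-acid frequency vector.

    A real pipeline would replace this with pooled embeddings from a
    pre-trained protein language model such as ProtAlbert.
    """
    counts = Counter(sequence)
    total = max(len(sequence), 1)
    return [counts.get(aa, 0) / total for aa in AMINO_ACIDS]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fit(labeled_sequences):
    """Build one centroid per label from (sequence, label) pairs."""
    by_label = {}
    for sequence, label in labeled_sequences:
        by_label.setdefault(label, []).append(embed(sequence))
    return {label: centroid(vecs) for label, vecs in by_label.items()}


def predict(model, sequence):
    """Return the label whose centroid is closest in Euclidean distance."""
    vector = embed(sequence)
    return min(model, key=lambda label: math.dist(vector, model[label]))


# Toy data, invented for illustration only (not real ARG sequences).
toy_training_set = [
    ("AAAAGGGG", "class_a"),
    ("AAAGGGGA", "class_a"),
    ("KKKKLLLL", "class_b"),
    ("LLLKKKKL", "class_b"),
]
toy_model = fit(toy_training_set)
```

Calling `predict(toy_model, "AAGGAAGG")` classifies an unseen toy sequence by its nearest class centroid. Swapping `embed` for real language-model embeddings would leave the rest of the sketch unchanged, which is the point of separating feature extraction from the classifier.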