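Autoencoder
Feature (linguistics)
Simplicity (philosophy)
Computer science
Evolutionary algorithm
Regression
Machine learning
Artificial intelligence
Mathematics
Deep learning
Statistics
Philosophy
Linguistics
Epistemology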
Authors
Hsu, Chloe; Nisonoff, Hunter; Fannjiang, Clara; Listgarten, Jennifer
Identifier
DOI:10.1038/s41587-021-01146-5
Abstract
Machine learning-based models of protein fitness typically learn from either unlabeled, evolutionarily related sequences or variant sequences with experimentally measured labels. For regimes where only limited experimental data are available, recent work has suggested methods for combining both sources of information. Toward that goal, we propose a simple combination approach that is competitive with, and on average outperforms, more sophisticated methods. Our approach uses ridge regression on site-specific amino acid features combined with one density feature from modelling the evolutionary data. Within this approach, we find that a variational autoencoder-based density model showed the best overall performance, although any evolutionary density model can be used. Moreover, our analysis highlights the importance of systematic evaluations and sufficient baselines.
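
The combination described in the abstract can be illustrated with a minimal sketch: one-hot site-specific amino acid features are concatenated with a single density-score column, and a ridge regression is fitted on the result. This is not the authors' implementation. The function names (`one_hot_encode`, `build_features`, `fit_ridge`, `predict`), the toy sequences, the placeholder density scores, the toy fitness labels, and the single penalty `alpha` applied uniformly to all features are assumptions made for illustration. In practice the density scores would come from a model trained on the evolutionary data, such as the variational autoencoder mentioned in the abstract.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(seq):
    """Site-specific features: one indicator per (position, amino acid) pair."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        x[pos, AA_INDEX[aa]] = 1.0
    return x.ravel()


def build_features(seqs, density_scores):
    """Concatenate one-hot features with a single density-score column."""
    onehot = np.stack([one_hot_encode(s) for s in seqs])
    density = np.asarray(density_scores, dtype=float).reshape(-1, 1)
    return np.hstack([onehot, density])


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept.

    Assumption of this sketch: the same penalty `alpha` is applied to
    every feature, including the density column.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    A = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(A, Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def predict(X, w, b):
    """Predicted fitness for each row of the feature matrix."""
    return X @ w + b


# Hypothetical toy data: equal-length variants, placeholder density scores
# (standing in for log-likelihoods from an evolutionary model), and
# made-up assay labels.
seqs = ["ACDE", "ACDF", "GCDE", "ACKE", "GCKF"]
density_scores = [-1.0, -2.5, -1.8, -3.0, -4.2]
fitness = np.array([1.0, 0.4, 0.7, 0.2, -0.5])

X = build_features(seqs, density_scores)
w, b = fit_ridge(X, fitness, alpha=1.0)
preds = predict(X, w, b)
```

Because the density score enters as one extra column, any evolutionary density model can be swapped in without changing the regression step, which is the property the abstract emphasizes.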