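Computational biology
Training set
Protein secondary structure
Set (abstract data type)
Protein structure
Biological system
Biology
Computer science
Artificial intelligence
Biochemistry
Programming language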
Authors
Joseph W. Schafer, Lauren L. Porter
Abstract
AlphaFold2 (AF2), a deep-learning-based model that predicts protein structures from their amino acid sequences, has recently been used to predict multiple protein conformations. In some cases, AF2 has successfully predicted both the dominant and alternative conformations of fold-switching proteins, which remodel their secondary and/or tertiary structures in response to cellular stimuli. It is unclear whether AF2 has learned enough protein-folding principles to reliably predict alternative conformations outside of its training set. Previous work suggests that AF2 predicted these alternative conformations by memorizing them during training. Here, we use CFold, an implementation of the AF2 network trained on a more limited subset of experimentally determined protein structures, to directly test how well the AF2 architecture predicts alternative conformations of fold switchers outside of its training set. We tested CFold on eight fold switchers from six protein families. These proteins, whose secondary structures switch between α-helix and β-sheet and/or whose hydrogen-bonding networks are dramatically reconfigured, had not been tested previously, and only one of their conformations was in CFold's training set. Successful CFold predictions would indicate that the AF2 architecture can predict disparate alternative conformations of fold-switching proteins outside of its training set, whereas unsuccessful predictions would suggest that AF2's predictions of these alternative conformations likely arise from association with structures learned during training. Despite sampling 1,300 to 4,300 structures per protein with various sequence-sampling techniques, CFold predicted only one alternative structure outside of its training set accurately and with high confidence, while also generating experimentally inconsistent structures with higher confidence.
Although these results indicate that AF2's current success in predicting alternative conformations of fold switchers stems largely from its training data, results from a sequence-pruning technique suggest developments that could lead to a more reliable generative model in the future.
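The abstract's central tally can be sketched as follows. It counts how many sampled structures match the held-out alternative conformation accurately and with high confidence, and how many confident structures match neither experimental conformation. This is a minimal illustration under stated assumptions, not CFold's evaluation code. The `Prediction` fields, the two cutoffs, and the use of a TM-score-like similarity in [0, 1] and a pLDDT-like confidence in [0, 100] are all hypothetical, because the abstract does not specify its metrics or thresholds.

```python
from dataclasses import dataclass

# Hypothetical cutoffs: the abstract does not state the values actually used.
SIMILARITY_CUTOFF = 0.7   # assumed TM-score-like similarity threshold, range [0, 1]
CONFIDENCE_CUTOFF = 70.0  # assumed pLDDT-like confidence threshold, range [0, 100]


@dataclass
class Prediction:
    """One sampled structure, scored against both experimental conformations (hypothetical fields)."""
    sim_dominant: float     # similarity to the conformation present in the training set
    sim_alternative: float  # similarity to the held-out alternative conformation
    confidence: float       # model confidence for the whole predicted structure


def classify(p: Prediction) -> str:
    """Label a structure by the experimental conformation it matches best, if any."""
    if p.sim_alternative >= SIMILARITY_CUTOFF and p.sim_alternative > p.sim_dominant:
        return "alternative"
    if p.sim_dominant >= SIMILARITY_CUTOFF:
        return "dominant"
    # Matches neither experimental conformation: "experimentally inconsistent".
    return "neither"


def summarize(preds: list[Prediction]) -> dict[str, int]:
    """Count only the high-confidence predictions, grouped by label."""
    counts = {"alternative": 0, "dominant": 0, "neither": 0}
    for p in preds:
        if p.confidence >= CONFIDENCE_CUTOFF:
            counts[classify(p)] += 1
    return counts


if __name__ == "__main__":
    samples = [
        Prediction(0.9, 0.4, 85.0),   # confident, matches the dominant fold
        Prediction(0.3, 0.8, 75.0),   # confident, matches the alternative fold
        Prediction(0.2, 0.3, 90.0),   # confident, but matches neither
        Prediction(0.3, 0.85, 50.0),  # low confidence, so it is not counted
    ]
    print(summarize(samples))
```

With these toy inputs the script prints `{'alternative': 1, 'dominant': 1, 'neither': 1}`. Filtering on confidence before classifying mirrors the abstract's point: a confident "neither" structure counts against the model just as a confident "alternative" structure counts for it.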