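Computational biology
Sequence (biology)
Deep learning
Computer science
Artificial intelligence
DNA sequencing
Scalability
Ribonucleic acid
Software
Machine learning
Biology
DNA
Data mining
Bioinformatics
Genetics
Gene
Database
Programming language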
Authors
Babak Alipanahi,Andrew Delong,Matthew T. Weirauch,Brendan J. Frey
Abstract
The binding specificities of RNA- and DNA-binding proteins are determined from experimental data using a 'deep learning' approach. Knowing the sequence specificities of DNA- and RNA-binding proteins is essential for developing models of the regulatory processes in biological systems and for identifying causal disease variants. Here we show that sequence specificities can be ascertained from experimental data with 'deep learning' techniques, which offer a scalable, flexible and unified computational approach for pattern discovery. Using a diverse array of experimental data and evaluation metrics, we find that deep learning outperforms other state-of-the-art methods, even when training on in vitro data and testing on in vivo data. We call this approach DeepBind and have built a stand-alone software tool that is fully automatic and handles millions of sequences per experiment. Specificities determined by DeepBind are readily visualized as a weighted ensemble of position weight matrices or as a 'mutation map' that indicates how variations affect binding within a specific sequence.
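The 'mutation map' idea can be illustrated with a minimal sketch under stated assumptions. The trained DeepBind network is replaced here by a hypothetical stand-in scorer. That scorer slides one position-weight-style matrix along the sequence and keeps the best window score, which loosely mirrors a single motif detector followed by max pooling. The helper names `make_toy_pwm`, `scan_score` and `mutation_map` are hypothetical and are not part of the DeepBind tool. The same applies to the toy "GATA" motif and its +1/-1 weights. Each map entry is the raw change in score for one single-base substitution. Any scaling the authors apply to their mutation maps is not reproduced.

```python
from typing import Callable, Dict, List

ALPHABET = "ACGT"


def make_toy_pwm(motif: str) -> List[Dict[str, float]]:
    """Hypothetical PWM-like weights: +1 for the preferred base, -1 otherwise.

    Illustrative only; these are not weights learned by DeepBind.
    """
    return [{b: (1.0 if b == m else -1.0) for b in ALPHABET} for m in motif]


def scan_score(seq: str, pwm: List[Dict[str, float]]) -> float:
    """Slide the matrix along seq and keep the best window score (max pooling)."""
    k = len(pwm)
    if len(seq) < k:
        raise ValueError("sequence shorter than motif")
    return max(
        sum(pwm[j][seq[i + j]] for j in range(k))
        for i in range(len(seq) - k + 1)
    )


def mutation_map(seq: str, score_fn: Callable[[str], float]) -> List[Dict[str, float]]:
    """For each position, the score change caused by each single-base substitution.

    The reference base at each position is reported as 0.0 (no change).
    """
    wild_type = score_fn(seq)
    result = []
    for i, ref in enumerate(seq):
        row = {}
        for alt in ALPHABET:
            if alt == ref:
                row[alt] = 0.0
                continue
            mutant = seq[:i] + alt + seq[i + 1:]
            row[alt] = score_fn(mutant) - wild_type
        result.append(row)
    return result


if __name__ == "__main__":
    pwm = make_toy_pwm("GATA")
    seq = "CCGATACC"
    mmap = mutation_map(seq, lambda s: scan_score(s, pwm))
    for i, row in enumerate(mmap):
        print(i, seq[i], row)
```

In this toy setting, substitutions inside the embedded "GATA" site lower the score. Substitutions in the flanks leave it unchanged because the best window is still intact. That is the kind of per-position sensitivity a mutation map is meant to display.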