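Computer science
Transcription factor
Convolutional neural network
Encoding
DNA binding site
Deep learning
Transcription (linguistics)
k-mer
Artificial intelligence
Computational biology
DNA sequencing
Biology
Genetics
Promoter
Gene
Gene expression
Linguistics
Philosophy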
Authors
Wei Wang, Xiaolin Jiao, Sun Bin, Shihao Liang, Xianfang Wang, Yun Zhou
Identifier
DOI:10.1109/bibm55620.2022.9994984
Abstract
Transcription factors are a class of proteins that bind directly or indirectly to RNA polymerases and regulate the initiation of transcription by recognizing cis-acting elements in the DNA sequence. Predicting transcription factor binding sites (TFBS) is an important part of the study of gene transcriptional regulation, and accurate TFBS prediction helps researchers understand the spatiotemporal nature of the transcriptional regulation of target genes by different transcription factors. In recent years, an increasing number of deep learning methods have been used to predict TFBS; however, existing methods still leave considerable room for improvement in performance. In this paper, we present DeepGenBind, a deep learning framework that combines convolutional neural networks and recurrent neural networks for the systematic identification of transcription factor binding sites from DNA sequences. The novelty of our proposed approach rests on two key aspects: (1) the framework combines a three-layer parallel convolutional neural network (CNN) with a two-layer long short-term memory (LSTM) network to efficiently extract useful features from large-scale genomic sequences obtained by high-throughput sequencing techniques; and (2) it uses k-mer coding to transform DNA sequences, with the resulting short sequences allowing the data to be read more effectively. Experimental results on 165 datasets from ENCODE show that DeepGenBind outperforms several other state-of-the-art methods in identifying transcription factor binding sites. In addition, we tested the effect of varying the k-mer vector length and show how model performance changes under different k-mer-related parameter settings. Overall, DeepGenBind is a useful tool for the cost-effective and accurate identification of potential transcription factor binding sites in biological genomes.
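To make the k-mer coding step mentioned in the abstract concrete, the following is a minimal sketch of one common way to turn a DNA sequence into overlapping k-mers and then into integer indices. The abstract does not specify the value of k, the stride, or how k-mers are mapped to vectors, so `k=3`, `stride=1`, the index-0 convention for unknown k-mers, and the function names `kmer_tokens`, `build_vocab`, and `encode` are all illustrative assumptions, not DeepGenBind's actual implementation.

```python
from itertools import product


def kmer_tokens(seq, k=3, stride=1):
    """Split a DNA sequence into overlapping k-mers (assumed k and stride)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(k=3, alphabet="ACGT"):
    """Map every possible k-mer over the alphabet to an index starting at 1.

    Index 0 is reserved (an assumption) for k-mers with other symbols, e.g. N.
    """
    return {"".join(p): i + 1 for i, p in enumerate(product(alphabet, repeat=k))}


def encode(seq, k=3, stride=1):
    """Encode a sequence as a list of k-mer indices; unknown k-mers become 0."""
    vocab = build_vocab(k)
    return [vocab.get(tok, 0) for tok in kmer_tokens(seq, k, stride)]


if __name__ == "__main__":
    print(kmer_tokens("ACGTAC"))  # overlapping 3-mers of the sequence
    print(encode("ACGTAC"))       # the same 3-mers as integer indices
```

In a pipeline of the kind the abstract describes, index sequences like these would typically be passed to an embedding layer before the convolutional and recurrent layers; whether DeepGenBind does so is not stated here.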