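Computer science
Convolutional neural network
Encoding
Benchmark (surveying)
Artificial intelligence
Pattern recognition (psychology)
Bottleneck
Encoder
Artificial neural network
Machine learning
Computational biology
Biology
Genetics
Gene
Operating system
Embedded system
Geography
Geodesy
Authors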
Bin Yu,Yaqun Zhang,Xue Wang,Hongli Gao,Jianqiang Sun,Xin Gao
Identifier
DOI:10.1016/j.bspc.2022.103566
Abstract
DNA N4-methylcytosine (4mC) and DNA N6-methyladenine (6mA) are significant epigenetic modifications. 4mC is closely related to the restriction-modification system, and 6mA is involved in a variety of cellular processes. To further explore their functional mechanisms and biological significance, and to overcome the narrow coverage that limits traditional experimental methods, an efficient and widely applicable prediction method is needed. In this work, we develop a prediction method named 4mCi6mA-BGC to predict 4mC sites and 6mA sites. First, we employ binary encoding, K-mer nucleotide frequency (K-mer), pseudo K-tuple nucleotide composition (PseKNC), dinucleotide-based auto covariance (DAC) and monoDiKGap theoretical description (MonoDiKGap) to encode DNA sequences. Then, the elastic net is employed for feature selection, and the optimized feature space is fed into a deep learning framework composed of a bidirectional gated recurrent unit and a convolutional neural network. The benchmark comprises six datasets, which contain 14,328 4mC sites from different species. The results of 10-fold cross-validation indicate that the prediction accuracy significantly outperforms that of existing prediction methods. In addition, we use the independent datasets Rice and Arabidopsis thaliana to further confirm the predictive ability of 4mCi6mA-BGC. Compared with the existing prediction methods, 4mCi6mA-BGC shows the best prediction performance. These comprehensive results indicate that our method can identify DNA modification sites, represented here by 4mC and 6mA sites.
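As a rough illustration of the first two sequence encodings named in the abstract (binary and K-mer nucleotide frequency), the sketch below shows one common way such features are computed. It is not the authors' implementation: the function names `binary_encode` and `kmer_frequency`, the A/C/G/T ordering, and the handling of ambiguous bases are all assumptions made for this example.

```python
from itertools import product

NUCLEOTIDES = "ACGT"  # assumed ordering; the paper may use a different one


def binary_encode(seq):
    """Binary (one-hot) encoding: each base becomes a 4-element 0/1 indicator."""
    table = {
        base: [1 if i == j else 0 for j in range(4)]
        for i, base in enumerate(NUCLEOTIDES)
    }
    vec = []
    for base in seq.upper():
        # Assumption: unknown bases (e.g. N) map to an all-zero block.
        vec.extend(table.get(base, [0, 0, 0, 0]))
    return vec


def kmer_frequency(seq, k=2):
    """Relative frequency of every possible k-mer, in lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]
```

For a sequence of length L, `binary_encode` yields a vector of length 4L, while `kmer_frequency` yields a fixed-length vector of 4^k entries regardless of L. In a pipeline like the one described, vectors of this kind would be concatenated with the other descriptors before feature selection.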