DeepG4: A deep learning approach to predict active G-quadruplexes from DNA

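Chromatin, Computational biology, DNA, Biology, DNA sequencing, DNA methylation, Genetics, Transcription (linguistics), Sequence motif, Gene, Gene expression, Linguistics, Philosophy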
Authors
Vincent Rocher,Matthieu Genais,Elissar Nassereddine,Raphaël Mourad
Identifier
DOI:10.1101/2020.07.22.215699
Abstract

DNA is a complex molecule carrying the instructions an organism needs to develop, live and reproduce. In 1953, Watson and Crick discovered that DNA is composed of two chains forming a double helix. Later on, other DNA structures were discovered and shown to play important roles in the cell, in particular the G-quadruplex (G4). Following genome sequencing, several bioinformatic algorithms were developed to map G4s in vitro based on a canonical sequence motif, G-richness and G-skewness, or alternatively on sequence features including k-mers, and more recently on machine/deep learning. Here, we propose a novel convolutional neural network (DeepG4) to map active G4s (forming both in vitro and in vivo). DeepG4 predicts active G4s very accurately, while most state-of-the-art algorithms fail. Moreover, DeepG4 identifies key DNA motifs that are predictive of G4 activity. We found that active G4 motifs do not follow the very flexible sequence pattern that current algorithms look for. Instead, active G4s are determined by numerous specific motifs. Among those motifs, we identified those of known transcription factors (TFs), which could play important roles in G4 activity by contributing either directly to G4 structures themselves or indirectly by participating in G4 formation in the vicinity. We also showed that specific TFs might explain G4 activity depending on cell type. Lastly, variant analysis suggests that SNPs altering predicted G4 activity could affect transcription and chromatin, e.g. gene expression, the H3K4me3 mark and DNA methylation. Thus, DeepG4 paves the way for future studies assessing the impact of known disease-associated variants on DNA secondary structure by providing a mechanistic interpretation of SNP impact on transcription and chromatin.

Availability: https://github.com/morphos30/DeepG4

Author summary

DNA is a molecule carrying genetic information and found in all living cells. In 1953, Watson and Crick found that DNA has a double helix structure. However, other DNA structures were later identified, most notably the G-quadruplex (G4). In 2000, the Human Genome Project revealed the widespread presence of G4s in the genome using algorithms. To date, all G4 mapping algorithms have been developed to map G4s on naked DNA, without knowing whether they could form in the cell. Here, we designed a novel artificial intelligence algorithm that can map G4s active in the cell from the DNA sequence. We showed that it is more accurate than existing algorithms. Moreover, we identified key transcription factor motifs that could explain G4 activity depending on cell type. Lastly, we demonstrated the existence of mutations that could alter G4 activity and therefore impact molecular processes in the cell, such as transcription. Such results could provide a novel mechanistic interpretation of known disease-associated mutations.
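
For readers unfamiliar with the sequence-based features the abstract mentions (a canonical sequence motif, G-richness and G-skewness), the following is a minimal Python sketch of how such features are often computed. It is not taken from DeepG4 and is not the authors' method. The regular expression (four runs of at least three guanines separated by loops of 1 to 7 nucleotides) is a commonly used form of the canonical motif and is an assumption here, as are the function names and the exact definitions of richness and skew.

```python
import re

# Assumed canonical G4 pattern: four G-runs (>= 3 Gs each) separated by
# loops of 1-7 nucleotides. Illustrative only; not the DeepG4 model.
G4_PATTERN = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")


def find_canonical_g4(seq):
    """Return (start, end) spans of non-overlapping canonical motif matches
    on the given strand (the reverse strand is not scanned)."""
    return [m.span() for m in G4_PATTERN.finditer(seq.upper())]


def g_richness(seq):
    """Fraction of bases that are G (0.0 for an empty sequence)."""
    seq = seq.upper()
    return seq.count("G") / len(seq) if seq else 0.0


def g_skew(seq):
    """(G - C) / (G + C); 0.0 when the sequence has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


if __name__ == "__main__":
    example = "TTGGGAGGGTGGGAGGGTT"  # hypothetical toy sequence
    print(find_canonical_g4(example))
    print(g_richness(example), g_skew(example))
```

Scanners of this kind describe G4 potential on naked DNA only. According to the abstract, active G4s are instead determined by numerous specific motifs, which is the gap DeepG4 is designed to address.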
