Limit and screen sequences with high degree of secondary structures in DNA storage by deep learning method

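Encoding (memory), Energy (signal processing), Folding (DSP implementation), Degree (music), Computer science, Limit (mathematics), DNA, Algorithm, Base pair, Deep learning, Biological system, Process (computing), Pattern recognition (psychology), Artificial intelligence, Mathematics, Statistics, Biology, Physics, Genetics, Engineering, Mathematical analysis, Acoustics, Electrical engineering, Operating system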
Authors
Wanmin Lin, Ling Chu, Yanqing Su, Ranze Xie, Xiangyu Yao, Xiangzhen Zan, Peng Xu, Wenbin Liu
Source
Journal: Computers in Biology and Medicine [Elsevier]
Volume: 166, article 107548
Identifier
DOI: 10.1016/j.compbiomed.2023.107548
Abstract

Secondary structures are very common in single-stranded DNA and RNA, especially in long sequences. A high degree of secondary structure in DNA sequences is known to interfere with the correct writing and reading of information in DNA storage, yet how to circumvent this side effect has seldom been studied. Because the degree of secondary structure of a DNA sequence is closely related to the magnitude of the free energy released during the complicated folding process, we first investigate the free-energy distribution at different encoding lengths using randomly generated DNA sequences. We then construct a bidirectional long short-term memory (BiLSTM)-attention deep learning model to predict the free energy of sequences. Our simulation results indicate that the free energy of DNA sequences of a given length follows a right-skewed distribution and that the mean increases with length. Given a tolerable free-energy threshold of 20 kcal/mol, the ratio of serious secondary structures in the encoding sequences can be kept within the 1% significance level by selecting a feasible encoding length of 100 nt. Compared with traditional deep learning models, the proposed model achieves better prediction performance in both mean relative error (MRE) and coefficient of determination (R²), reaching MRE = 0.109 and R² = 0.918 in the simulation experiment. The combination of the BiLSTM and the attention module handles long-term dependencies and captures the features of base pairing. Furthermore, prediction has linear time complexity, which makes it suitable for detecting sequences with severe secondary structures in future large-scale applications. Finally, on a real dataset, 70 of 94 sequences can be screened out based on their predicted free energy. This demonstrates that the proposed model can screen out highly suspicious sequences that are prone to more errors and low sequencing copy numbers.
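
The two evaluation metrics and the threshold-based screening step mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names are hypothetical, the MRE and R² formulas are the standard textbook definitions (the abstract does not spell out the exact variants used), and the screening rule assumes that the 20 kcal/mol threshold applies to the magnitude of the predicted free energy.

```python
from typing import Dict, List, Sequence


def mean_relative_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of |prediction - truth| / |truth| (standard definition, assumed)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def screen_sequences(predicted: Dict[str, float], threshold: float = 20.0) -> List[str]:
    """Return the ids of sequences whose predicted free-energy magnitude
    (kcal/mol) exceeds the threshold. Using the magnitude is an assumption."""
    return [seq_id for seq_id, energy in predicted.items() if abs(energy) > threshold]


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not data from the paper.
    truth = [10.0, 20.0, 40.0]
    preds = [11.0, 18.0, 40.0]
    print(mean_relative_error(truth, preds))
    print(r_squared(truth, preds))
    print(screen_sequences({"seq_a": 5.0, "seq_b": 25.0, "seq_c": -30.0}))
```

In practice the `predicted` mapping would be filled from the output of a trained free-energy predictor. A sequence sitting exactly at the threshold is kept, since the comparison is strict.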