Cooperative sequence clustering and decoding for DNA storage system with fountain codes

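Computer science, Decoding methods, Error detection and correction, Reading (process), Sequence (biology), Coding (set theory), Cluster analysis, Algorithm, List decoding, Sequential decoding, Concatenated error correction code, Artificial intelligence, Biology, Genetics, Block code, Programming language, Set (abstract data type), Political science, Law
Authors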
Jaeho Jeong,Seong‐Joon Park,Jae-Won Kim,Jong‐Seon No,Ha Hyeon Jeon,Jeong Wook Lee,Albert No,Sunghwan Kim,Hosung Park
Source
Journal: Bioinformatics [Oxford University Press]
Volume (issue): 37 (19): 3136-3143 Citations: 45
Identifier
DOI:10.1093/bioinformatics/btab246
Abstract

Motivation: In DNA storage systems, there are tradeoffs between writing and reading costs. Increasing the code rate of error-correcting codes may save writing cost, but it requires more sequence reads for data retrieval. It may be possible to improve the sequencing and decoding processes so that the reading cost induced by this tradeoff is reduced without increasing the writing cost. In previous research, clustering, alignment and decoding were treated as separate stages, but we believe that using the information from all of these processes together may improve decoding performance. Actual DNA synthesis and sequencing experiments should be performed, because simulations cannot be relied on to cover every error that can occur in practice.

Results: For DNA storage systems using a fountain code and a Reed-Solomon (RS) code, we introduce several techniques to improve decoding performance. We designed the decoding process around the cooperation of its key components: Hamming-distance based clustering, discarding of abnormal sequence reads, RS error correction as well as detection, and quality score-based ordering of sequences. We synthesized 513.6 KB of data into DNA oligo pools and sequenced this data successfully with an Illumina MiSeq instrument. Compared to Erlich's research, the proposed decoding method additionally incorporates sequence reads with minor errors, which had previously been discarded, and was thus able to make use of 10.6–11.9% more sequence reads from the same sequencing environment. This resulted in a 6.5–8.9% reduction in the reading cost. Channel characteristics, including sequence coverage and read-length distributions, are provided as well.

Availability and implementation: The raw data files and the source code of our experiments are available at: https://github.com/jhjeong0702/dna-storage.
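To make the idea of Hamming-distance based clustering concrete, here is a minimal Python sketch. It is not the authors' implementation (see the repository linked above for that). The function names `hamming`, `cluster_reads` and `consensus`, the greedy first-read-as-representative strategy, the length filter standing in for "discarding of abnormal sequence reads", and the threshold `max_dist` are all assumptions made for illustration.

```python
from collections import Counter
from typing import List


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def cluster_reads(reads: List[str], expected_len: int, max_dist: int) -> List[List[str]]:
    """Greedy clustering (illustrative assumption, not the paper's algorithm).

    Reads whose length differs from expected_len are dropped, as a simple
    stand-in for discarding abnormal reads. Each remaining read joins the
    first cluster whose representative (its first member) is within
    max_dist mismatches; otherwise it starts a new cluster.
    """
    clusters: List[List[str]] = []
    for read in reads:
        if len(read) != expected_len:
            continue  # hypothetical "abnormal read" filter
        for cluster in clusters:
            if hamming(read, cluster[0]) <= max_dist:
                cluster.append(read)
                break
        else:
            clusters.append([read])
    return clusters


def consensus(cluster: List[str]) -> str:
    """Per-position majority vote over the reads of one cluster."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*cluster))


reads = ["ACGTAC", "ACGTAA", "ACGTAC", "TTTTGG", "TTTTGG", "ACG", "TTTAGG"]
clusters = cluster_reads(reads, expected_len=6, max_dist=1)
candidates = [consensus(c) for c in clusters]
```

In this toy input the short read `ACG` is dropped, and the reads with a single substitution are kept inside their clusters rather than thrown away, which is the spirit of using reads with minor errors. In a full pipeline the consensus candidates would then be passed to RS error detection and correction and to the fountain decoder, neither of which is shown here.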