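Substring
Encoding (memory)
Scheme (mathematics)
Tree (set theory)
Computer science
Algorithm
Theoretical computer science
Mathematics
Data structure
Combinatorics
Artificial intelligence
Programming language
Mathematical analysis
Authors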
Jieqiong Wu,Penghao Wang,Yanfen Zheng,Bin Wang,Qiang Zhang,Pan Zheng
Identifier
DOI:10.1109/tcbbio.2025.3586008
Abstract
DNA storage is considered a promising storage medium in the current era of data explosion. DNA encoding is the first step of the DNA storage process and lays the foundation for the subsequent steps. However, many encoding methods suffer from a low encoding rate, fail to satisfy important constraints, or produce sequences with insufficient stability. To address these issues and improve sequence stability, this paper proposes a novel approach called the Repeating Substring Tree Encoding (RSTE) method. The method begins by applying the Longest Substring Backtracking Method (LSBM) to identify the longest repeated substrings within the binary file. These substrings are then encoded into compact DNA motifs using Huffman encoding. In contrast to the ideal coding density of 2 bits per nucleotide (2 bit/nt) targeted by previous studies, RSTE enhances the encoding rate by 13% through efficient utilization of repeated substrings. Furthermore, the DNA sequences generated by the RSTE method meet three biological constraints: run-length limitation, GC content balance, and end constraints. The experimental results for minimum free energy and melting temperature indicate that the stability of the sequences encoded by RSTE is also greatly improved. A series of experiments showed that the sequences encoded by RSTE have a higher coding rate, satisfy the constraints, and are more stable.
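Two of the biological constraints named in the abstract, run-length limitation and GC content balance, can be illustrated with a minimal Python sketch. This is not the RSTE implementation: the function names are hypothetical, and the thresholds (a maximum homopolymer run of 3 and a GC fraction between 0.4 and 0.6) are assumed values chosen for illustration, since the abstract states no specific limits. The end constraints are omitted because the abstract does not define them.

```python
from itertools import groupby


def max_run_length(seq: str) -> int:
    """Return the length of the longest run of identical bases in seq."""
    return max((len(list(group)) for _, group in groupby(seq)), default=0)


def gc_content(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def satisfies_constraints(seq: str, max_run: int = 3,
                          gc_min: float = 0.4, gc_max: float = 0.6) -> bool:
    """Check run-length and GC-balance limits.

    The default thresholds are illustrative assumptions, not values
    taken from the paper.
    """
    return (max_run_length(seq) <= max_run
            and gc_min <= gc_content(seq) <= gc_max)
```

For instance, `satisfies_constraints("ACGTACGT")` passes under these assumed limits, while a sequence such as `"AAAAACGT"` fails both on run length and on GC balance.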