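Benchmark (surveying)
Computer science
Matching (statistics)
Algorithm
Sequence (biology)
Computational complexity theory
DNA sequencing
DNA
Mathematics
Biology
Statistics
Geodesy
Genetics
Geography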
Authors
Md. Sayeed Iftekhar Yousuf, Machbah Uddin, Mohammad Khairul Islam, Md. Rakib Hassan, Aysha Siddika Ratna, Farah Jahan
Identifier
DOI:10.1109/ncim59001.2023.10212654
Abstract
DNA sequence analysis has enormous applications, including gene modification, gene therapy, and the development of new varieties. As a result, the size of genome datasets is increasing exponentially, creating growing computational challenges. Existing DNA sequence analysis algorithms fall into two types: alignment-based (AB) and alignment-free (AF). AB methods are effective for short, homologous sequences, but their time and memory complexity is extremely high; AF algorithms overcome these major limitations. Existing AF algorithms use various kinds of relative information, but these techniques lose spatial information. Therefore, this research proposes a novel AF algorithm that introduces two new features, standard deviation and zero count, and is efficient in terms of memory, time, and accuracy. First, it generates a k-mer count matrix and a position vector for each cell of the count matrix. Next, it calculates the standard deviation of the first-order difference of the positions and the number of zeros in the second-order difference of the positions. The method is tested on several benchmark datasets, and its performance is compared with existing studies and tools. On all datasets, it requires 1.217 to 354 times less memory and achieves top accuracy. In terms of running time, it is 20 to 5768 times faster than the MEGA tool. Therefore, this system can be an effective platform for DNA matching.
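The feature extraction described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the k-mer ordering, the layout of the count matrix, how k-mers with too few occurrences are handled, or how feature vectors are compared. The sketch therefore assumes the following: a flat table of all 4^k k-mers over ACGT in lexicographic order stands in for the count matrix; the population standard deviation is used; features default to 0 when there are too few positions to form a difference; and two sequences are compared by Euclidean distance. The function names `position_features` and `feature_distance` are hypothetical.

```python
import math
from itertools import product
from statistics import pstdev

ALPHABET = "ACGT"


def kmer_positions(seq: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer over ACGT (lexicographic order, an assumption) to its 0-based start positions."""
    positions = {"".join(p): [] for p in product(ALPHABET, repeat=k)}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in positions:  # skip k-mers containing N or other non-ACGT symbols
            positions[kmer].append(i)
    return positions


def diff(values: list) -> list:
    """First-order difference: consecutive gaps between entries."""
    return [b - a for a, b in zip(values, values[1:])]


def position_features(seq: str, k: int) -> list[float]:
    """Per k-mer: [count, std of 1st-order position differences, zeros in 2nd-order differences]."""
    features: list[float] = []
    for pos in kmer_positions(seq.upper(), k).values():
        d1 = diff(pos)   # gaps between successive occurrences
        d2 = diff(d1)    # change in those gaps; 0 means evenly spaced occurrences
        std_d1 = float(pstdev(d1)) if d1 else 0.0  # assumed default when the k-mer occurs < 2 times
        zero_count = sum(1 for x in d2 if x == 0)
        features.extend([float(len(pos)), std_d1, float(zero_count)])
    return features


def feature_distance(a: str, b: str, k: int) -> float:
    """Hypothetical comparison: Euclidean distance between the two feature vectors."""
    return math.dist(position_features(a, k), position_features(b, k))


if __name__ == "__main__":
    print(position_features("AACA", 1))
    print(feature_distance("AACA", "ACACAC", 1))
```

For `"AACA"` with k = 1, the k-mer `A` occurs at positions 0, 1 and 3. Its gaps are [1, 2], with a standard deviation of 0.5, and the single second-order difference is 1, so the zero count is 0. Keeping count, spread and regularity side by side per k-mer is one simple way to retain some of the spatial information that pure count-based AF methods discard.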