Classifying promoters by interpreting the hidden information of DNA sequences for disease prediction in clinical laboratories using Gaussian decision boundary estimation

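Promoter, Base pair, DNA, Genetics, Gene, Biology, DNA binding site, Transcription (linguistics), DNA sequencing, Computational biology, Gene expression, Linguistics, Philosophy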
Authors
S. Pradeepa, Niveda Gaspar, S. Vimal, P. Subbulakshmi, Ahmed Alkhayyat, M. Kaliappan
Source
Journal: Intelligent Decision Technologies [IOS Press]
Volume (issue): 18 (1): 613-631
Identifier
DOI: 10.3233/idt-230283
Abstract

A promoter is a short stretch of DNA (100–1,000 bp) where RNA polymerase begins to transcribe a gene. A DNA (deoxyribonucleic acid) base pair is the fundamental unit of DNA structure: two complementary nucleotide bases paired within the double helix. The four DNA nucleotide bases are adenine (A), thymine (T), cytosine (C), and guanine (G). Base pairs are the building blocks of the DNA molecule, and their complementary pairing is central to the storage and transmission of genetic information in all living organisms. A promoter is normally found at the 5′ end of the transcription initiation site, or immediately upstream of it. Numerous human disorders, particularly diabetes, cancer, and Huntington’s disease, have been shown to have DNA promoters as their root cause. The scientific community has long sought crucial information about protein-coding genes, and finding promoters is the first step in finding genes in DNA sequences. Promoter identification has therefore become a challenge that has drawn the interest of numerous bioinformatics researchers. We propose Gaussian Decision Boundary Estimation in machine learning models to detect transcription start sites (promoters) in the DNA sequences of a common bacterium, Escherichia coli. A score-based function identifies the best features by selecting the nucleotides that are directly responsible for promoter recognition, in order to maximise the models’ performance. A support-vector-machine model based on Gaussian Decision Boundary Estimation is trained on these features and finds the best hyperplane separating the data into classes. In this study, promoter regions were identified with an accuracy of 99.9%, which is higher than that of other state-of-the-art algorithms.
Comparing machine learning classification models is another major emphasis of this paper, with the aim of identifying the model that most accurately predicts promoters in DNA sequences. The work provides analysis for further biological research as well as precision medicine.
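
The abstract does not give the scoring function or the exact form of the Gaussian decision boundary, so the following is only a rough sketch of the two ideas it describes: ranking nucleotide positions by a score, and comparing the selected features with a Gaussian kernel. It assumes mutual information as the score and the standard Gaussian (RBF) kernel exp(-gamma * ||x - z||^2) as the similarity an SVM would use. The function names, the toy sequences, and the value of `gamma` are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a flat 0/1 vector, four entries per position."""
    return [1.0 if base == b else 0.0 for base in seq.upper() for b in BASES]

def position_scores(seqs, labels):
    """Mutual information (bits) between the base at each position and the label.
    Assumption: the paper's score-based function is not specified; MI is a stand-in."""
    n = len(seqs)
    label_counts = Counter(labels)
    scores = []
    for pos in range(len(seqs[0])):
        base_counts = Counter(s[pos] for s in seqs)
        joint = Counter((s[pos], y) for s, y in zip(seqs, labels))
        mi = 0.0
        for (base, y), count in joint.items():
            p_joint = count / n
            p_indep = (base_counts[base] / n) * (label_counts[y] / n)
            mi += p_joint * math.log2(p_joint / p_indep)
        scores.append(mi)
    return scores

def top_k_positions(scores, k):
    """Indices of the k highest-scoring positions, returned in sequence order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

# Made-up 6-bp toy sequences (1 = promoter-like, 0 = background); not real data.
seqs = ["TATAAT", "TATACT", "TATGAT", "TTTAAT",
        "GCTAGC", "CGTACG", "GATCCA", "CCTAGG"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

scores = position_scores(seqs, labels)
keep = top_k_positions(scores, k=2)

# Keep only the selected positions, then one-hot encode them.
encoded = [one_hot("".join(s[i] for i in keep)) for s in seqs]

print("selected positions:", keep)
print("kernel, same class:", rbf_kernel(encoded[0], encoded[1]))
print("kernel, different class:", rbf_kernel(encoded[0], encoded[4]))
```

In practice the kernel values would be passed to an SVM solver and not inspected by hand. For example, scikit-learn's `SVC(kernel="rbf")` fits a maximum-margin boundary from the same kind of one-hot features.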
