An Effective Algorithm Based on Sequence and Property Information for N4-methylcytosine Identification in Multiple Species

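Keywords: Identification (biology); Sequence (biology); 5-Methylcytosine; Property (philosophy); Chemistry; Algorithm; Computational biology; Computer science; Biochemistry; Biology; Gene; DNA methylation; Botany; Gene expression; Philosophy; Epistemology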
Authors
Lichao Zhang, Xueting Wang, Kang Xiao, Liang Kong
Source
Journal: Letters in Organic Chemistry [Bentham Science Publishers]
Volume (issue): 21 (8): 695-706
Identifier
DOI:10.2174/0115701786277281231228093405
Abstract

Abstract: N4-methylcytosine (4mC) is one of the most important epigenetic modifications; it plays a significant role in biological processes and helps explain biological functions. Although biological experiments can identify potential 4mC sites, they are constrained by the experimental environment and are labor-intensive. It is therefore important to construct a computational model to identify 4mC sites. Several computational methods have been proposed for this task, but two problems remain: (1) a more accurate algorithm is required to improve prediction, especially as measured by the Matthews correlation coefficient (MCC); (2) an easier method is needed for clinical research on drug design and disease treatment. With these points in mind, this study proposes an effective algorithm that uses comprehensible encodings and applies to multiple species. Because the arrangement of nucleotides and their property information can reflect sequence structure and function, several feature vectors were developed from nucleotide energy information, trinucleotide energy information, and nucleotide chemical property information. The effect of each feature was then analyzed to select the optimal feature vectors for each species. Finally, the optimal feature vectors were fed into the CatBoost algorithm to construct the identification model. In the evaluation, the proposed model obtained the highest MCC, exceeding previous models by 2.5% to 11.1%, 1.4% to 17.8%, 1.1% to 7.6%, and 2.3% to 18.0% on the A. thaliana, C. elegans, D. melanogaster, and E. coli datasets, respectively. These results indicate that the proposed method can identify 4mC sites in multiple species, with particular gains in MCC, and that it could serve as a reasonable supplement to biological research.
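The abstract names two ingredients that are easy to illustrate: a nucleotide chemical property encoding and the MCC metric. The following is a minimal sketch, not the authors' implementation. The three-bit code in `NCP` (ring structure, functional group, hydrogen-bond strength) is a commonly used convention and is assumed here; the abstract does not state the exact encoding. The function names `encode_ncp` and `mcc` are hypothetical, and the energy-based features and the CatBoost model are omitted.

```python
import math

# ASSUMPTION: a common three-bit chemical property code per nucleotide
# (ring structure, functional group, hydrogen-bond strength).
# The abstract does not confirm that the paper uses exactly this table.
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "T": (0, 0, 1),
}


def encode_ncp(seq):
    """Flatten a DNA sequence into a list of chemical property bits."""
    features = []
    for base in seq.upper():
        # Unknown symbols (e.g. N) are mapped to all zeros.
        features.extend(NCP.get(base, (0, 0, 0)))
    return features


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

In a full pipeline, vectors such as those returned by `encode_ncp` would be combined with the other feature groups and passed to a classifier, and `mcc` would be computed on held-out predictions. MCC ranges from -1 to 1 and accounts for all four confusion-matrix cells, which is why it is often preferred over accuracy for site-prediction benchmarks.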