An Effective Algorithm Based on Sequence and Property Information for N4-methylcytosine Identification in Multiple Species

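Identification (biology) Sequence (biology) 5-Methylcytosine Property (philosophy) Chemistry Algorithm Computational biology Computer science Biochemistry Biology Gene DNA methylation Plant Gene expression Philosophy Epistemology
Authors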
Lichao Zhang,Xueting Wang,Kang Xiao,Liang Kong
Source
Journal: Letters in Organic Chemistry [Bentham Science Publishers]
Volume (issue): 21 (8): 695-706
Identifier
DOI:10.2174/0115701786277281231228093405
Abstract

Abstract: N4-methylcytosine (4mC) is one of the most important epigenetic modifications; it plays a significant role in biological processes and helps explain biological functions. Although biological experiments can identify potential 4mC sites, they are limited by the experimental environment and are labor-intensive. It is therefore crucial to construct a computational model to identify 4mC sites. Several computational methods have been proposed for this task, but two problems should not be ignored: (1) a more accurate algorithm is required to improve prediction, especially in terms of the Matthews correlation coefficient (MCC); (2) a simpler method is needed for clinical research on drug design and disease treatment. Considering these aspects, this study proposes an effective algorithm that uses comprehensible encoding across multiple species. Because the arrangement of nucleotides and their property information can reflect sequence structure and function, several feature vectors were developed based on nucleotide energy information, trinucleotide energy information, and nucleotide chemical property information. In addition, the effect of each feature was analyzed to select the optimal feature vectors for each species. Finally, the optimal feature vectors were fed into the CatBoost algorithm to construct the identification model. The evaluation results showed that the proposed model obtained the highest MCC, i.e., 2.5% to 11.1%, 1.4% to 17.8%, 1.1% to 7.6%, and 2.3% to 18.0% higher than previous models on the A. thaliana, C. elegans, D. melanogaster, and E. coli datasets, respectively. These results indicate that the proposed method can be used to identify 4mC sites in multiple species and performs especially well on MCC. It could provide a reasonable supplement for biological research.
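
As a rough illustration of two ideas named in the abstract, the sketch below encodes a DNA sequence by nucleotide chemical properties and computes the MCC from predicted labels. It is not the authors' implementation: the three-bit mapping (ring structure, functional group, hydrogen bonding) is a convention commonly used in the literature and is assumed here, and the names `NCP`, `encode_ncp`, and `mcc` are hypothetical. The energy-based features and the CatBoost model are not reproduced.

```python
import math

# Assumed nucleotide chemical property (NCP) mapping, one triple per base:
# (purine ring, amino group, weak hydrogen bonding). This is a common
# convention, not necessarily the exact encoding used in the paper.
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "T": (0, 0, 1),
}


def encode_ncp(seq):
    """Flatten a DNA sequence into a list with three property bits per base."""
    features = []
    for base in seq.upper():
        # Unknown symbols (e.g. N) are encoded as all zeros.
        features.extend(NCP.get(base, (0, 0, 0)))
    return features


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = 4mC site)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

A feature vector built this way could be passed to any classifier; MCC is then computed on held-out predictions. Because MCC uses all four cells of the confusion matrix, it stays informative when positive and negative samples are imbalanced.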
