
A Review on the Recent Developments of Sequence-based Protein Feature Extraction Methods

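Autocorrelation; Computer science; Identification (biology); Artificial intelligence; Field (mathematics); Pattern recognition (psychology); Protein sequencing; Process (computing); Computational biology; Data mining; Sequence (biology); Feature extraction; Feature (linguistics); Biology; Mathematics; Peptide sequence; Genetics; Gene; Statistics; Operating system; Biochemistry; Philosophy; Linguistics; Pure mathematics; Botany
Authors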
Jun Zhang,Bin Liu
Source
Journal: Current Bioinformatics [Bentham Science]
Volume (issue): 14 (3): 190-199; Cited by: 115
Identifier
DOI:10.2174/1574893614666181212102749
Abstract

Background: Proteins play a crucial role in life activities, such as catalyzing metabolic reactions, DNA replication, and responding to stimuli. Identification of protein structures and functions is critical for both basic research and applications. Because traditional experiments for studying the structures and functions of proteins are expensive and time-consuming, computational approaches are highly desired. The key for computational methods is how to efficiently extract features from protein sequences. During the last decade, many powerful feature extraction algorithms have been proposed, significantly promoting the study of protein structures and functions.
Objective: To help researchers catch up with recent developments in this important field, this study gives an updated review focusing on sequence-based feature extraction for proteins.
Method: The sequence-based features of proteins were grouped into three categories: composition-based features, autocorrelation-based features, and profile-based features. The features in each group were described in detail, and their advantages and disadvantages were discussed. Some useful tools for generating these features were also introduced.
Results: Generally, autocorrelation-based features outperform composition-based features, and profile-based features outperform autocorrelation-based features. Profile-based features perform best because they incorporate evolutionary information, which is useful for identifying protein structures and functions. However, profile-based features are more time-consuming to compute, because a multiple sequence alignment is required.
Conclusion: In this study, some recently proposed sequence-based features were introduced and discussed, such as basic k-mers, PseAAC, auto-cross covariance, and top-n-gram. These features have made great contributions to the development of protein sequence analysis. Future studies can focus on exploring combinations of these features. In addition, techniques from other fields, such as signal processing, natural language processing (NLP), and image processing, could also contribute to this field, because natural languages (such as English) and protein sequences share some similarities. Proteins can therefore be treated as documents, and features such as k-mers, top-n-grams, and motifs can be treated as the words of the language. Techniques from these fields will offer new ideas and strategies for extracting features from proteins.
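To make the first two feature categories in the abstract concrete, here is a minimal Python sketch of a composition-based feature (normalized k-mer frequencies) and an autocorrelation-based feature (auto covariance of one physicochemical property at a given lag). It is an illustration under stated assumptions, not the formulation used in the review: the function names `kmer_composition` and `auto_covariance` are hypothetical, the Kyte-Doolittle hydropathy scale is chosen here only as an example property, and residues outside the 20 standard amino acids are simply skipped. Profile-based features are not sketched because they require a multiple sequence alignment.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Kyte-Doolittle hydropathy values, used here only as an example property
# (an assumption for illustration; the review does not prescribe a scale).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def kmer_composition(sequence, k=2):
    """Composition-based feature: frequencies of all 20**k k-mers.

    The vector is ordered lexicographically over AMINO_ACIDS, so a protein
    is described like a document by the counts of its "words".
    """
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    vocabulary = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    # k-mers containing non-standard residues are not in the vocabulary
    # and therefore do not contribute to the total.
    total = sum(counts[word] for word in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[word] / total for word in vocabulary]


def auto_covariance(sequence, lag, scale=HYDROPATHY):
    """Autocorrelation-based feature: covariance of one property
    between residues that are `lag` positions apart."""
    values = [scale[r] for r in sequence.upper() if r in scale]
    n = len(values)
    if lag <= 0 or lag >= n:
        raise ValueError("lag must satisfy 0 < lag < number of scored residues")
    mean = sum(values) / n
    products = (
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    )
    return sum(products) / (n - lag)


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence
    dipeptides = kmer_composition(seq, k=2)
    print(len(dipeptides))  # 400 dipeptide frequencies
    print([round(auto_covariance(seq, lag), 3) for lag in (1, 2, 3)])
```

The k-mer vector ignores where in the sequence each word occurs, whereas the auto covariance values capture how a property co-varies along the chain, which is one way to see why the two categories carry different information.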