Systematic Comparison and Comprehensive Evaluation of 80 Amino Acid Descriptors in Peptide QSAR Modeling

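Quantitative structure–activity relationship; Artificial intelligence; Amino acid residue; Computer science; Cheminformatics; Machine learning; Mathematics; Computational biology; Peptide sequence; Chemistry; Biochemistry; Biology; Computational chemistry; Gene
Authors
Peng Zhou, Qian Liu, Ting Wu, Qingqing Miao, Shuyong Shang, Heyi Wang, Zheng Chen, Shaozhou Wang, Heyan Wang
Source
Journal: Journal of Chemical Information and Modeling [American Chemical Society]
Volume (Issue): 61 (4): 1718-1731 Citations: 81
Identifier
DOI:10.1021/acs.jcim.0c01370
Abstract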

The peptide quantitative structure–activity relationship (QSAR), also known as the quantitative sequence–activity model (QSAM), has attracted much attention in the bio- and chemoinformatics communities and is a well-developed computational peptidology strategy to statistically correlate the sequence/structure and activity/property relationships of functional peptides. Amino acid descriptors (AADs) are one of the most widely used methods to characterize peptide structures by decomposing the peptide into its residue building blocks and sequentially parametrizing each building block with a vector of amino acid principal properties. Considering that various AADs have been proposed over the past decades and new AADs are still emerging today, we herein query the following: is it necessary to develop so many AADs and do we need to continuously develop more new AADs? In this study, we exhaustively collect 80 published AADs and comprehensively evaluate their modeling performance (including fitting ability, internal stability, and predictive power) on 8 QSAR-oriented peptide sample sets (QPSs) by employing 2 sophisticated machine learning methods (MLMs), building and systematically comparing a total of 1280 (80 AADs × 8 QPSs × 2 MLMs) peptide QSAR models. The following is revealed: (i) None of the AADs can work best on all or most peptide sets; an AAD usually performs well for some peptides but badly for others. (ii) Modeling performance is primarily determined by the peptide samples and then the MLMs used, while AADs have only a moderate influence on the performance. (iii) There is no essential difference between the modeling performances of different AAD types (physicochemical, topological, 3D-structural, etc.). (iv) Two random descriptors, generated separately from the standard normal distribution N(0, 1) and the uniform distribution U(−1, +1), do not perform significantly worse than these carefully developed AADs. (v) A secondary descriptor, which carries major information involved in the 80 (primary) AADs, does not perform significantly better than these AADs. Overall, we conclude that since there are various AADs available to date and they already cover numerous amino acid properties, further development of new AADs is not an essential choice to improve peptide QSAR modeling; the traditional AAD methodology is believed to have almost reached its theoretical limit. In addition, the AADs are more likely to be vector symbols than informative data; they are utilized to mark and distinguish the 20 amino acids but do not really bring much original property information to these amino acids.
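
The encoding scheme described in the abstract (each residue replaced by a descriptor vector, and the vectors joined in sequence order) can be illustrated with a minimal Python sketch. It is not the authors' code. The number of components per residue (3), the random seed, the function names, and the example tripeptide are all assumptions made for illustration; only the two distributions, N(0, 1) and U(−1, +1), and the model count come from the abstract.

```python
import numpy as np

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_descriptor(kind, n_components=3, seed=0):
    """Build a random descriptor table: one vector per amino acid.

    n_components and seed are illustrative assumptions; the abstract
    only names the two distributions, N(0, 1) and U(-1, +1).
    """
    rng = np.random.default_rng(seed)
    shape = (len(AMINO_ACIDS), n_components)
    if kind == "normal":
        table = rng.standard_normal(shape)
    elif kind == "uniform":
        table = rng.uniform(-1.0, 1.0, size=shape)
    else:
        raise ValueError(f"unknown descriptor kind: {kind!r}")
    return {aa: table[i] for i, aa in enumerate(AMINO_ACIDS)}


def encode_peptide(sequence, descriptor):
    """Concatenate the per-residue vectors in sequence order."""
    return np.concatenate([descriptor[aa] for aa in sequence])


normal_aad = random_descriptor("normal")
uniform_aad = random_descriptor("uniform")

# Hypothetical tripeptide: 3 residues x 3 components = 9 features.
features = encode_peptide("VAP", normal_aad)
print(features.shape)

# Model count stated in the abstract: 80 AADs x 8 QPSs x 2 MLMs.
n_models = 80 * 8 * 2
print(n_models)
```

Under this scheme every peptide in a set must have the same length for the feature vectors to line up as rows of a regression matrix. Swapping `normal_aad` for a published descriptor table would change only the dictionary values, which is consistent with the abstract's remark that AADs act mainly as labels that distinguish the 20 amino acids.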