Descriptor generation from Morgan fingerprint using persistent homology

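Overfitting; Artificial intelligence; Regression; Pattern recognition (psychology); Principal component analysis; Computer science; Persistent homology; Cheminformatics; Mathematics; Machine learning; Statistics; Algorithm; Artificial neural network; Biology; Bioinformatics
Author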
Takuya Ehiro
Source
Journal: SAR and QSAR in Environmental Research [Taylor & Francis]
Volume (issue): 35 (1): 31-51 Cited by: 2
Identifier
DOI:10.1080/1062936x.2023.2301327
Abstract

In cheminformatics, molecular fingerprints (FPs) are used in various tasks such as regression and classification. However, predictive models often underutilize Morgan FPs for regression and related machine learning tasks. This study introduces descriptors derived from reshaped Morgan FPs using persistent homology to improve predictive accuracy. On the solvation free energy (FreeSolv) and water solubility (ESOL) datasets, persistent homology enhanced predictive accuracy compared with using Morgan FPs alone. Notably, generating descriptors from the first-order persistence diagram (PD1) yielded larger improvements than using the zeroth-order persistence diagram (PD0). Combining 4096-bit Morgan FPs with PD1-generated descriptors increased the average coefficient of determination in Gaussian process regression from 0.597 to 0.667 for FreeSolv and from 0.629 to 0.654 for ESOL. Adjusting the grid size parameter during PD-based descriptor generation is crucial: finer grids, especially with PD0, generate more descriptors but reduce predictive accuracy. Coarsening the grid or applying principal component analysis (PCA) mitigates overfitting and enhances accuracy. When descriptors were generated from Morgan FPs with randomly shuffled bit positions, coarsening the grid and/or applying PCA achieved accuracy improvements similar to those obtained with the persistent homology of the original Morgan FPs.
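
The grid size effect described in the abstract can be illustrated with a small sketch. The abstract does not specify how a persistence diagram is turned into descriptors, so the scheme below (counting birth-death points per cell of a square grid) is an assumption for illustration only, not the author's method. The function name `pd_to_grid_descriptors`, its parameters, and the toy diagram are all hypothetical; computing the diagram itself from a reshaped fingerprint would need a persistent homology library and is not shown.

```python
import math
from typing import List, Tuple


def pd_to_grid_descriptors(
    diagram: List[Tuple[float, float]],
    lo: float,
    hi: float,
    n_bins: int,
) -> List[int]:
    """Count (birth, death) points in each cell of an n_bins x n_bins grid.

    Assumed vectorization: a plain 2D histogram over [lo, hi] x [lo, hi],
    flattened row by row (birth index first, then death index).
    """
    if n_bins < 1 or hi <= lo:
        raise ValueError("need n_bins >= 1 and hi > lo")
    width = (hi - lo) / n_bins
    counts = [0] * (n_bins * n_bins)
    for birth, death in diagram:
        # Clamp indices so points on or beyond the border fall in edge cells.
        i = min(max(math.floor((birth - lo) / width), 0), n_bins - 1)
        j = min(max(math.floor((death - lo) / width), 0), n_bins - 1)
        counts[i * n_bins + j] += 1
    return counts


# Hypothetical toy diagram; real diagrams would come from a PH library.
toy_pd = [(0.0, 1.0), (0.0, 2.5), (1.0, 3.0), (2.0, 3.9)]

fine = pd_to_grid_descriptors(toy_pd, 0.0, 4.0, n_bins=8)    # 64 descriptors
coarse = pd_to_grid_descriptors(toy_pd, 0.0, 4.0, n_bins=2)  # 4 descriptors

print(len(fine), len(coarse))
print(coarse)
```

Under this assumed scheme the number of descriptors grows with the square of the number of bins per axis, so a fine grid yields many mostly empty features from the same diagram. That is consistent with the abstract's observation that coarser grids, or PCA applied afterwards, help against overfitting.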