Molecular Representations in Machine-Learning-Based Prediction of PK Parameters for Insulin Analogs

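Keywords: Drug discovery, Random forest, Small molecule, Quantitative structure-activity relationship, Artificial neural network, Artificial intelligence, Computational biology, Computer science, Chemistry, Machine learning, Biological system, Biology, Biochemistry
Authors
Kasper A. Einarson, Kristian Moss Bendtsen, Kang Li, Maria Thomsen, Niels Rode Kristensen, Ole Winther, Simone Fulle, Line Katrine Harder Clemmensen, Hanne H. F. Refsgaard
Source
Journal: ACS Omega [American Chemical Society]
Volume (issue): 8 (26): 23566-23578  Cited by: 4
Identifier
DOI:10.1021/acsomega.3c01218
Abstract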

Therapeutic peptides and proteins derived from either endogenous hormones, such as insulin, or de novo design via display technologies occupy a distinct pharmaceutical space between small molecules and large proteins such as antibodies. Optimizing the pharmacokinetic (PK) profile of drug candidates is of high importance when prioritizing lead candidates, and machine-learning models can provide a relevant tool to accelerate the drug design process. Predicting PK parameters of proteins remains difficult because of the complex factors that influence PK properties; furthermore, the data sets are small compared to the variety of compounds in the protein space. This study describes a novel combination of molecular descriptors for proteins such as insulin analogs, many of which contained chemical modifications, e.g., attached small molecules for protraction of the half-life. The underlying data set consisted of 640 structurally diverse insulin analogs, of which around half had attached small molecules. Other analogs were conjugated to peptides, amino acid extensions, or fragment crystallizable regions. The PK parameters clearance (CL), half-life (T1/2), and mean residence time (MRT) could be predicted by using classical machine-learning models such as Random Forest (RF) and Artificial Neural Networks (ANN), with root-mean-square errors for CL of 0.60 and 0.68 (log units) and average fold errors of 2.5 and 2.9 for RF and ANN, respectively. Both random and temporal data splits were employed to evaluate ideal and prospective model performance, respectively; the best models, regardless of the data split, placed at least 70% of predictions within a twofold error.
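The three error measures quoted above (root-mean-square error in log units, average fold error, and the share of predictions within a twofold error) can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not confirm: that "log units" means log10, and that the average fold error is the absolute-value form, 10 raised to the mean of |log10(predicted/observed)|. The function names are hypothetical, and this is not the authors' code.

```python
import math


def log10_errors(observed, predicted):
    """Per-compound log10 ratio of predicted to observed values (both must be > 0)."""
    return [math.log10(p / o) for o, p in zip(observed, predicted)]


def rmse_log10(observed, predicted):
    """Root-mean-square error on the log10 scale (assumed meaning of 'log units')."""
    errs = log10_errors(observed, predicted)
    return math.sqrt(sum(e * e for e in errs) / len(errs))


def average_fold_error(observed, predicted):
    """Assumed definition: 10 ** mean(|log10(predicted / observed)|)."""
    errs = log10_errors(observed, predicted)
    return 10 ** (sum(abs(e) for e in errs) / len(errs))


def fraction_within_fold(observed, predicted, fold=2.0):
    """Share of predictions whose ratio to the observed value lies within `fold`."""
    limit = math.log10(fold)
    errs = log10_errors(observed, predicted)
    return sum(abs(e) <= limit for e in errs) / len(errs)
```

Under these assumed definitions, a prediction that is off by a factor of 10 in either direction contributes an absolute log10 error of 1, so a set made up only of such predictions has an RMSE of 1 log unit and an average fold error of 10.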
The tested molecular representations include (1) global physicochemical descriptors combined with descriptors encoding the amino acid composition of the insulin analogs, (2) physicochemical descriptors of the attached small molecule, (3) a protein language model (evolutionary scale modeling) embedding of the amino acid sequence of the molecules, and (4) a natural-language-processing-inspired embedding (mol2vec) of the attached small molecule. Encoding the attached small molecule via (2) or (4) significantly improved the predictions, while the benefit of the protein language model-based encoding (3) depended on the machine-learning model used. Shapley additive explanations values identified the most important molecular descriptors as those related to the molecular size of both the protein and the protraction part. Overall, the results show that combining representations of proteins and small molecules was key for PK predictions of insulin analogs.
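One simple way to combine a protein-level representation with a small-molecule representation is to concatenate the two feature vectors into a single model input. The Python sketch below illustrates that idea only. The function name, the argument names, and especially the zero-padding for analogs without an attached small molecule are assumptions made for illustration; the abstract does not say how the authors handled such analogs.

```python
from typing import List, Optional, Sequence


def combine_representations(
    protein_features: Sequence[float],
    small_molecule_features: Optional[Sequence[float]],
    small_molecule_dim: int,
) -> List[float]:
    """Concatenate protein and small-molecule feature vectors.

    Assumption: analogs without an attached small molecule receive an
    all-zero block so that every analog has the same input length.
    """
    if small_molecule_features is None:
        small = [0.0] * small_molecule_dim
    else:
        if len(small_molecule_features) != small_molecule_dim:
            raise ValueError("small-molecule vector has unexpected length")
        small = list(small_molecule_features)
    return list(protein_features) + small


# Hypothetical toy vectors; real descriptors or embeddings would be much longer.
with_protractor = combine_representations([0.2, 1.5], [0.7, 0.1, 0.4], 3)
without_protractor = combine_representations([0.3, 1.1], None, 3)
```

Both calls return vectors of length 5, so analogs with and without an attached small molecule can be passed to the same regression model.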