Current strategies to address data scarcity in artificial intelligence-based drug discovery: A comprehensive review

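Computer science; Drug discovery; Scarcity; Data science; Current (fluid); Artificial intelligence; Machine learning; Bioinformatics; Engineering; Biology; Electrical engineering; Economics; Microeconomics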
Authors
Amit Gangwal,Azim Ansari,Iqrar Ahmad,Abul Kalam Azad,Wan Mohd Azizi Wan Sulaiman
Source
Journal: Computers in Biology and Medicine [Elsevier BV]
Volume/issue: 179: 108734-108734; Citations: 26
Identifier
DOI:10.1016/j.compbiomed.2024.108734
Abstract

Artificial intelligence (AI) has played a vital role in computer-aided drug design (CADD). This development has been further accelerated by the increasing use of machine learning (ML), mainly deep learning (DL), and by advances in computing hardware and software. As a result, initial doubts about the application of AI in drug discovery have been dispelled, leading to significant benefits in medicinal chemistry. At the same time, it is crucial to recognize that AI is still in its infancy and faces several limitations that need to be addressed to harness its full potential in drug discovery. Notable limitations include insufficient, unlabeled, and non-uniform data; the resemblance of some AI-generated molecules to existing molecules; the unavailability of adequate benchmarks; intellectual property rights (IPR) related hurdles in data sharing; poor understanding of biology; a focus on proxy data and ligands; and the lack of holistic methods for representing input (molecular structures) that would avoid pre-processing of input molecules (feature engineering). The major component of AI infrastructure is input data, since the success of most AI-driven efforts to improve drug discovery depends, among other factors, on the quality and quantity of the data used to train and test AI algorithms. Moreover, data-hungry DL approaches may fail to live up to their promise without sufficient data. The current literature suggests a few methods that, to a certain extent, handle low-data settings effectively and yield better output from AI models in drug discovery. These include transfer learning (TL), active learning (AL), single or one-shot learning (OSL), multi-task learning (MTL), data augmentation (DA), and data synthesis (DS). A different method, federated learning (FL), enables sharing of proprietary data on a common platform (without compromising data privacy) to train an ML model.
In this review, we compare and discuss these methods, their recent applications, and their limitations in modeling small-molecule data to improve the output of AI methods in drug discovery. The article also summarizes some other novel methods for handling inadequate data.
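
As an illustration of the federated learning idea mentioned in the abstract, the following is a minimal sketch of federated averaging in pure Python. It is not taken from the article: the one-parameter model `y = w * x`, the function names `local_update`, `federated_average`, and `train_federated`, the learning rate, and the toy data are all assumptions chosen for brevity. Each hypothetical site trains on its own data and only the fitted weight is sent to the server, which is one common way the privacy property is approximated in practice.

```python
# Minimal federated averaging (FedAvg) sketch in pure Python.
# Assumption: every client fits the toy model y = w * x on private data.

def local_update(w, data, lr=0.01, epochs=5):
    """Run a few gradient-descent steps on one client's private data."""
    for _ in range(epochs):
        # Gradient of the mean squared error for the model y = w * x.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_average(client_weights, client_sizes):
    """Server step: average client weights, weighted by local sample count."""
    total = sum(client_sizes)
    return sum(w * n for w, n in zip(client_weights, client_sizes)) / total

def train_federated(clients, rounds=50, w0=0.0):
    """Alternate local training and server-side averaging."""
    w = w0
    for _ in range(rounds):
        # Only the updated weights leave each client, never the raw data.
        local = [local_update(w, data) for data in clients]
        w = federated_average(local, [len(d) for d in clients])
    return w

# Hypothetical toy data: three sites whose measurements all follow y = 3x.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
]
w_global = train_federated(clients)
```

With this toy data the shared weight `w_global` should approach 3. Real deployments typically add secure aggregation or differential privacy on top of the averaging step, which this sketch omits.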