Defining and Exploring Chemical Spaces

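Chemical space; Computer science; Operationalization; Bayesian optimization; Property (philosophy); Process (computing); Artificial intelligence; Machine learning; Biochemical engineering; Data science; Drug discovery; Bioinformatics; Engineering; Philosophy; Epistemology; Biology; Operating system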
Author
Connor W. Coley
Source
Journal: Trends in Chemistry [Elsevier]
Volume (issue): 3 (2): 133-145 Citations: 123
Identifier
DOI:10.1016/j.trechm.2020.11.004
Abstract

Virtual libraries used in molecular discovery are often too large to exhaustively evaluate, warranting the use of algorithms to help with exploration. Algorithmic approaches like Bayesian optimization can help to efficiently navigate predefined chemical spaces in combination with surrogate models. On-the-fly molecular generation during exploration, including generation by deep-learning-based models, enables even larger chemical spaces to be searched, although these chemical spaces are defined only implicitly. Emerging approaches to incorporate reactions into machine-learning-based generation can ensure that molecules are able to be synthesized, similar to previously developed algorithms for reaction-based de novo design.

Designing functional molecules with desirable properties is often a challenging, multi-objective optimization. For decades, there have been computational approaches to facilitate this process through the simulation of physical processes, the prediction of molecular properties using structure–property relationships, and the selection or generation of molecular structures. This review provides an overview of some algorithmic approaches to defining and exploring chemical spaces that have the potential to operationalize the process of molecular discovery. We emphasize the potential roles of machine learning and the consideration of synthetic feasibility, which is a prerequisite to 'closing the loop'. We conclude by summarizing important directions for the future development and evaluation of these methods.

Chemical space can be thought of as the set of all possible molecules or materials. We generally consider narrower chemical spaces that are defined or constrained by the structures or functions of the molecules they contain. For example, 'drug-like chemical space' is used in the context of drug discovery to reflect the vast number of molecules that have physical properties similar to those of existing small-molecule therapeutics. While quantifying the size of a chemical space is rarely useful, it should be noted that there are far more organic molecules thought to be stable than atoms in the solar system, which is unsurprising given the combinatorics of designing molecular graphs. Here, we focus our discussion on small molecules rather than periodic materials, biomolecules, and polymers, all of which correspond to distinct 'chemical spaces'. Many studies have estimated the size of different chemical spaces [1. Bohacek R.S. et al. The art and practice of structure-based drug design: a molecular modeling perspective. Med. Res. Rev. 1996; 16: 3-50; 2. Drew K.L.M. et al. Size estimation of chemical space: how big is it? J. Pharm. Pharmacol. 2012; 64: 490-495; 3. Polishchuk P.G. et al. Estimation of the size of drug-like chemical space based on GDB-17 data. J. Comput. Aided Mol. Des.
2013; 27: 675-679] and suggested rules to organize these spaces along functional axes to improve their visualization and navigability [4. Oprea T.I. Gottfries J. Chemography: the art of navigating in chemical space. J. Comb. Chem. 2001; 3: 157-166; 5. Reymond J.-L. Awale M. Exploring chemical space for drug discovery using the Chemical Universe database. ACS Chem. Neurosci. 2012; 3: 649-657; 6. Awale M. Reymond J.-L. Web-based 3D-visualization of the DrugBank chemical space. J. Cheminform. 2016; 8: 25; 7. Probst D. Reymond J.-L. Visualization of very large high-dimensional data sets as minimum spanning trees. J. Cheminform. 2020; 12: 12].

As we have described previously, the discovery of novel molecules can be framed as a search within chemical space [8. Coley C.W. et al. Autonomous discovery in the chemical sciences part I: progress. Angew. Chem. Int. Ed. 2019; (Published online September 25, 2019. https://doi.org/10.1002/anie.201909987); 9. Coley C.W. et al. Autonomous discovery in the chemical sciences part II: outlook. Angew. Chem. Int. Ed. 2019; (Published online September 25, 2019. https://doi.org/10.1002/anie.201909989)]. The goal is to identify one or more molecules that exhibit a set of desirable properties. Besides defining these properties and a strategy to evaluate candidate molecules, the two primary considerations one must make are: (i) how to define the space; and (ii) how to explore the space. Both contribute to the search efficiency and likelihood of finding a good candidate. These two aspects are not independent: if you are repurposing FDA-approved drugs, your chemical space is narrow enough that an exhaustive screen may be feasible, but if you have no such restriction you must employ some strategy to select which molecules to test. These strategies are typically iterative optimization routines (driven by human intuition or driven by quantitative experimental design) with varying degrees of sophistication, as discussed later.

Navigating chemical space has been extensively written about in the context of (non-algorithmic) drug design [10. Dobson C.M. Chemical space and biology. Nature. 2004; 432: 824-828; 11. Lipinski C. Hopkins A. Navigating chemical space for biology and medicine. Nature. 2004; 432: 855-861]. The number of candidate molecules is too large to explore exhaustively, so one often imposes constraints on chemical space depending on the search strategy, the application, and the practical limitations of cost and time. These constraints look quite different when candidates are evaluated by physical rather than computational experiments. In the former case, acquiring new information about the performance of a molecule requires its physical synthesis, purification, and characterization; considerations of synthesis cost and material availability are paramount. In the latter case, one may postpone these practical considerations until after computational evaluations have identified a putative 'optimal' molecule. To bound the computational cost, the search space is still restricted using human expertise or some 'prior' on what would make a viable candidate.

This review examines strategies to define and explore chemical spaces with an emphasis on the role of machine learning and synthesizability constraints (Table 1, Key Table).
While this can be performed by subject-matter experts (e.g., medicinal chemists) in the absence of computer assistance, formalizing these concepts may eventually enable autonomous workflows to produce novel, useful outcomes with reduced reliance on human intuition and subjectivity. Elements of the concepts we cover can be found in previous articles, including a recent overview by Lemonick [12. Lemonick S. Exploring chemical space: can AI take us where no human has gone before? Chem. Eng. News. 2020; 98: 30]. We do not address visualization and instead refer readers to the work of Reymond and coworkers [5,7].

Table 1 (Key Table). Categorization of Approaches to Define Chemical Spaces for Molecular Discovery and an Incomplete Set of Examples for Each (a)

Predefined
Unconstrained: ZINC [13. Irwin J.J. et al. ZINC: a free tool to discover chemistry for biology. J. Chem. Inf. Model. 2012; 52: 1757-1768]; ChEMBL [15. Gaulton A. et al. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012; 40: D1100-D1107]; PubChem [14. Kim S. et al. PubChem 2019 update: improved access to chemical data. Nucleic Acids Res. 2019; 47: D1102-D1109]; GDB [24. Reymond J.-L. The Chemical Space Project. Acc. Chem. Res. 2015; 48: 722-730]
Constrained: DrugBank [16. Wishart D.S. et al. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. 2006; 34: D668-D672]; Enamine REAL (https://enamine.net/library-synthesis/real-compounds); WuXi Virtual Library (https://www.labnetwork.com/frontend-app/p/%5C#!/library/virtual); SAVI [32. Patel H. et al. Synthetically Accessible Virtual Inventory (SAVI). ChemRxiv. 2020; (Published online April 27, 2020. https://doi.org/10.26434/chemrxiv.12185559)]; PGVL [33. Hu Q. et al. LEAP into the Pfizer Global Virtual Library (PGVL) space: creation of readily synthesizable design ideas automatically. Methods Mol. Biol. 2011; 685: 253-276]; PLC [34. Nicolaou C.A. et al. The Proximal Lilly Collection: mapping, exploring and exploiting feasible chemical space. J. Chem. Inf. Model. 2016; 56: 1253-1266]

On the fly via heuristic methods
Unconstrained: Fragment-based GAs [57. Venkatasubramanian V. et al. Computer-aided molecular design using genetic algorithms. Comput. Chem. Eng. 1994; 18: 833-844]; GroupBuild [66. Rotstein S.H. Murcko M.A. GroupBuild: a fragment-based method for de novo drug design. J. Med. Chem. 1993; 36: 1700-1710]; BREED [58. Pierce A.C. et al. BREED: generating novel inhibitors through hybridization of known ligands. Application to CDK2, P38, and HIV protease. J. Med. Chem. 2004; 47: 2768-2775]; GraphGA [62. Jensen J.H. A graph-based genetic algorithm and generative model/Monte Carlo tree search for the exploration of chemical space. Chem. Sci. 2019; 10: 3567-3572]; GEGL [63. Ahn S. et al. Guiding deep molecular optimization with genetic exploration. arXiv. 2020; (Published online July 4, 2020. http://arxiv.org/abs/2007.04897)]
Constrained: SYNOPSIS [91. Vinkers H.M. et al. SYNOPSIS: SYNthesize and OPtimize System in Silico. J. Med. Chem. 2003; 46: 2765-2773]; Flux [88. Fechner U. Schneider G. Flux (1): a virtual synthesis scheme for fragment-based de novo design. J. Chem. Inf. Model. 2006; 46: 699-707]; MOARF [89. Firth N.C. et al. MOARF, an integrated workflow for multi-objective optimization: implementation, synthesis, and biological evaluation. J. Chem. Inf. Model. 2015; 55: 1169-1180]; DOGS [92. Hartenfeller M. et al. DOGS: reaction-driven de novo design of bioactive compounds. PLoS Comput. Biol. 2012; 8: e1002380]

On the fly via machine learning
Unconstrained: SMILES VAE [118. Gomez-Bombarelli R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 2018; 4: 268-276]; JT-VAE [75. Jin W. et al. Junction tree variational autoencoder for molecular graph generation. arXiv. 2018; (Published online February 12, 2018. https://arxiv.org/abs/1802.04364)]; SMILES RNN [72. Segler M.H.S. et al. Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent. Sci. 2018; 4: 120-131; 73. Olivecrona M. et al. Molecular de-novo design through deep reinforcement learning. J. Cheminform. 2017; 9: 48]; MolDQN [77. Zhou Z. et al. Optimization of molecules via deep reinforcement learning. arXiv. 2018; (Published online October 19, 2018. http://arxiv.org/abs/1810.08678)]
Constrained: MoleculeChef [96. Bradshaw J. et al. A model to search for synthesizable molecules. arXiv. 2019; (Published online June 12, 2019. http://arxiv.org/abs/1906.05221)]; ChemBO [97. Korovina K. ChemBO: Bayesian optimization of small organic molecules with synthesizable recommendations. arXiv. 2019; (Published online August 5, 2019. http://arxiv.org/abs/1908.01425)]; PGFS [98. Gottipati S.K. et al. Learning to navigate the synthetically accessible chemical space using reinforcement learning. arXiv. 2020; (Published online April 26, 2020. https://arxiv.org/abs/2004.12485v1)]; REACTOR [99. Horwood J. Noutahi E. Molecular design in synthetically accessible chemical space via deep reinforcement learning. arXiv. 2020; (Published online April 29, 2020. https://arxiv.org/abs/2004.14308v1)]

(a) Spaces can be defined prior to exploration or defined on the fly by evolutionary and/or machine learning-based methods. They can be relatively unconstrained (i.e., only in terms of validity) or constrained by availability (i.e., in terms of purchasability or synthesizability).

One approach to molecular discovery is to explore a predefined chemical space: an enumerated list of candidate molecules. In this setting, the two stages of (i) defining the space and (ii) exploring the space are entirely decoupled. Formally, we might think about this problem as an optimization of an objective function f(x), where x is a molecule belonging to a discrete set X.

Defining or selecting a finite chemical space often relies on domain expertise. Careful selection of X can increase the likelihood that it contains a high-performing molecule while minimizing the number of low-performing compounds. Common databases of molecules for computational screening are: ZINC [13], a library of commercially available compounds; PubChem [14], molecules with biological relevance; ChEMBL [15], molecules with bioactivity data; and DrugBank [16], approved or experimental therapeutic molecules. These virtual libraries (see Glossary) all represent 'general-purpose' chemical spaces with broad biological relevance and are therefore applied to many problems related to drug discovery [17. Walters W.P. Virtual chemical libraries. J. Med. Chem. 2019; 62: 1116-1124].

More focused chemical spaces can be created through a domain-informed enumeration of compounds relevant to a specific application; for example, 1.6 million donor-bridge-acceptor trimers for organic electronics [18. Gomez-Bombarelli R. et al. Design of efficient molecular organic light-emitting diodes by a high-throughput virtual screening and experimental approach. Nat. Mater. 2016; 15: 1120-1127] or 2.8 million transition-metal complexes for redox flow batteries [19. Janet J.P. et al. Accurate multiobjective design in a space of millions of transition metal complexes with neural-network-driven efficient global optimization. ACS Cent. Sci. 2020; 6: 513-524]. These are exhaustively enumerated chemical spaces with strict constraints on which fragments are included and how they are attached, similar to R-group enumeration methods. Privileged fragments for drug-like molecules have been identified through retrosynthetic analysis and automatic fragmentation [20. Lewell X.Q.
et al. RECAP – retrosynthetic combinatorial analysis procedure: a powerful new technique for identifying privileged molecular fragments with useful applications in combinatorial chemistry. J. Chem. Inform. Comput. Sci. 1998; 38: 511-522; 21. Ertl P. Cheminformatics analysis of organic substituents: identification of the most common substituents, calculation of substituent properties, and automatic identification of drug-like bioisosteric groups. J. Chem. Inform. Comput. Sci. 2003; 43: 374-380]; the molecules produced by recombining these fragments are intended to look more promising than an enumeration based on graph structure alone.

Graph-theoretical enumeration of molecular structures has been studied for over a century, starting with simple spaces like that of acyclic alkanes [22. Cayley E. Ueber die analytischen Figuren, welche in der Mathematik Bäume genannt werden und ihre Anwendung auf die Theorie chemischer Verbindungen. Ber. Dtsch. Chem. Ges. 1875; 8: 1056-1059 (in German); 23. Henze H.R. Blair C.M. The number of isomeric hydrocarbons of the methane series. J. Am. Chem. Soc. 1931; 53: 3077-3085]. However, it is only recently that these structures have been recorded, evaluated, and used for discovery. The Chemical Space Project exemplifies modern exhaustive enumeration of all stable organic molecules containing common atom types up to a certain size [24]. Since the original Generated DataBase (GDB) of up to 11 heavy atoms [25. Fink T. Reymond J.-L. Virtual exploration of the chemical universe up to 11 atoms of C, N, O, F: assembly of 26.4 million structures (110.9 million stereoisomers) and analysis for new ring systems, stereochemistry, physicochemical properties, compound classes, and drug discovery. J. Chem. Inf. Model. 2007; 47: 342-353], Reymond and coworkers have enumerated, analyzed, and released the 166.4 billion structures of up to 17 heavy atoms [26. Ruddigkeit L. et al. Enumeration of 166 billion organic small molecules in the Chemical Universe database GDB-17. J. Chem. Inf. Model. 2012; 52: 2864-2875] and published numerous visualizations and analyses thereof.

In addition to the benefits of ensuring that X is relevant to the design objective, the predefinition of chemical spaces lets us impose arbitrary constraints on their contents. A practical constraint is the ease of experimental validation: that any candidate can be physically acquired for experimental testing. In the simplest case, a chemical space could be defined as the set of molecules in a company's chemical inventory or vendor catalog. Any compound from this list can be acquired rapidly for experimental evaluation.

Accessibility is the primary motivation for make-on-demand libraries, which are chemical spaces defined as the molecules that are in stock or available and all molecules that can be produced from those structures through straightforward synthetic protocols. Libraries are often enumerated by applying a small number (<100) of reaction templates defining common single-step transformations to all possible combinations of starting materials [27. Cramer R.D. et al. Virtual compound libraries: a new approach to decision making in molecular discovery research. J. Chem. Inform. Comput. Sci. 1998; 38: 1010-1023; 28. Nikitin S. et al. A very large diversity space of synthetically accessible compounds for use with drug design programs. J. Comput. Aided Mol. Des. 2005; 19: 47-63; 29. Cramer R.D. et al. AllChem: generating and searching 10^20 synthetically accessible structures. J. Comput. Aided Mol. Des. 2007; 21: 341-350; 30. Patel H. et al. Knowledge-based approach to de novo design using reaction vectors. J. Chem. Inf. Model. 2009; 49: 1163-1184] (Figure 1); recursive enumeration generates molecules accessible through multiple synthetic steps. There are numerous implementations of this approach [31. Hoffmann T. Gastreich M. The next level in chemical space navigation: going far beyond enumerable compound libraries. Drug Discov. Today. 2019; 24: 1148-1156], including SAVI [32], efforts within pharmaceutical companies [33,34], and efforts from commercial vendors (https://enamine.net/library-synthesis/real-compounds; https://www.labnetwork.com/frontend-app/p/%5C#!/library/virtual). As it becomes impractical to store such large numbers of compounds due to the combinatorial explosion of reaction products, these spaces may be defined implicitly. Whether molecules in these spaces are easy to synthesize depends on the robustness of rules used for enumeration.
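The template-based enumeration described above can be illustrated with a toy sketch. The building-block names, the two 'templates', and the string-joining `apply_template` helper are hypothetical placeholders rather than real chemistry; a practical implementation would presumably rely on a cheminformatics toolkit and proper reaction definitions.

```python
from itertools import product

# Hypothetical building blocks, grouped by the reactant class a template accepts.
BUILDING_BLOCKS = {
    "amine": ["amine_A", "amine_B", "amine_C"],
    "acid": ["acid_A", "acid_B"],
    "aryl_halide": ["halide_A", "halide_B"],
    "boronic_acid": ["boronate_A", "boronate_B", "boronate_C"],
}

# Hypothetical single-step templates: name -> (reactant class 1, reactant class 2).
TEMPLATES = {
    "amide_coupling": ("acid", "amine"),
    "suzuki_coupling": ("aryl_halide", "boronic_acid"),
}


def apply_template(name, r1, r2):
    """Stand-in for a real reaction transform: returns a product label."""
    return f"{name}({r1}+{r2})"


def enumerate_library(blocks, templates):
    """Lazily yield every product of every template over all reactant pairs."""
    for name, (class1, class2) in templates.items():
        for r1, r2 in product(blocks[class1], blocks[class2]):
            yield apply_template(name, r1, r2)


def library_size(blocks, templates):
    """Count products without enumerating them (an implicitly defined space)."""
    return sum(len(blocks[c1]) * len(blocks[c2]) for c1, c2 in templates.values())


library = list(enumerate_library(BUILDING_BLOCKS, TEMPLATES))
```

Because `enumerate_library` is a generator and `library_size` only multiplies reactant counts, the space can be described and sized without ever being stored, which is one way to read 'defined implicitly'. The product count grows with the product of the reactant-list lengths, so realistic catalogs quickly reach sizes that cannot be materialized.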
Lyu and colleagues cite an 86% synthesis success rate for 51 compounds selected from 170 million in the Enamine REAL library enumerated from 130 reaction types; WuXi estimates a 60–80% success rate for their 1.7-billion-member collection generated by 30 reaction types (https://www.labnetwork.com/frontend-app/p/%5C#!/library/virtual). This success rate might be improved through the use of machine-learning models for reaction outcome prediction [35. Coley C.W. et al. A graph-convolutional neural network model for the prediction of chemical reactivity. Chem. Sci. 2019; 10: 370-377; 36. Schwaller P. et al. Molecular Transformer: a model for uncertainty-calibrated chemical reaction prediction. ACS Cent. Sci. 2019; 5: 1572-1583], which for common reaction types exhibit accuracies above 90% on benchmark datasets. These neural models can be directly used to enumerate possible products or used to predict regio/stereoselectivity patterns [37. Tomberg A. et al. A predictive tool for electrophilic aromatic substitutions using machine learning. J. Org. Chem. 2019; 84: 4695-4703; 38. Beker W. et al. Prediction of major regio-, site-, and diastereoisomers in Diels–Alder reactions by using machine-learning: the importance of physically meaningful descriptors. Angew. Chem. Int. Ed. 2019; 58: 4515-4519; 39. Struble T.J. et al. Multitask prediction of site selectivity in aromatic C–H functionalization reactions. React. Chem. Eng. 2020; 5: 896-902].

Once these spaces are defined, there are several approaches to identify the top-performing molecules within them. The simplest strategy is, of course, to evaluate every candidate molecule. The feasibility of this approach depends on the nature of the evaluation and time/cost constraints. It would not be practical to physically test every compound in the ZINC database, but it could be for smaller collections like the Drug Repurposing Hub [40. Corsello S.M. et al. The Drug Repurposing Hub: a next-generation drug library and information resource. Nat. Med. 2017; 23: 405-408] or the NCATS Pharmaceutical Collection [41. Huang R. et al. The NCATS Pharmaceutical Collection: a 10-year update. Drug Discov. Today. 2019; 24: 2341-2349]. It is worth noting that technologies like DNA-encoded libraries [42. Clark M.A. et al. Design, synthesis and selection of DNA-encoded small-molecule libraries. Nat. Chem. Biol. 2009; 5: 647-654] and phage display [43. Smith G.P. Petrenko V.A. Phage display. Chem. Rev. 1997; 97: 391-410] can be used to physically screen chemical spaces of trillions of molecules, albeit with a sparse and stochastic readout.

If evaluation is computational, practicality is simply a question of computational budget. In one of the largest docking studies reported to date, 138 million and 99 million compounds from the Enamine REAL library were docked against the D4 receptor and AmpC, respectively [44. Lyu J. et al. Ultra large library docking for discovering new chemotypes. Nature. 2019; 566: 224-229]. More recent studies have since screened over 1 billion enumerated molecules from the same database [45. Gorgulla C. et al. An open-source drug discovery platform enables ultra-large virtual screens. Nature. 2020; 580: 663-668; 46. Acharya A. et al. Supercomputer-based ensemble docking drug discovery pipeline with application to Covid-19. ChemRxiv. 2020; (Published online July 29, 2020. https://doi.org/10.26434/chemrxiv.12725465.v1)].
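The exhaustive strategy described above amounts to scoring every member of X once and keeping the best few. A minimal sketch follows, assuming a cheap scoring function is available; `toy_score` and the integer 'molecules' are hypothetical stand-ins for a docking score and a compound library.

```python
import heapq


def toy_score(x):
    """Hypothetical objective f(x); peaks at x = 37."""
    return -(x - 37) ** 2


def exhaustive_screen(candidates, score, k=3):
    """Evaluate every candidate once and return the k best (score, candidate) pairs."""
    n_evaluated = 0
    scored = []
    for x in candidates:
        scored.append((score(x), x))
        n_evaluated += 1
    return heapq.nlargest(k, scored), n_evaluated


top, cost = exhaustive_screen(range(100), toy_score, k=3)
```

The number of evaluations always equals the size of the library, so the cost scales linearly with the number of candidates; this is the sense in which feasibility reduces to a question of budget.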
As make-on-demand libraries can exceed the billion-compound scale of these screens by multiple orders of magnitude, we argue that such exhaustive screening techniques are not a viable long-term approach even for inexpensive evaluations like docking.

A popular framework to reduce overall cost is active learning through iterative, model-guided optimization [47. Settles B. Active learning. Synth. Lect. Artif. Intell. Mach. Learn. 2012; 6: 1-114]. This involves selecting subsets of experiments to perform based on predictions from a quantitative structure–property relationship (QSPR) model: a surrogate model f̂(x) that codifies an approximation to f(x). In Bayesian optimization, predictions of performance and model uncertainty are both considered to balance the exploration of uncertain candidates and the exploitation of candidates likely to be high performing [48. Frazier P.I. A tutorial on Bayesian optimization. arXiv. 2018; (Published online July 8, 2018. https://arxiv.org/abs/1807.02811v1)]; simpler optimization schemes may simply perform a greedy search. Examples of this paradigm include the platform Eve for the identification of bioactive molecules [49. Williams K. et al. Cheaper faster drug development validated by the repositioning of drugs against neglected tropical diseases. J. R. Soc. Interface. 2015; 12: 20141289], retrospective identification of bioactive compounds using PubChem data [50. Kangas J.D. et al. Efficient discovery of responses of proteins to compounds using active learning. BMC Bioinformatics. 2014; 15: 143], computational screening of OLED-relevant molecules [18], and the selection of compounds for docking [51. Gentile F. et al. Deep Docking: a deep learning platform for augmentation of structure based drug discovery. ACS Cent. Sci. 2020; 6: 939-949].

There are still many limitations to be addressed related to the surrogate model, f̂, in terms of its low-data performance, generalization power, and ability to quantify uncertainty [52. Muratov E.N. et al. QSAR without borders. Chem. Soc. Rev. 2020; 49: 3525-3564], although methods for learning from graph-structured molecules are promising [53. Wu Z. et al. A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2020; (Published online March 24, 2020. https://doi.org/10.1109/TNNLS.2020.2978386)]. Algorithmic improvements to better handle variable evaluation costs (e.g., the cost of purchasing a compound) and batched optimization (e.g., parallelized in well plates or over multiple CPUs) would be beneficial. While multiple iterations lead to improved surrogate models, a one-iteration approach can still be very effective. A novel antibiotic was recently identified this way from a drug repurposing collection, with fewer experiments than an exhaustive screen would require [54. Stokes J.M. et al. A deep learning approach to antibiotic discovery. Cell. 2020; 180: 688-702.e13].
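The iterative, model-guided selection described above can be sketched in a few lines. Everything specific here is an assumption made for illustration: the one-dimensional integer 'descriptor', the `objective`, the kernel-average `surrogate`, and its distance-based uncertainty stand in for a real property evaluation and a trained QSPR model with calibrated uncertainty (such as a Gaussian process). The acquisition rule is a simple upper-confidence-bound heuristic, not the method of any of the cited studies.

```python
import math


def objective(x):
    """Hypothetical expensive evaluation f(x); the optimum is at x = 137."""
    return -((x - 137) ** 2) / 1000.0


def kernel(a, b, length_scale=10.0):
    """Gaussian similarity between two toy one-dimensional descriptors."""
    return math.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2))


def surrogate(x, observed):
    """Crude stand-in for a QSPR model: returns (predicted mean, uncertainty).

    The mean is a kernel-weighted average of observed values; the uncertainty
    is high when x is far from every evaluated candidate.
    """
    weights = {xi: kernel(x, xi) for xi in observed}
    total = sum(weights.values())
    if total == 0.0:
        # Fall back to the plain average if every weight underflows to zero.
        mean = sum(observed.values()) / len(observed)
    else:
        mean = sum(w * observed[xi] for xi, w in weights.items()) / total
    uncertainty = 1.0 - max(weights.values())
    return mean, uncertainty


def model_guided_search(pool, initial, n_rounds=5, batch_size=3, beta=2.0):
    """Iteratively evaluate the candidates with the highest mean + beta * uncertainty."""
    observed = {x: objective(x) for x in initial}
    for _ in range(n_rounds):
        remaining = [x for x in pool if x not in observed]

        def acquisition(x):
            mean, uncertainty = surrogate(x, observed)
            return mean + beta * uncertainty

        batch = sorted(remaining, key=acquisition, reverse=True)[:batch_size]
        for x in batch:
            observed[x] = objective(x)  # the only "expensive" calls
    return observed


pool = list(range(200))
observed = model_guided_search(pool, initial=[0, 50, 100, 150, 199])
best_x = max(observed, key=observed.get)
```

Only 20 of the 200 candidates are evaluated in this run (5 initial plus 5 rounds of 3). Setting `beta=0` reduces the rule to the greedy search mentioned above, while larger values favor unexplored regions of the pool.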