Chemistry
Aqueous solution
Artificial intelligence
Machine learning
Computer science
Physical chemistry
Authors
Qi Yang, Yao Li, Jin‐Dong Yang, Yidi Liu, Long Zhang, Sanzhong Luo, Jin‐Pei Cheng
Identifier
DOI:10.26434/chemrxiv.12421082
Abstract
The acid dissociation constant p<i>K</i><sub>a</sub> dictates a molecule’s ionic status and is a critical physicochemical property for rationalizing acid-base chemistry in solution and in many biological contexts. Although numerous theoretical approaches have been developed for predicting aqueous p<i>K</i><sub>a</sub>, fast and accurate prediction of non-aqueous p<i>K</i><sub>a</sub>s has remained a major challenge. On the basis of the <i>i</i>BonD experimental p<i>K</i><sub>a</sub> database, curated across 39 solvents, a holistic p<i>K</i><sub>a</sub> prediction model was established using a machine learning approach. Descriptors combining structural and physical organic parameters (SPOC) were introduced to represent the electronic and structural features of molecules. With SPOC and ionic status labelling (ISL), the holistic models trained with a neural network or the XGBoost algorithm showed the best prediction performance, with a mean absolute error (MAE) as low as 0.87 p<i>K</i><sub>a</sub> units. The holistic model performed better than all of the tested single-solvent models (SSMs), which verifies its transfer learning features. The ability to predict in diverse solvents allows a comprehensive mapping of all possible p<i>K</i><sub>a</sub> correlations between different solvents. The <i>i</i>BonD holistic model was validated by predicting, with high accuracy, the aqueous p<i>K</i><sub>a</sub> and micro-p<i>K</i><sub>a</sub> of pharmaceutical molecules and the p<i>K</i><sub>a</sub>s of organocatalysts in DMSO and MeCN. An on-line prediction platform (<a href="http://pka.luoszgroup.com/">http://pka.luoszgroup.com</a>) was constructed based on the current model.
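The abstract reports model quality as a mean absolute error (MAE) in p<i>K</i><sub>a</sub> units. As a minimal sketch of how such a figure is computed, the Python snippet below averages the absolute differences between experimental and predicted values. The function name and the four value pairs are hypothetical and chosen only for illustration; they are not taken from the <i>i</i>BonD database or from the authors' code.

```python
from typing import Sequence


def mean_absolute_error(experimental: Sequence[float],
                        predicted: Sequence[float]) -> float:
    """Return the mean of |experimental - predicted| over paired values."""
    if len(experimental) != len(predicted):
        raise ValueError("experimental and predicted must have equal length")
    if len(experimental) == 0:
        raise ValueError("at least one value pair is required")
    total = sum(abs(e - p) for e, p in zip(experimental, predicted))
    return total / len(experimental)


# Made-up pKa values for illustration only (not real measurements).
experimental_pka = [4.0, 10.0, 12.5, 18.0]
predicted_pka = [4.5, 9.0, 13.0, 17.0]

mae = mean_absolute_error(experimental_pka, predicted_pka)
print(f"MAE = {mae:.2f} pKa units")  # prints "MAE = 0.75 pKa units"
```

An MAE of 0.87 therefore means that, averaged over the test set, a predicted value lies 0.87 p<i>K</i><sub>a</sub> units from the experimental one.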