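Probabilistic logic
Crystal structure prediction
Molecule
Computer science
Statistical physics
Thermodynamics
Chemistry
Biological system
Artificial intelligence
Physics
Organic chemistry
Biology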
Authors
Silabrata Pahari, Chi H. Lee, Niranjan Sitapure, Joseph Sang‐Il Kwon
Abstract
Crystallization is pivotal in the chemical and pharmaceutical industries, affecting particle stability and drug release. Crystal size distribution (CSD), a critical attribute of the final dosage form, is determined by the molecular structure of the crystallizing entity. Because of molecular diversity, establishing a clear relationship between molecular structure and CSD is challenging. This study introduces CrystalFormer, a novel framework that bridges this gap. Using machine-learned molecular fingerprints derived from an encoder-based transformer trained on a dataset of 1.8 billion molecules, CrystalFormer introduces a "universal chemical language" that represents molecules in a latent space. These fingerprints enable the prediction of thermodynamic and kinetic properties using neural networks and probabilistic regression models. Integrating these predictions with first-principles models such as population balance equations allows the CSD to be determined with confidence bounds. The results show good prediction accuracy for thermodynamic and kinetic parameters, with errors below 8% for paracetamol and salicylic acid.