Authors
Kavindri Ranasinghe, Adam L. Baskerville, Geoffrey P. F. Wood, Gerhard König
Abstract
Neural network potentials trained on quantum-mechanical data can calculate molecular interactions with relatively high speed and accuracy. However, not all neural network potentials are suitable for molecular simulations, as they might exhibit instabilities, nonphysical behavior, or lack accuracy. To assess the reliability of neural network potentials, a series of tests is conducted during model training, in the gas phase, and in the condensed phase. The testing procedure is performed for eight in-house neural network potentials based on the ANI-2x data set, using both the ANI-2x and MACE architectures. This consistent framework allows an evaluation of the effect of the model architecture on its performance. For comparison, we also perform stability tests of the publicly available neural network potentials ANI-2x, ANI-1ccx, MACE-OFF23, and AIMNet2. The results show that the different models have different weaknesses. A normal-mode analysis of 14 simple benchmark molecules with large displacements from the energy minima reveals that the published MACE-OFF23-S model shows large deviations from the reference quantum-mechanical energy surface. Also, some MACE models with a reduced number of parameters fail to produce stable molecular dynamics simulations in the gas phase, and all MACE models exhibit unfavorable behavior during steric clashes. In addition, the published ANI-2x and one of the in-house MACE models are not able to reproduce the structure of liquid water at ambient conditions, forming an amorphous solid phase instead. For the ANI-1ccx model, the multibody interactions in the condensed water phase lead to nonphysical additional energy minima in bond length and bond angle space, which cause a phase transition to an amorphous solid.
Out of all 13 considered public and in-house models, only one in-house model based on the ANI-2x B97-3c data set shows better agreement with the experimental radial distribution function of water than the simple molecular mechanics TIP3P and OPC models. Protein-ligand interaction energies for the four benchmark systems TYK2, CDK2, JNK1, and P38 show that almost all models exhibit a higher correlation with experimental binding affinities than the Chemgauss4 docking score (average R² > 0.16). With an average R² of 0.43, the ANI-2x model outperforms molecular mechanics calculations with the GAFF2 force field and DFTB3 semiempirical calculations (average R² of 0.39 and 0.38), approaching the accuracy of absolute binding free energy calculations (average R² of 0.52). However, the rather mixed results for the different machine learning potentials show that great care must be taken during model training and when selecting a neural network potential for real-world applications.
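As an illustration of how an average correlation over several benchmark systems could be computed, the following minimal Python sketch evaluates a squared Pearson correlation per system and averages the results. It rests on assumptions: the abstract does not state which definition of the correlation measure is used, so the squared Pearson coefficient is assumed here; the function names `squared_pearson_r` and `average_r2` and the system labels are hypothetical; and the numbers are placeholders, not data from this work.

```python
def squared_pearson_r(predicted, experimental):
    """Squared Pearson correlation between two equal-length sequences.

    Assumption: the correlation measure is the squared Pearson coefficient.
    """
    if len(predicted) != len(experimental) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_x = sum(predicted) / n
    mean_y = sum(experimental) / n
    # Sums of squared deviations and of cross deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    syy = sum((y - mean_y) ** 2 for y in experimental)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(predicted, experimental))
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return sxy ** 2 / (sxx * syy)


def average_r2(per_system):
    """Mean of the per-system values.

    `per_system` maps a system label to a pair
    (predicted interaction energies, experimental binding affinities).
    """
    values = [squared_pearson_r(pred, expt)
              for pred, expt in per_system.values()]
    return sum(values) / len(values)


# Hypothetical placeholder values for illustration only (not from the paper).
toy_data = {
    "system_A": ([-40.0, -35.0, -50.0, -45.0], [-8.0, -7.5, -9.5, -9.0]),
    "system_B": ([-30.0, -20.0, -25.0], [-7.0, -9.0, -8.0]),
}

if __name__ == "__main__":
    print(f"average squared correlation: {average_r2(toy_data):.2f}")
```

Averaging per-system values, rather than pooling all ligands into one fit, keeps a system with many ligands from dominating the summary; whether the reported averages were formed this way is likewise an assumption.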