Authors
Riya Singh,Aryan Amit Barsainyan,Rida Irfan,Connor Joseph Amorin,Stewart He,Tony Davis,Arun Thiagarajan,Shiva Sankaran,Seyone Chithrananda,Walid Ahmad,Derek Jones,Kevin McLoughlin,Hyojin Kim,Anoushka Bhutani,Sheela Sathyanarayana,Venkat Viswanathan,Jonathan Allen,Bharath Ramsundar
Identifier
DOI:10.26434/chemrxiv-2025-4glrl
Abstract
The rapid advancement of machine learning in computational chemistry has opened new doors for designing molecules, predicting molecular properties, and discovering novel materials. However, building scalable and robust models for molecular property prediction remains a significant challenge due to the vast size and complexity of chemical space. In this paper, we introduce ChemBERTa-3, an open-source training framework designed to train and fine-tune large-scale chemical foundation models. We explore the potential of multiple model architectures by evaluating their performance across various molecular datasets from the MoleculeNet suite. Our experiments demonstrate that pre-training on the expansive ZINC20 dataset yields models capable of performing well on both classification and regression tasks, providing valuable insights into drug discovery and materials science. For scalability, we leveraged both AWS-based Ray deployments and on-premise high-performance computing clusters to support the processing power required to train on billions of molecules. In support of reproducible and extensible science, we have open-sourced all ChemBERTa-3 models.
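To make the pre-train/fine-tune workflow described in the abstract concrete, below is a minimal sketch of fine-tuning a ChemBERTa-style encoder for a binary molecular-property classification task with Hugging Face Transformers. The checkpoint name `seyonec/ChemBERTa-zinc-base-v1` is an earlier public ChemBERTa release standing in for illustration, since the abstract does not give ChemBERTa-3 checkpoint names, and the SMILES/label pairs are toy placeholders for a MoleculeNet-style split; this is not the authors' exact training pipeline.

```python
import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

# Hypothetical stand-in checkpoint: ChemBERTa-3 hub names are not stated in the
# abstract, so an earlier public ChemBERTa release is used for illustration.
checkpoint = "seyonec/ChemBERTa-zinc-base-v1"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# A fresh classification head is attached on top of the pre-trained encoder.
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Toy placeholder data standing in for a MoleculeNet-style classification split.
smiles = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O", "CCN(CC)CC"]
labels = [0, 1, 1, 0]

# SMILES strings are tokenized just like natural-language text.
enc = tokenizer(smiles, padding=True, truncation=True, return_tensors="pt")

class SmilesDataset(torch.utils.data.Dataset):
    """Wraps tokenized SMILES and labels for the Trainer."""
    def __init__(self, encodings, labels):
        self.encodings, self.labels = encodings, labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        item = {k: v[idx] for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

train_ds = SmilesDataset(enc, labels)

args = TrainingArguments(output_dir="chemberta-finetune",
                         num_train_epochs=1,
                         per_device_train_batch_size=2,
                         logging_steps=1)
Trainer(model=model, args=args, train_dataset=train_ds).train()
```

In practice, the toy lists would be replaced by a real MoleculeNet task (e.g. loaded via DeepChem), and for regression tasks the head would be swapped for a single-output regression head; the large-scale pre-training on ZINC20 and Ray-based distribution mentioned in the abstract are outside the scope of this sketch.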