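Pipeline (software)
Property (philosophy)
Computer science
Artificial intelligence
Machine learning
Philosophy
Programming language
Epistemology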
Authors
Aravindh N. Marimuthu, Brett A. McGuire
Identifier
DOI:10.1021/acs.jcim.5c00516
Abstract
We present ChemXploreML, a modular desktop application designed for machine learning-based molecular property prediction. The framework's flexible architecture allows integration of any molecular embedding technique with modern machine learning algorithms, enabling researchers to customize their prediction pipelines without extensive programming expertise. To demonstrate the framework's capabilities, we implement and evaluate two molecular embedding approaches, Mol2Vec and VICGAE (Variance-Invariance-Covariance regularized GRU Auto-Encoder), combined with state-of-the-art tree-based ensemble methods (Gradient Boosting Regression, XGBoost, CatBoost, and LightGBM). Using five fundamental molecular properties as test cases (melting point, boiling point, vapor pressure, critical temperature (CT), and critical pressure), we validate our framework on a data set from the CRC Handbook of Chemistry and Physics. The models achieve excellent performance for well-distributed properties, with R² values up to 0.93 for CT predictions. Notably, while Mol2Vec embeddings (300 dimensions) delivered slightly higher accuracy, VICGAE embeddings (32 dimensions) exhibited comparable performance yet offered significantly improved computational efficiency. ChemXploreML's modular design facilitates easy integration of new embedding techniques and machine learning algorithms, providing a flexible platform for customized property prediction tasks. The application automates chemical data preprocessing (including UMAP-based exploration of molecular space), model optimization, and performance analysis through an intuitive interface, making sophisticated machine learning techniques accessible while maintaining extensibility for advanced cheminformatics users.
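The modular idea described in the abstract (any embedding paired with any regressor, scored by R²) can be illustrated with a minimal sketch. This is not ChemXploreML's code or API: `PropertyPipeline`, `char_count_embedding`, and `NearestNeighborRegressor` are hypothetical names, the embedding is a toy symbol count standing in for Mol2Vec or VICGAE, the regressor is a toy stand-in for a tree-based ensemble, and the targets are synthetic carbon counts rather than measured properties. Only the R² formula, 1 - SS_res / SS_tot, is standard.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol, Sequence

Vector = List[float]


class Regressor(Protocol):
    """Anything with fit/predict can be plugged into the pipeline."""

    def fit(self, X: Sequence[Vector], y: Sequence[float]) -> None: ...
    def predict(self, X: Sequence[Vector]) -> List[float]: ...


def char_count_embedding(smiles: str) -> Vector:
    """Toy embedding (assumption): counts of a few SMILES symbols.

    A stand-in for a learned embedding such as Mol2Vec or VICGAE.
    """
    return [float(smiles.count(ch)) for ch in "CNO=()"]


class NearestNeighborRegressor:
    """Toy regressor (assumption): copies the target of the closest
    training vector. A stand-in for a tree-based ensemble."""

    def fit(self, X: Sequence[Vector], y: Sequence[float]) -> None:
        self._X = [list(x) for x in X]
        self._y = list(y)

    def predict(self, X: Sequence[Vector]) -> List[float]:
        preds = []
        for x in X:
            # Squared Euclidean distance to every training vector.
            dists = [sum((a - b) ** 2 for a, b in zip(x, t)) for t in self._X]
            preds.append(self._y[dists.index(min(dists))])
        return preds


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


@dataclass
class PropertyPipeline:
    """Pairs one embedding function with one regressor."""

    embed: Callable[[str], Vector]
    model: Regressor

    def fit(self, smiles: Sequence[str], y: Sequence[float]) -> "PropertyPipeline":
        self.model.fit([self.embed(s) for s in smiles], y)
        return self

    def predict(self, smiles: Sequence[str]) -> List[float]:
        return self.model.predict([self.embed(s) for s in smiles])


# Synthetic demo: the "property" is just the number of carbon atoms.
train_smiles = ["C", "CC", "CCC", "CCO", "CCN"]
train_y = [1.0, 2.0, 3.0, 2.0, 2.0]

pipeline = PropertyPipeline(char_count_embedding, NearestNeighborRegressor())
pipeline.fit(train_smiles, train_y)
print("R^2 on training data:", r2_score(train_y, pipeline.predict(train_smiles)))
```

Because the pipeline depends only on an embedding callable and a `fit`/`predict` object, swapping in a different embedding or a regressor with a scikit-learn-style interface would not require changes to `PropertyPipeline` itself. A score computed on the training data, as here, says nothing about generalization; a held-out split would be needed for that.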