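Computer science
Representation (politics)
Artificial intelligence
Stage (stratigraphy)
Machine learning
Biology
Paleontology
Politics
Political science
Law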
Authors
Keqiang Hu,Yuan He,Jianguo Wei,Changming Sun,Jie Geng,Leyi Wei,Ran Su
Identifier
DOI:10.1109/jbhi.2025.3556766
Abstract
Accurate prediction of molecular toxicity is vital for drug development. Most mainstream methods rely on fingerprints or graph-based feature extraction, but the emergence of large language models (LLMs) offers new prospects for molecular representation learning in toxicity prediction. Although several studies have attempted to leverage LLMs to integrate molecular sequence data for pretraining molecular representations, certain limitations remain. Current LLM-based approaches usually rely solely on class embedding features, overlooking the rich information in sequence embeddings. Moreover, integrating pre-trained molecular representations with multi-modal molecular data may further enhance performance in toxicity prediction. To address these challenges, we propose BFGTP, a BERT-guided two-stage molecular representation learning framework for toxicity prediction. First, we design independent encoders for molecular descriptions in three modalities, where the fingerprint encoder uses dual-level attention mechanisms to integrate multi-category fingerprints effectively. Then, a two-stage guidance strategy is introduced to fully utilize the prior knowledge of LLMs: contrastive learning aligns and fuses the tri-modal representations, and knowledge distillation aligns the predicted value distributions. BFGTP ultimately combines fingerprint and graph representations to predict molecular toxicity. Experiments on seven toxicity datasets show that BFGTP outperforms baselines, achieving the highest AUC on five datasets and the best average performance across five evaluation metrics. Ablation studies, t-SNE visualization, and a case study confirm the effectiveness of BFGTP's components and its ability to capture meaningful molecular representations.
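The abstract names two alignment steps, contrastive learning and knowledge distillation, without stating the loss functions BFGTP uses. As a rough illustration only, the sketch below shows one common form of each: an InfoNCE-style contrastive loss and a temperature-softened KL distillation loss. The function names, temperatures, and the choice of these particular losses are assumptions made for this example, not details taken from the paper.

```python
import math

# Illustrative sketch only: the loss forms and all names here are assumptions,
# not the BFGTP implementation.

def _log_softmax(row):
    """Numerically stable log-softmax of a list of floats."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]

def _normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style loss: anchors[i] should be most similar to positives[i].

    For example, anchors could be BERT sequence embeddings and positives
    the fingerprint or graph embeddings of the same molecules.
    """
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    loss = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        loss -= _log_softmax(logits)[i]
    return loss / len(a)

def distillation_kl(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened class distributions."""
    t = _log_softmax([x / temperature for x in teacher_logits])
    s = _log_softmax([x / temperature for x in student_logits])
    return sum(math.exp(lt) * (lt - ls) for lt, ls in zip(t, s))
```

In this form, the contrastive loss is small when each molecule's embeddings from two modalities point in the same direction and large when they are mismatched, and the distillation loss is zero when the student's predicted distribution equals the teacher's.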