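Computer science
Generalizability theory
Artificial intelligence
Coding (social sciences)
Machine learning
Encoding
Drug target
Natural language processing
Feature (linguistics)
Non-coding RNA
Semantic similarity
Pattern recognition (psychology)
Long non-coding RNA
Profiling (computer programming)
Feature vector
Relevance (law)
Similarity (geometry)
Adapter (computing)
Data type
Data mining
Language model
Association (psychology)
Computational biology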
Authors
Zhen Zhang,Yuchen Zhang
Identifier
DOI:10.1021/acs.jcim.5c03011
Abstract
Noncoding RNAs (ncRNAs) play critical regulatory roles in cancer drug response. However, most existing methods are limited to predicting a single type of ncRNA, failing to fully capture the complex semantic associations between multimodal biological features, and thus exhibit weak generalizability and robustness. To overcome these limitations, this study proposes NCRDLLM, a unified framework that leverages large language models (LLMs) to predict associations between three types of ncRNA (circular RNA, microRNA, and long noncoding RNA) and drugs. The method integrates 19,020 experimentally validated associations and 120,009 disease association records. Three types of multimodal features are constructed: sequence features extracted using pretrained foundation models RNA-FM and ChemBERTa, structural features generated through Graph2Vec for RNA secondary structures and AttentiveFP combined with ECFP for drug molecules, and association features obtained via disease-associated coding and semantic similarity. These features are subsequently mapped into the hidden space of LLaMA-3.2-3B through adapter modules, with LoRA employed for parameter-efficient fine-tuning. Experimental results demonstrate that NCRDLLM achieves AUC-ROC values of 0.9665, 0.9832, and 0.9676 on miRNA-drug, lncRNA-drug, and circRNA-drug data sets, respectively. Ablation studies confirm the contribution of each module, while literature evidence and tissue-specific expression profiling further support the biological relevance of the predictions. NCRDLLM provides an effective strategy for identifying potential ncRNA-drug response associations.
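The adapter and LoRA steps mentioned in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the abstract does not describe the adapter architecture, so a single linear projection per feature type is assumed here, and all names (`Adapter`, `LoRALinear`, `HIDDEN`) and dimensions are hypothetical toy values. The LoRA part follows the standard formulation, in which a frozen weight W is augmented by a trainable low-rank update scaled by alpha / r, with B initialized to zero so that training starts from the unmodified base layer.

```python
import random


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix used for toy weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class Adapter:
    """Assumed adapter: one linear map from a feature space to the hidden size."""

    def __init__(self, in_dim, hidden_dim, rng):
        self.W = rand_matrix(hidden_dim, in_dim, rng)

    def __call__(self, x):
        return matvec(self.W, x)


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, in_dim, out_dim, r, alpha, rng):
        self.W = rand_matrix(out_dim, in_dim, rng)        # frozen base weight
        self.A = rand_matrix(r, in_dim, rng)              # trainable, random init
        self.B = [[0.0] * r for _ in range(out_dim)]      # trainable, zero init
        self.scale = alpha / r

    def __call__(self, x):
        base = matvec(self.W, x)
        low_rank = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * l for b, l in zip(base, low_rank)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


rng = random.Random(0)
HIDDEN = 16  # toy hidden size, far smaller than a real LLM's

# Toy stand-ins for the three feature types named in the abstract.
seq_feat = [rng.random() for _ in range(8)]      # sequence features
struct_feat = [rng.random() for _ in range(6)]   # structural features
assoc_feat = [rng.random() for _ in range(4)]    # association features

adapters = [Adapter(8, HIDDEN, rng), Adapter(6, HIDDEN, rng), Adapter(4, HIDDEN, rng)]
tokens = [ad(f) for ad, f in zip(adapters, (seq_feat, struct_feat, assoc_feat))]

layer = LoRALinear(HIDDEN, HIDDEN, r=2, alpha=4, rng=rng)
outputs = [layer(t) for t in tokens]

print(len(tokens), len(tokens[0]))                 # three vectors of size HIDDEN
print(layer.trainable_params(), HIDDEN * HIDDEN)   # LoRA params vs. full weight
```

In this toy setting the LoRA matrices hold r * (in_dim + out_dim) trainable values, against in_dim * out_dim for the frozen weight. That gap is what makes the approach parameter-efficient at realistic hidden sizes.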