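Computer science
Process (computing)
Artificial intelligence
Traditional Chinese medicine
Scarcity
Natural language processing
Data science
Language model
Unified Medical Language System
MEDLINE
Machine learning
Knowledge management
Deep learning
Medicine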
Authors
Sibo Wei,Xueping Peng,Yifei Wang,Tao Shen,Jiasheng Si,Weiyu Zhang,Fa Zhu,Athanasios V. Vasilakos,Wenpeng Lü,Xiaoming Wu,Yinglong Wang
Identifier
DOI:10.1109/jbhi.2025.3612415
Abstract
… BianCang, a TCM-specific LLM, trained with a two-stage process that first injects domain-specific knowledge and then aligns it through targeted stimulation to enhance its diagnostic and differentiation capabilities. Specifically, we constructed pre-training corpora, instruction-aligned datasets based on real hospital records, and the ChP-TCM dataset derived from the Pharmacopoeia of the People's Republic of China. We compiled extensive TCM and medical corpora for continual pre-training and supervised fine-tuning, building a comprehensive dataset to refine the model's understanding of TCM. Evaluations on 11 test sets, covering 31 models and 4 tasks, demonstrate the effectiveness of BianCang and offer insights for future research. Code, datasets, and models are available on GitHub.
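The abstract names the two stages (continual pre-training, then supervised fine-tuning) without describing their data format. As a rough illustration of how training examples commonly differ between the two stages, here is a minimal Python sketch under stated assumptions: the whitespace tokenizer, the helper names, and the use of -100 as the "ignore" label (the default `ignore_index` of PyTorch's `CrossEntropyLoss`) are illustrative choices, not details from the paper, and the sketch does not attempt BianCang's "targeted stimulation" alignment.

```python
from dataclasses import dataclass

# Label value excluded from the loss (PyTorch CrossEntropyLoss default ignore_index).
IGNORE_INDEX = -100


@dataclass
class Example:
    input_ids: list[int]
    labels: list[int]


def toy_tokenize(text: str, vocab: dict[str, int]) -> list[int]:
    """Hypothetical whitespace tokenizer: gives each unseen word the next free id."""
    ids = []
    for word in text.split():
        if word not in vocab:
            vocab[word] = len(vocab)
        ids.append(vocab[word])
    return ids


def pretraining_example(text: str, vocab: dict[str, int]) -> Example:
    # Stage 1 (continual pre-training): every token of the raw domain text is supervised.
    ids = toy_tokenize(text, vocab)
    return Example(input_ids=ids, labels=list(ids))


def sft_example(instruction: str, response: str, vocab: dict[str, int]) -> Example:
    # Stage 2 (supervised fine-tuning): the prompt is masked out, so only
    # response tokens contribute to the loss. The one-position shift for
    # next-token prediction is assumed to happen later, in the loss computation.
    prompt_ids = toy_tokenize(instruction, vocab)
    response_ids = toy_tokenize(response, vocab)
    return Example(
        input_ids=prompt_ids + response_ids,
        labels=[IGNORE_INDEX] * len(prompt_ids) + response_ids,
    )


if __name__ == "__main__":
    vocab: dict[str, int] = {}
    pt = pretraining_example("spleen qi deficiency causes fatigue", vocab)
    sft = sft_example("Patient reports fatigue and poor appetite . Syndrome ?",
                      "Spleen qi deficiency", vocab)
    print(pt.labels)
    print(sft.labels)
```

Masking the prompt in the second stage is one common design choice: it keeps the model from spending capacity on reproducing the question and focuses the gradient on the answer, whereas the first stage trains on all of the domain text.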