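Computer science
Judgment
Classifier (UML)
Natural language processing
Artificial intelligence
Feature (linguistics)
Semantic feature
Information retrieval
Pattern recognition (psychology)
Linguistics
Philosophy
Authors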
Qingyan Wang,Ye Wang,Dajiang Lei
Source
Journal: Communications in computer and information science
Date: 2023-01-01
Volume/issue, pages: 597-611
Citations: 1
Identifier
DOI:10.1007/978-981-99-5847-4_43
Abstract
Many type 2 diabetes patients and high-risk groups have an increasing demand for specialized information on diabetes. However, the long-tail problem often makes model training difficult and reduces classification accuracy. In this paper, we propose a semantic feature enhancement approach to address the long-tail problem in Chinese diabetes text classification. In detail, we enrich the knowledge of the tail classes with a semantic feature enhancement module and then use an attention aggregation module to improve the semantic representation by fusing these semantic features. For the semantic feature enhancement module, we propose two strategies: applying different dropouts to the same pre-trained language model, and using different pre-trained language models. The attention aggregation module is designed to better fuse the semantic features obtained previously. After processing by these two modules, we feed the final feature vector into the classifier. A final accuracy of 89.1% was obtained for Chinese diabetes text classification in the NCAA2023 assessment.
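The two modules described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the attention form, so scaled dot-product attention is assumed here, and the names `dropout`, `attention_aggregate`, `encoded`, and `query` are hypothetical. The `encoded` vector is a stand-in for a sentence embedding from a pre-trained language model, and the dropout rates are arbitrary.

```python
import math
import random


def dropout(vec, rate, rng):
    """Inverted dropout: zero each element with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_aggregate(views, query):
    """Fuse several feature vectors into one, weighted by scaled dot-product attention."""
    dim = len(query)
    scores = [
        sum(q * v for q, v in zip(query, view)) / math.sqrt(dim)
        for view in views
    ]
    weights = softmax(scores)
    fused = [
        sum(w * view[i] for w, view in zip(weights, views))
        for i in range(dim)
    ]
    return fused, weights


rng = random.Random(0)

# Assumption: a toy 4-dimensional embedding standing in for the encoder output.
encoded = [0.5, -1.0, 2.0, 0.25]

# Strategy 1 from the abstract: same encoder output, different dropout rates.
views = [dropout(encoded, rate, rng) for rate in (0.1, 0.3, 0.5)]

# Assumption: in a real model this query vector would be learned.
query = [0.1, 0.2, 0.3, 0.4]

fused, weights = attention_aggregate(views, query)
```

In this sketch `fused` plays the role of the final feature vector that would be passed to the classifier. The second strategy in the abstract, using different pre-trained language models, would replace the dropout step with embeddings from separate encoders while leaving the aggregation step unchanged.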