Computer science
Task (project management)
Natural language processing
Language model
Artificial intelligence
Training set
Construct (Python library)
Training (meteorology)
Set (abstract data type)
Natural language
Field (mathematics)
Annotation
Natural language understanding
Geography
Engineering
Programming language
Pure mathematics
Systems engineering
Meteorology
Mathematics
Authors
Jiangyan Zhang, Deji Kazhuo, Gadeng Luosang, Nyima Trashi, Nuo Qun
Identifier
DOI: 10.1145/3548608.3559255
Abstract
In recent years, pre-trained language models have been widely used in natural language processing, but research on Tibetan pre-trained language models is still at an exploratory stage. To promote the further development of Tibetan natural language processing and to mitigate the scarcity of annotated Tibetan datasets, this article studies a Tibetan pre-trained language model based on BERT. First, given the characteristics of the Tibetan language, we construct datasets for both BERT pre-training and the downstream text classification task. Second, we build and train a small-scale Tibetan BERT pre-trained language model. Finally, we verify the model's performance on the downstream task of Tibetan text classification, achieving an accuracy of 86%. Experiments show that the model we built is highly effective for Tibetan text classification.
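The two-stage pipeline the abstract describes (pre-train a small BERT with the masked language modeling objective on raw Tibetan text, then fine-tune the encoder for text classification) can be sketched with the Hugging Face transformers and datasets libraries. The paper does not publish its code, so everything below is a minimal illustrative sketch: the file paths, tokenizer artifact, vocabulary size, model dimensions, epoch count, and label count are assumptions, not values from the paper.

# Minimal sketch of the two-stage pipeline described in the abstract.
# All paths, sizes, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (
    BertConfig,
    BertForMaskedLM,
    BertForSequenceClassification,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# A deliberately small BERT configuration, in the spirit of the paper's
# "small-scale" model (exact dimensions are assumed, not from the paper).
config = BertConfig(
    vocab_size=30000,       # assumed size of a Tibetan subword vocabulary
    hidden_size=256,
    num_hidden_layers=4,
    num_attention_heads=4,
    intermediate_size=1024,
)

# Hypothetical tokenizer trained separately on a Tibetan corpus.
tokenizer = BertTokenizerFast.from_pretrained("path/to/tibetan-tokenizer")

# Tokenize a raw Tibetan text corpus (hypothetical file) for pre-training.
raw = load_dataset("text", data_files={"train": "path/to/tibetan_corpus.txt"})
pretrain_dataset = raw["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

# Stage 1: pre-train with masked language modeling (BERT's MLM objective;
# 15% masking is the standard BERT rate).
mlm_model = BertForMaskedLM(config)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
trainer = Trainer(
    model=mlm_model,
    args=TrainingArguments(output_dir="tibetan-bert-mlm", num_train_epochs=3),
    data_collator=collator,
    train_dataset=pretrain_dataset,
)
trainer.train()
trainer.save_model("tibetan-bert-mlm")

# Stage 2: fine-tune the pre-trained encoder for Tibetan text classification;
# a second Trainer over labeled examples follows the same pattern as above.
cls_model = BertForSequenceClassification.from_pretrained(
    "tibetan-bert-mlm",
    num_labels=12,  # assumed number of classification categories
)

Keeping the configuration small (fewer layers, narrower hidden size) is a natural design choice here: with a scarce pre-training corpus, a compact model is both cheaper to train and less prone to overfitting than a full-size BERT.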