Annotation
Relation (database)
Linguistics
Natural language processing
Artificial intelligence
Computer science
History
Philosophy
Database
Authors
Xuemei Tang, Zekun Deng, Jun Wang, Qi Su
Abstract
This work contributes to the digital humanities approach to studying premodern Chinese history and culture by creating a large-scale dataset annotated with named entities and relations. Following carefully designed annotation guidelines, we labeled over 200,000 characters, yielding a dataset of 30,000 named entities across six types and 7,000 relations spanning twenty categories. In experiments on this dataset with pre-trained language models and large language models, named entity recognition (NER) reached an initial F1 score of 91.32 percent, and relation extraction (RE) with a pre-trained language model reached an F1 score of 85.32 percent. While there is still room for improvement, our annotated dataset and models provide a useful starting point for extracting semantic information from premodern Chinese texts. The dataset represents an effort to connect history and technology, increasing the accessibility and preservation of premodern Chinese cultural treasures. It can also facilitate downstream tasks such as cultural analysis, knowledge graph construction, and computational understanding of premodern Chinese. Overall, this research is a significant step toward digitally exploring premodern Chinese documents, providing a pathway for future work on knowledge organization and computational analysis of this valuable cultural legacy. Our code and data are available at: https://github.com/tangxuemei1995/AnChineseNERE