Syntax
Computer science
Artificial intelligence
Semantics (computer science)
Deep learning
Natural language processing
Linguistics
Philosophy
Programming language
Authors
Yuting Wei, Yangfu Zhu, Ting Bai, Bin Wu
Identifiers
DOI:10.1016/j.neunet.2024.106559
Abstract
Ancient Chinese is a crucial bridge for understanding Chinese history and culture. Most existing works use high-resource modern Chinese to understand low-resource ancient Chinese, but they fail to fully account for the semantic and syntactic gaps that have opened between the two over time, resulting in misunderstandings of ancient Chinese. Hence, we propose a novel language pre-training framework for ancient Chinese understanding based on the Cross-temporal Contrastive Disentanglement Model (CCDM), which bridges the gap between modern and ancient Chinese using their parallel corpus. Specifically, we first explore a cross-temporal data augmentation method by disentangling and reconstructing the parallel ancient-modern corpus; notably, the proposed decoupling strategy explicitly accounts for the cross-temporal differences between ancient and modern Chinese. Then, cross-temporal contrastive learning is used to train the model by fully leveraging the cross-temporal information. Finally, the trained language model is applied to downstream tasks. We conduct extensive experiments on six ancient Chinese understanding tasks, and the results demonstrate that our model outperforms state-of-the-art baselines. Our framework is also potentially applicable to other languages whose evolution has produced shifts in syntax and semantics.
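The abstract does not give the training objective, but the "cross-temporal contrastive learning" step it describes is commonly realized with an InfoNCE-style loss over aligned sentence pairs: each ancient sentence embedding is pulled toward its parallel modern translation (the positive) and pushed away from the other modern sentences in the batch (negatives). The sketch below is a minimal, hypothetical illustration of that idea on toy embeddings, not the authors' actual CCDM objective; the function names and temperature value are assumptions.

```python
import math

def cosine(u, v):
    # Cosine similarity between two dense embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(ancient, modern, tau=0.1):
    """InfoNCE-style contrastive loss over a batch of aligned pairs.

    ancient[i] and modern[i] are embeddings of the i-th parallel
    ancient-modern sentence pair; all other modern embeddings in
    the batch serve as in-batch negatives. tau is the temperature.
    (Illustrative sketch only; not the paper's exact loss.)
    """
    losses = []
    for i, a in enumerate(ancient):
        sims = [cosine(a, m) / tau for m in modern]
        # -log softmax of the positive pair's similarity.
        log_denom = math.log(sum(math.exp(s) for s in sims))
        losses.append(log_denom - sims[i])
    return sum(losses) / len(losses)
```

With well-aligned pairs the positive similarity dominates the denominator and the loss is small; shuffling the modern side so positives no longer match their ancient counterparts makes the loss large, which is the signal that drives the embeddings of parallel sentences together across the temporal gap.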