Computer science
Sentence
Natural language processing
Artificial intelligence
Embedding
Vietnamese
Construct (Python library)
Focus (optics)
Linguistics
Philosophy
Physics
Optics
Programming language
Authors
Yuxin Huang,Yin Liang,Zhimin Wu,Enchang Zhu,Zhengtao Yu
Source
Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Date: 2023-06-16
Volume/Issue: 22 (6): 1-18
Abstract
Cross-lingual sentence embedding aims to map sentences with similar semantics in different languages close together, and dissimilar sentences farther apart, in a shared representation space. It underpins many downstream tasks such as cross-lingual document matching and cross-lingual summary extraction. Current work on cross-lingual sentence embedding focuses mainly on languages with large-scale corpora. Low-resource language pairs such as Chinese-Vietnamese, however, lack sentence-level parallel corpora and clear cross-lingual supervision signals, so existing methods perform poorly on them. We therefore propose a cross-lingual sentence embedding method based on contrastive learning, which effectively fine-tunes a powerful pre-trained model by constructing sentence-level positive and negative samples, avoiding the catastrophic forgetting that arises when a pre-trained model is fine-tuned only on a small set of aligned positive samples. First, we construct training examples by taking parallel Chinese-Vietnamese sentence pairs as positives and non-parallel sentence pairs as negatives. Second, we build a siamese network that computes a contrastive loss over these samples and fine-tunes our model. Experimental results show that our method effectively improves the semantic alignment accuracy of cross-lingual sentence embeddings in the Chinese-Vietnamese setting.
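The contrastive objective described in the abstract can be sketched as an InfoNCE-style loss over a batch of sentence embeddings, where each parallel Chinese-Vietnamese pair is a positive and the other sentences in the batch act as in-batch negatives. This is a minimal NumPy sketch under those assumptions, not the authors' exact implementation; the function name, temperature value, and in-batch negative sampling are illustrative choices.

```python
import numpy as np

def contrastive_loss(zh_emb, vi_emb, temperature=0.05):
    """InfoNCE-style contrastive loss for cross-lingual sentence embeddings.

    zh_emb, vi_emb: (batch, dim) arrays. Row i of each matrix is assumed
    to be a parallel (positive) Chinese-Vietnamese sentence pair; every
    other row in the batch serves as a non-parallel (negative) example.
    """
    # L2-normalize so dot products become cosine similarities.
    zh = zh_emb / np.linalg.norm(zh_emb, axis=1, keepdims=True)
    vi = vi_emb / np.linalg.norm(vi_emb, axis=1, keepdims=True)
    sim = zh @ vi.T / temperature  # (batch, batch) similarity matrix

    # Cross-entropy where, for each Chinese sentence, the correct "class"
    # is its parallel Vietnamese sentence on the diagonal.
    log_probs = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Minimizing this loss pulls parallel pairs together and pushes non-parallel pairs apart, which is the behavior the siamese fine-tuning step relies on: when the diagonal (parallel) similarities dominate the off-diagonal ones, the loss approaches zero.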