Computer science
Code-switching
Code (set theory)
Word (group theory)
Optics (focus)
Zero (linguistics)
Natural language
Language model
Sentiment analysis
Natural language processing
Programming language
Artificial intelligence
Linguistics
Set (abstract data type)
Philosophy
Physics
Optics
Authors
Zhi Li, Xing Gao, Ji Zhang, Yin Zhang⋆
Identifier
DOI:10.1145/3477495.3531914
Abstract
In multilingual communities, code-switching is a common phenomenon, and code-switched tasks have become a crucial area of research in natural language processing (NLP) applications. Existing approaches mainly focus on supervised learning; however, annotating a sufficient amount of code-switched data is expensive. In this paper, we consider the zero-shot setting and improve model performance on code-switched tasks using monolingual language datasets, unlabeled code-switched datasets, and semantic dictionaries. Inspired by the mechanism of code-switching itself, we propose multi-label masked language modeling, which predicts both the masked word and its synonyms in other languages. Experimental results show that, compared with baselines, our method further improves a pretrained multilingual model's performance on code-switched sentiment analysis datasets.
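To illustrate the core idea in the abstract, the sketch below shows one plausible form of multi-label masked language modeling: at a masked position, the target is not a single token but the original word plus its cross-lingual synonyms from a semantic dictionary, and the loss averages the negative log-likelihood over all valid labels. The vocabulary, dictionary, and loss form here are illustrative assumptions, not the paper's actual implementation.

```python
import math

# Toy vocabulary mixing English and Spanish tokens (hypothetical; a real
# system would use the pretrained multilingual model's subword vocabulary).
VOCAB = ["happy", "feliz", "sad", "triste", "movie"]
TOKEN_ID = {tok: i for i, tok in enumerate(VOCAB)}

# Tiny stand-in for the semantic dictionaries mentioned in the abstract:
# each word maps to its synonyms in another language.
SYNONYMS = {"happy": ["feliz"], "sad": ["triste"]}

def multi_label_targets(masked_word):
    """Vocabulary ids a model should predict at a masked position:
    the original word plus its cross-lingual synonyms."""
    labels = {TOKEN_ID[masked_word]}
    for syn in SYNONYMS.get(masked_word, []):
        labels.add(TOKEN_ID[syn])
    return labels

def multi_label_mlm_loss(logits, target_ids):
    """Average negative log-likelihood over all target labels, so the
    model is rewarded for placing probability mass on any valid token."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(logits[i] - log_z for i in target_ids) / len(target_ids)

targets = multi_label_targets("happy")   # ids of "happy" and "feliz"
logits = [2.0, 1.5, -1.0, -1.0, 0.0]     # model scores over VOCAB
loss = multi_label_mlm_loss(logits, targets)
```

Because both "happy" and "feliz" are supervised targets, gradients push the model to assign code-switch-compatible probability mass at the masked position, which matches the intuition that a bilingual speaker could have produced either word.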