Authors
Mehdi Ben Amor, Michael Granitzer, Jelena Mitrović
Identifier
DOI:10.1145/3605098.3636126
Abstract
Language Models (LMs) have shown state-of-the-art performance in Natural Language Processing (NLP) downstream tasks such as Named Entity Recognition (NER) and Part-of-Speech (POS) tagging. These tasks are known to suffer from data imbalance issues, particularly with respect to the ratio of positive to negative examples and class disparities. This paper investigates an often-overlooked issue of encoder models: the position bias of positive examples in token classification. We propose an evaluation approach to investigate position bias in transformer models with different position embedding techniques. We show that LMs can suffer from this bias, with an average drop in performance ranging from 3% to 5%. We propose two methods, Random Position Shifting and Context Perturbation, which we apply to batches during training. The results show an improvement of ≈2% in model performance on CoNLL03, UD_en, and TweeBank.
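The abstract names the two debiasing methods but gives no implementation details. As a rough illustration of what Random Position Shifting could look like for an encoder with learned absolute position embeddings (e.g., BERT), the PyTorch sketch below shifts each sequence's position ids by a random per-example offset during training, so positive tokens are not always observed at the same absolute positions. The function name random_position_shift and the uniform offset sampling are illustrative assumptions, not the authors' reference code.

import torch

def random_position_shift(input_ids, max_positions=512):
    """Shift each sequence's position ids by a random offset.

    A minimal sketch of the Random Position Shifting idea: instead of
    always starting position ids at 0, sample a random start per example
    so tokens appear at varied absolute positions across training steps.
    The sampling scheme here is an assumption for illustration.
    """
    batch_size, seq_len = input_ids.shape
    position_ids = torch.arange(seq_len, device=input_ids.device)
    position_ids = position_ids.unsqueeze(0).expand(batch_size, -1).clone()
    # Largest admissible offset keeps every position id below max_positions.
    max_offset = max(max_positions - seq_len, 0)
    offsets = torch.randint(0, max_offset + 1, (batch_size, 1),
                            device=input_ids.device)
    return position_ids + offsets

# Usage with a Hugging Face encoder that accepts `position_ids`
# (hypothetical training-loop snippet):
# outputs = model(input_ids=input_ids,
#                 attention_mask=attention_mask,
#                 position_ids=random_position_shift(input_ids))

Context Perturbation, by contrast, would alter the tokens surrounding the labeled spans rather than the position ids themselves; the abstract does not specify the perturbation scheme, so no sketch is attempted here.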