Bridging (networking)
Computer science
Forgetting
Fine-tuning
Task (project management)
Artificial intelligence
Layer (electronics)
Training (meteorology)
Task analysis
Bridge (graph theory)
Cognitive psychology
Engineering
Physics
Internal medicine
Meteorology
Medicine
Quantum mechanics
Computer network
Organic chemistry
Chemistry
Systems engineering
Psychology
Authors
Tianxiong Xiao,Dong Yuan,Bin Dong
Identifiers
DOI:10.1109/icnlp52887.2021.00030
Abstract
Pre-trained Language Models (PLMs) have been attracting a lot of attention in the field of natural language processing. A training paradigm of pre-training followed by fine-tuning is widely adopted for BERT-based architectures. However, due to the task gap between pre-training and fine-tuning, PLMs may suffer from knowledge forgetting during fine-tuning and thus perform worse than expected. We propose a new fine-tuning method to bridge this gap and improve the performance of PLMs. We first fine-tune the task-specific output layer of the PLM while keeping the parameters of the other layers fixed, and then we fine-tune all layers of the PLM as usual. Our approach is evaluated on multiple natural language understanding tasks and yields a significant improvement over a strong ELECTRA baseline. Specifically, it achieves consistent improvements on GLUE tasks and SQuAD 2.0 with only a little additional computation.
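Loosely, the two-stage procedure described in the abstract (train only the task-specific output layer first, then fine-tune the whole model) can be sketched as follows. This is a minimal illustration using PyTorch and Hugging Face Transformers with a toy dataset, an assumed ELECTRA checkpoint, and made-up epoch counts and learning rates; it is not the authors' implementation.

```python
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Toy data to keep the sketch self-contained; real GLUE/SQuAD data would be used in practice.
tokenizer = AutoTokenizer.from_pretrained("google/electra-small-discriminator")
texts, labels = ["a great movie", "a terrible movie"], [1, 0]
enc = tokenizer(texts, padding=True, return_tensors="pt")
dataset = [
    {"input_ids": enc["input_ids"][i],
     "attention_mask": enc["attention_mask"][i],
     "labels": torch.tensor(labels[i])}
    for i in range(len(texts))
]
loader = DataLoader(dataset, batch_size=2)

model = AutoModelForSequenceClassification.from_pretrained(
    "google/electra-small-discriminator", num_labels=2
)

def train(model, loader, params, epochs, lr):
    """Plain supervised fine-tuning loop over the given parameters."""
    optimizer = torch.optim.AdamW(params, lr=lr)
    model.train()
    for _ in range(epochs):
        for batch in loader:
            loss = model(**batch).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

# Stage 1: fine-tune only the task-specific output layer (classification head),
# keeping the pre-trained encoder parameters frozen.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("classifier")
train(model, loader, [p for p in model.parameters() if p.requires_grad], epochs=1, lr=1e-3)

# Stage 2: unfreeze all layers and fine-tune the whole model as usual.
for param in model.parameters():
    param.requires_grad = True
train(model, loader, model.parameters(), epochs=3, lr=2e-5)
```

The hyperparameters and the checkpoint name are placeholders; the point of the sketch is only the ordering of the two stages, so that the randomly initialized output layer is adapted before the pre-trained weights are updated.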