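Computer science
Language model
Natural language processing
Transformer
Artificial intelligence
Field (mathematics)
Natural language
Linguistics
Philosophy
Physics
Mathematics
Quantum mechanics
Voltage
Pure mathematics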
Authors
Nicolas Webersinke, Mathias Kraus, Julia Anna Bingler, Markus Leippold
Source
Journal: Cornell University - arXiv
Date: 2021-10-22
Citations: 1
Identifier
DOI: 10.48550/arxiv.2110.12010
Abstract
In recent years, large pretrained language models (LMs) have revolutionized the field of natural language processing (NLP). However, while pretraining on general language has been shown to work very well for common language, niche language has been observed to pose problems. In particular, climate-related texts include specific language that common LMs cannot represent accurately. We argue that this shortcoming of today's LMs limits the applicability of modern NLP to the broad field of processing climate-related texts. As a remedy, we propose CLIMATEBERT, a transformer-based language model that is further pretrained on over 2 million paragraphs of climate-related texts, crawled from various sources such as common news, research articles, and companies' climate reporting. We find that CLIMATEBERT leads to a 48% improvement on a masked language model objective which, in turn, lowers error rates by 3.57% to 35.71% on various climate-related downstream tasks such as text classification, sentiment analysis, and fact-checking.