Psycholinguistics
Scale (ratio)
Linguistics
Computer science
Cognitive science
Natural language processing
Psychology
Philosophy
Geography
Cognition
Neuroscience
Cartography
Authors
Ethan Wilcox,Michael Y. Hu,Aaron Mueller,Tal Linzen,Alex Warstadt,Leshem Choshen,Chengxu Zhuang,Ryan Cotterell,Adina Williams
Identifier
DOI: 10.31234/osf.io/rfwgd_v2
Abstract
Neural network language models can learn a surprising amount about language by predicting upcoming words in a corpus. Recent work in language technologies has demonstrated that large performance improvements can arise from simply increasing ("scaling") the size of the data sets models are trained on (and, correspondingly, the number of parameters in those models); accordingly, many contemporary systems are trained on trillions of words. While largely beneficial to performance on language applications, scaling has several downsides for both computational psycholinguistics and natural language processing research. We discuss the scientific challenges presented by scaling, as well as the benefits that would result from human-scale language modeling research. In the second half of this paper, we report on takeaways from two efforts to bring about human-scale language model pretraining. First, we report on the first iteration of the BabyLM Challenge, a shared task organized by the authors that asked participants to train a language model on 100 million words or fewer. Second, we present experiments to answer open questions from the findings of the BabyLM Challenge: namely, is a significant amount of computational resources required to achieve high performance, even at such small scales? We find that high performance can be achieved at small data scales and with typical academic-scale computational resources.