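Operationalization
Rank (graph theory)
Variety (cybernetics)
Ranking (information retrieval)
Word lists by frequency
Computer science
Word (group theory)
Linguistics
Natural language processing
Artificial intelligence
Psychology
Mathematics
Judgment
Epistemology
Combinatorics
Philosophy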
Authors
Jesse Egbert, Brent D. Burch
Identifier
DOI:10.1093/applin/amac030
Abstract
The words in a language or language variety are often rank-ordered in lists meant to reflect the relative importance of those words to the language's users and learners. This rank ordering is based on the relative prevalence of words in a corpus, and lexical prevalence is often operationalized as frequency, dispersion, or adjusted frequency. Yet, to date, there is no consensus on best practices for identifying and ranking prevalent words in a corpus, or for evaluating the degree to which a word's importance is reflected in its prevalence. We begin this paper by introducing and describing a wide range of corpus-based measures for quantifying lexical prevalence. We then carry out two case studies on the Duolingo University Textbook Corpus to evaluate how well these measures identify important words and rank them appropriately. We conclude with recommendations for word list creators, as well as for researchers and practitioners interested in word importance.
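The abstract names three families of prevalence measures: frequency, dispersion, and adjusted frequency. The sketch below illustrates one common instance of each, assuming a corpus that is already tokenized and divided into parts (for example, one part per textbook): raw frequency, Gries's DP (deviation of proportions) as a dispersion measure, and Juilland's U (Juilland's D multiplied by frequency) as an adjusted frequency. These are standard measures from the corpus-linguistics literature chosen for illustration only; they are not necessarily the exact set or formulations evaluated in the paper, and all function names here are hypothetical.

```python
from collections import Counter
from math import sqrt
from statistics import mean, pstdev


def part_counts(parts: list[list[str]]) -> list[Counter]:
    """Count token occurrences separately in each corpus part."""
    return [Counter(tokens) for tokens in parts]


def frequency(word: str, counts: list[Counter]) -> int:
    """Raw frequency: total occurrences of `word` across all parts."""
    return sum(c[word] for c in counts)


def gries_dp(word: str, parts: list[list[str]], counts: list[Counter]) -> float:
    """Gries's DP: 0 means the word is spread exactly in proportion to
    part sizes; values approaching 1 mean it is concentrated in few parts."""
    total_tokens = sum(len(p) for p in parts)
    f = frequency(word, counts)
    if f == 0:
        raise ValueError(f"{word!r} does not occur in the corpus")
    # Half the sum of |observed share of the word - expected share by part size|.
    return 0.5 * sum(
        abs(c[word] / f - len(p) / total_tokens) for p, c in zip(parts, counts)
    )


def juilland_d(word: str, parts: list[list[str]], counts: list[Counter]) -> float:
    """Juilland's D = 1 - V / sqrt(n - 1), where V is the coefficient of
    variation of the word's per-part relative frequencies (needs n >= 2)."""
    n = len(parts)
    rel = [c[word] / len(p) for p, c in zip(parts, counts)]
    m = mean(rel)
    if m == 0:
        raise ValueError(f"{word!r} does not occur in the corpus")
    v = pstdev(rel) / m  # population SD divided by the mean
    return 1 - v / sqrt(n - 1)


def juilland_u(word: str, parts: list[list[str]], counts: list[Counter]) -> float:
    """Adjusted frequency: Juilland's U = D * F."""
    return juilland_d(word, parts, counts) * frequency(word, counts)


def rank(vocab, score) -> list[str]:
    """Rank words by descending score, breaking ties alphabetically."""
    return sorted(vocab, key=lambda w: (-score(w), w))


if __name__ == "__main__":
    parts = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "ran"]]
    counts = part_counts(parts)
    vocab = {w for p in parts for w in p}
    for w in rank(vocab, lambda w: juilland_u(w, parts, counts)):
        print(w, frequency(w, counts),
              round(gries_dp(w, parts, counts), 3),
              round(juilland_u(w, parts, counts), 3))
```

Ranking by `juilland_u` rewards words that are both frequent and evenly spread, so a word confined to a single part falls below an evenly dispersed word of similar frequency. Note that DP runs in the opposite direction (lower values mean more even dispersion), so a DP-based ranking would sort in ascending order.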