Computer science
Natural language processing
Context (archaeology)
Artificial intelligence
N-gram
Multilingual
Internet
Information retrieval
World Wide Web
Language model
Linguistics
Geography
Philosophy
Archaeology
Authors
Prasenjit Majumder, Mandar Mitra
Abstract
With the increasingly widespread use of computers and the Internet in India, large amounts of information in Indian languages are becoming available on the web. Automatic information processing and retrieval is therefore becoming an urgent need in the Indian context. Moreover, since India is a multilingual country, any effective approach to IR in the Indian context needs to be capable of handling a multilingual collection of documents. In this paper, we discuss the N-gram approach to developing some basic tools in the area of IR and NLP. This approach is statistical and language independent in nature, and therefore eminently suited to the multilingual Indian context. We first present a brief survey of some language-processing applications in which N-grams have been successfully used. We also present the results of some preliminary experiments on using N-grams for identifying the language of an Indian language document, based on a method proposed by Cavnar et al. [1].
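
To make the language-identification idea concrete, the following is a minimal sketch of rank-based character N-gram profiling in the spirit of the Cavnar et al. method cited above. It is an illustration under stated assumptions, not the authors' implementation: the function names (`ngram_profile`, `out_of_place`, `identify_language`), the N-gram lengths (1 to 3), the profile size (300), the fixed penalty for unseen N-grams, and the toy training sentences are all hypothetical choices made for this example.

```python
from collections import Counter


def ngram_profile(text, n_max=3, top_k=300):
    """Rank the character n-grams (lengths 1..n_max) of `text` by frequency.

    Assumption: whitespace is normalized and the text is padded with one
    space on each side; the original method may tokenize differently.
    """
    padded = " " + " ".join(text.lower().split()) + " "
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    # Most frequent first; ties broken by the n-gram itself for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile, max_penalty=300):
    """Sum of rank differences between two profiles.

    N-grams missing from the language profile cost `max_penalty`
    (a hypothetical fixed value chosen for this sketch).
    """
    rank = {gram: r for r, gram in enumerate(lang_profile)}
    return sum(
        abs(r - rank[gram]) if gram in rank else max_penalty
        for r, gram in enumerate(doc_profile)
    )


def identify_language(text, training_texts):
    """Return the label whose profile is closest to the profile of `text`."""
    doc = ngram_profile(text)
    profiles = {lang: ngram_profile(s) for lang, s in training_texts.items()}
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))


# Toy training data (illustrative only; real profiles need far more text).
training = {
    "hindi": "भारत एक बहुभाषी देश है और यहाँ कई भाषाएँ बोली जाती हैं",
    "bengali": "ভারত একটি বহুভাষিক দেশ এবং এখানে অনেক ভাষা বলা হয়",
}
guess = identify_language("भारत एक देश है", training)
```

Because the profiles are built purely from character statistics, nothing in the sketch depends on a particular script or language, which is the property the abstract highlights for multilingual collections. Languages that share a script would need much larger training samples than the toy sentences used here.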