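Computer science
Feature engineering
Artificial intelligence
Named entity recognition
Natural language processing
Urdu
Feature (linguistics)
Word embedding
Convolutional neural network
Deep learning
Vocabulary
Word (group theory)
Embedding
Linguistics
Task (project management)
Management
Economics
Philosophy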
Authors
Rafiul Haq, Xiaowang Zhang, Wahab Khan, Zhiyong Feng
Identifier
DOI:10.1093/comjnl/bxac047
Abstract
Named entity recognition (NER) is a fundamental component of natural language processing tasks such as information retrieval, question answering and machine translation. Research on English NER systems has already achieved considerable success. However, Urdu NER is still in its infancy owing to the complexity and morphological richness of the Urdu language. Existing Urdu NER systems depend heavily on manual feature engineering and word embeddings to capture similarity, and their performance degrades on previously unseen or infrequent words. Feature-based models suffer from complicated feature engineering and often rely heavily on external resources. To overcome these limitations, this study presents several deep neural approaches that learn features automatically from the data and eliminate manual feature engineering. Our extension uses a convolutional neural network to extract character-level features and combines them with word embeddings to handle out-of-vocabulary words. The study also presents a dataset of Urdu tweets, manually annotated for five named entity classes. The effectiveness of the deep learning approaches is demonstrated on four benchmark datasets. The proposed method improves notably on current state-of-the-art NER approaches for Urdu, with a gain of 6.26% in F1 score.
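The idea of combining character-level convolutional features with a word embedding can be illustrated with a small sketch. This is not the authors' implementation: the weights are random and untrained, the dimensions are arbitrary, and every name (`char_cnn`, `represent`, `word_table`, the `<unk>` entry) is a hypothetical chosen for illustration. It only shows why an out-of-vocabulary word still receives a non-trivial representation.

```python
import random

random.seed(0)

CHAR_DIM = 4      # size of each character embedding (assumed)
NUM_FILTERS = 3   # number of convolution filters (assumed)
WIDTH = 3         # filter width in characters (assumed)
WORD_DIM = 5      # size of each word embedding (assumed)


def rand_vec(n):
    """Return a random vector; stands in for learned parameters."""
    return [random.uniform(-1.0, 1.0) for _ in range(n)]


# Hypothetical lookup tables; a real model would learn these during training.
char_table = {}
word_table = {"<unk>": [0.0] * WORD_DIM}
filters = [[rand_vec(CHAR_DIM) for _ in range(WIDTH)]
           for _ in range(NUM_FILTERS)]


def char_embedding(ch):
    # Create a random embedding the first time a character is seen.
    if ch not in char_table:
        char_table[ch] = rand_vec(CHAR_DIM)
    return char_table[ch]


def char_cnn(word):
    """Convolve over character windows, then max-pool over positions."""
    pad = [[0.0] * CHAR_DIM] * (WIDTH - 1)
    chars = pad + [char_embedding(c) for c in word] + pad
    features = []
    for filt in filters:
        best = 0.0  # ReLU floor: pooled activations are never negative
        for start in range(len(chars) - WIDTH + 1):
            window = chars[start:start + WIDTH]
            score = sum(w * x
                        for f_row, c_row in zip(filt, window)
                        for w, x in zip(f_row, c_row))
            best = max(best, score)  # max over all window positions
        features.append(best)
    return features


def represent(word):
    """Concatenate the word embedding with character-level CNN features."""
    word_vec = word_table.get(word, word_table["<unk>"])
    return word_vec + char_cnn(word)
```

In this sketch, a word missing from `word_table` falls back to the `<unk>` vector, so its word-embedding part carries no information, but the character-level part is still computed from its spelling. Morphological variants that share character sequences therefore share some of the filter activations, which is the intuition behind using character features for a morphologically rich language.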