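Computer science
Artificial intelligence
Urdu
Natural language processing
Machine learning
Benchmark (surveying)
Deep learning
Ensemble learning
Geodesy
Linguistics
Philosophy
Geography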
Authors
Faiza Mehmood, Rehab Shahzadi, Hina Ghafoor, Muhammad Nabeel Asim, Muhammad Usman Ghani Khan, Waqar Mahmood, Andreas Dengel
Source
Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Date: 2023-08-28
Volume (issue): pages: 22 (9): 1-31
Citations: 4
Abstract
The exponential growth of electronic data calls for advanced multi-label classification approaches to support natural language processing (NLP) applications such as recommendation systems, drug reaction detection, hate speech detection, and opinion recognition and mining. To date, several machine learning and deep learning–based multi-label classification methodologies have been proposed for English, French, German, Chinese, Arabic, and other resource-rich languages. Urdu is the 11th most widely spoken language in the world, yet it has no computer-aided approach for multi-label news text classification. Unlike these languages, Urdu also lacks multi-label text classification datasets that could be used to benchmark existing machine and deep learning methodologies. To accelerate research on Urdu multi-label text classification applications, this article makes the following contributions. First, it provides a manually annotated multi-label news text classification dataset for Urdu. Second, it benchmarks traditional machine learning approaches: three data transformation approaches, each combined with three top-performing machine learning classifiers, and four algorithm adaptation–based approaches. Third, it benchmarks 16 existing deep learning approaches and the four most widely used language models. Finally, it proposes an ensemble approach that combines three different deep learning architectures to accurately predict the classes associated with a given Urdu text document. Experimental results show that the proposed ensemble (87% accuracy, 92% F1-score, and 8% Hamming loss) significantly outperforms the adapted machine and deep learning–based approaches.
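The abstract reports accuracy, F1-score, and Hamming loss. It does not say which multi-label variant of each metric is used, and it does not say how the three deep learning architectures are combined. The sketch below shows one common reading under stated assumptions: example-based (Jaccard) accuracy, micro-averaged F1, and a soft-voting ensemble that averages per-label probabilities and thresholds them at 0.5. These choices, the function names, and the toy data are all hypothetical illustrations. They are not the authors' method.

```python
from typing import List

Matrix = List[List[int]]  # rows = documents, columns = binary label indicators


def hamming_loss(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of individual (document, label) assignments that are wrong."""
    n_cells = len(y_true) * len(y_true[0])
    wrong = sum(t != p for rt, rp in zip(y_true, y_pred) for t, p in zip(rt, rp))
    return wrong / n_cells


def example_accuracy(y_true: Matrix, y_pred: Matrix) -> float:
    """Assumed variant: |true ∩ predicted| / |true ∪ predicted|, averaged over documents."""
    scores = []
    for rt, rp in zip(y_true, y_pred):
        inter = sum(1 for t, p in zip(rt, rp) if t and p)
        union = sum(1 for t, p in zip(rt, rp) if t or p)
        scores.append(1.0 if union == 0 else inter / union)
    return sum(scores) / len(scores)


def micro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Assumed variant: F1 computed from true/false positives pooled over all labels."""
    tp = fp = fn = 0
    for rt, rp in zip(y_true, y_pred):
        for t, p in zip(rt, rp):
            if t and p:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def soft_vote(model_probs: List[List[List[float]]], threshold: float = 0.5) -> Matrix:
    """Hypothetical ensemble: average each label's probability across models, then threshold."""
    n_models = len(model_probs)
    n_docs, n_labels = len(model_probs[0]), len(model_probs[0][0])
    return [
        [1 if sum(m[d][k] for m in model_probs) / n_models >= threshold else 0
         for k in range(n_labels)]
        for d in range(n_docs)
    ]


if __name__ == "__main__":
    # Toy data: 3 documents, 4 labels (illustrative only).
    y_true = [[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]]
    y_pred = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 1, 1]]
    print(round(hamming_loss(y_true, y_pred), 4))      # 0.1667
    print(round(example_accuracy(y_true, y_pred), 4))  # 0.75
    print(round(micro_f1(y_true, y_pred), 4))          # 0.8333
```

Hamming loss is a lower-is-better metric. This explains why an 8% value appears alongside high accuracy and F1. Micro-averaging weights frequent labels more heavily than macro-averaging would. Which variant a paper uses can therefore change the reported F1 noticeably.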