Advancing Urdu named entity recognition: deep learning for aspect targeting

Aziz, Kamran and Ahmed, Naveed and Yu, Yaoxiang and Hadi, Hassan Jalil and Alshara, Mohammaed Ali and Tariq, Umair and Ji, Donghong (2025) Advancing Urdu named entity recognition: deep learning for aspect targeting. Complex & Intelligent Systems, 11 (12). ISSN 2199-4536

s40747-025-02066-6.pdf - Published Version
Available under License Creative Commons Attribution.

Abstract

This study presents a Named Entity Recognition (NER) system designed specifically for Urdu news headlines, aimed at bridging crucial linguistic resource gaps. We developed a corpus from diverse news sources, tailored to reflect Urdu’s unique orthographic and morphological characteristics. Our approach incorporates state-of-the-art (SOTA) neural technologies, including transformers for deep contextual embeddings, Graph Convolutional Networks (GCN) for detailed syntactic analysis, and Biaffine Attention mechanisms to capture inter-token relationships. A Conditional Random Field (CRF) layer further ensures accurate and consistent entity labeling, improving the system’s precision. Initially, our model was benchmarked with established transformer models such as XLM-R, mBERT, and XLNet to set baseline performance. Subsequent enhancements involved integrating the encoders of generative models such as mBART and mT5, allowing a thorough comparative evaluation of these encoders against our baselines. This phase aimed to assess their potential for detecting implicit entities, thus enhancing our model’s usefulness for complex searches and automated content categorization on Urdu digital platforms. Our improvements contribute to computational linguistics by extending SOTA language technologies to under-resourced languages and promoting greater inclusivity in Natural Language Processing (NLP).
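
As a rough illustration of how a CRF-style decoding step can keep entity labels consistent, the sketch below runs Viterbi decoding over per-token tag scores under BIO constraints. It is not the authors' implementation: the tag set, the scores, and the names `allowed`, `viterbi_decode`, and `EXAMPLE_EMISSIONS` are assumptions made for illustration, and hard allow/forbid rules stand in for the learned transition scores a trained CRF would use.

```python
# Illustrative sketch only: tags, scores, and names are assumptions,
# not taken from the paper. Hard BIO constraints replace learned
# CRF transition scores.
NEG_INF = float("-inf")
TAGS = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]


def allowed(prev, curr):
    """BIO rule: I-X may only follow B-X or I-X of the same type."""
    if curr.startswith("I-"):
        return prev in ("B-" + curr[2:], "I-" + curr[2:])
    return True


def viterbi_decode(emissions, tags=TAGS):
    """Return the highest-scoring valid tag path.

    emissions: one dict per token mapping each tag to a score
    (in a neural tagger these would come from the encoder).
    """
    # A sequence may not start inside an entity.
    score = {t: (NEG_INF if t.startswith("I-") else emissions[0][t]) for t in tags}
    backptrs = []
    for em in emissions[1:]:
        new_score, ptr = {}, {}
        for curr in tags:
            best_prev, best = None, NEG_INF
            for prev in tags:
                if allowed(prev, curr) and score[prev] > best:
                    best_prev, best = prev, score[prev]
            new_score[curr] = best + em[curr] if best_prev is not None else NEG_INF
            ptr[curr] = best_prev
        score = new_score
        backptrs.append(ptr)
    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: score[t])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Hypothetical scores for a three-token headline. Picking the best tag
# per token independently would give B-PER, I-LOC, O, which is invalid.
EXAMPLE_EMISSIONS = [
    {"O": 0.1, "B-PER": 2.0, "I-PER": 0.0, "B-LOC": 0.5, "I-LOC": 0.0},
    {"O": 0.2, "B-PER": 0.1, "I-PER": 1.0, "B-LOC": 0.0, "I-LOC": 1.5},
    {"O": 2.0, "B-PER": 0.0, "I-PER": 0.0, "B-LOC": 0.0, "I-LOC": 0.0},
]

print(viterbi_decode(EXAMPLE_EMISSIONS))  # ['B-PER', 'I-PER', 'O']
```

Decoding the whole sequence jointly, rather than token by token, is what lets the second token fall back to `I-PER` even though `I-LOC` has the higher individual score.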

Item Type: Article
Identification Number: 10.1007/s40747-025-02066-6
Dates:
Accepted: 18 August 2025
Published Online: 29 October 2025
Uncontrolled Keywords: Named Entity Recognition, Data mining, NLP, Entity extraction, XLM-R, Deep learning
Subjects: CAH11 - computing > CAH11-01 - computing > CAH11-01-01 - computer science
Divisions: Architecture, Built Environment, Computing and Engineering > Computer Science
Depositing User: Gemma Tonks
Date Deposited: 18 Nov 2025 13:49
Last Modified: 18 Nov 2025 13:49
URI: https://www.open-access.bcu.ac.uk/id/eprint/16725
