Towards Building a Speech Recognition System for Quranic Recitations: A Pilot Study Involving Female Reciters

Al-Issa, Suhad and Al-Ayyoub, Mahmoud and Al-Khaleel, Osama and Elmitwally, Nouh (2022) Towards Building a Speech Recognition System for Quranic Recitations: A Pilot Study Involving Female Reciters. Jordan Journal of Electrical Engineering, 8 (4). pp. 307-321. ISSN 2409-9600

Full text: JJEE Vol. 8, No. 4, 2022, pp. 307-321.pdf - Published Version (1MB)

Abstract

This paper is the first step in an effort toward building an automatic speech recognition (ASR) system for Quranic recitations that caters specifically to female reciters. To function properly, ASR systems require a huge amount of data for training. Surprisingly, the data readily available for Quranic recitations suffer from major limitations. Specifically, the currently available audio recordings of Quran recitations are massive in volume, but they are mostly made by male reciters (who have dedicated most of their lives to perfecting their recitation skills) using professional and expensive equipment. Such proficiency in the training data (along with the fact that the reciters come from a specific demographic group: adult males) will most likely lead to some bias in the resulting model and limit its ability to process input from other groups, such as non-professionals, semi-professionals, females, or children. This work aims at empirically exploring this shortcoming. To do so, we create a first-of-its-kind (to the best of our knowledge) benchmark dataset called the Quran recitations by females and males (QRFAM) dataset. QRFAM is a relatively large dataset of audio recordings made by male and female reciters from different age groups and proficiency levels. After creating the dataset, we experiment on it by building ASR systems based on one of the most popular open-source ASR models, Mozilla's DeepSpeech. The speaker-independent end-to-end models that we produce are evaluated using the word error rate (WER). Despite DeepSpeech's known flexibility and prowess (which it demonstrates when trained and tested on recitations from the same group), the models trained on the recitations of one group could not recognize most of the recitations made by the other groups in the testing phase. This shows that there is still a long way to go before producing an ASR system that can be used by anyone, and the first step is to build and expand the resources needed for this, such as QRFAM. We hope that our work will be the first step in this direction and that it will inspire the community to take more interest in this problem.
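For readers unfamiliar with the evaluation setup described above, the sketch below shows how a single recitation clip could be scored against its ground-truth transcript: it loads a trained acoustic model through Mozilla's DeepSpeech Python bindings and computes the WER as the word-level edit distance divided by the reference length. This is not code from the paper; the model path, audio file name, and reference transcript are placeholders, and the authors' actual training and evaluation pipeline may differ.

```python
# Minimal sketch (assumed setup, not the paper's code): transcribe one clip with
# a trained DeepSpeech model and compute its word error rate (WER).
import wave
import numpy as np
from deepspeech import Model  # Mozilla DeepSpeech Python bindings

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Placeholder model exported by DeepSpeech training (.pbmm).
ds = Model("quran_asr.pbmm")

# DeepSpeech expects 16 kHz, 16-bit mono PCM audio.
with wave.open("recitation.wav", "rb") as w:
    audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)

hypothesis = ds.stt(audio)
reference = "ground-truth transcript of the clip goes here"  # placeholder
print(f"WER = {wer(reference, hypothesis):.3f}")
```

Averaging this per-clip WER over a held-out test set from each reciter group is one straightforward way to reproduce the kind of cross-group comparison the abstract reports.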

Item Type: Article
Identification Number: https://doi.org/10.5455/jjee.204-1612774767
Dates:
Accepted: 12 November 2022
Published Online: 12 November 2022
Uncontrolled Keywords: Holy Quran, Recitations, Speech recognition, DeepSpeech, Word error rate.
Subjects: CAH11 - computing > CAH11-01 - computing > CAH11-01-01 - computer science
Divisions: Faculty of Computing, Engineering and the Built Environment > School of Computing and Digital Technology
Depositing User: Nouh Elmitwally
Date Deposited: 08 Dec 2022 11:32
Last Modified: 08 Dec 2022 11:32
URI: https://www.open-access.bcu.ac.uk/id/eprint/14001
