Building a neural speech recognizer for quranic recitations

Al-Issa, Suhad and Elmitwally, Nouh and Al-Ayyoub, Mahmoud and Al-Khaleel, Osama (2022) Building a neural speech recognizer for quranic recitations. International Journal of Speech Technology. ISSN 1572-8110

Building a neural speech recognizer for quranic recitations.pdf - Published Version


Abstract

This work is an effort towards building a Neural Speech Recognizer (NSR) system for Quranic recitations that can be used effectively by anyone, regardless of gender and age. Although many recitations are available online, most of them are recorded by professional adult male reciters, which means that an ASR system trained on such datasets would not work for female or child reciters. We address this gap by adopting a benchmark dataset of audio recordings of Quranic recitations by reciters of both genders and different ages. Using this dataset, we build several speaker-independent NSR systems based on the DeepSpeech model and evaluate them using word error rate (WER). The goal is to show how an NSR system trained and tuned on a dataset of one gender performs on a test set from the other gender. Unfortunately, the number of female recitations in our dataset is rather small, while the number of male recitations is much larger. In the first set of experiments, we avoid the imbalance between the two genders by down-sampling the male part to match the female part. For this small subset of our dataset, the results are interesting: a system trained on male recitations gives 0.968 WER when tested on female recitations and 0.406 WER when tested on male recitations. Conversely, a system trained on female recitations gives 0.966 WER when tested on male recitations and 0.608 WER when tested on female recitations.
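
Since the abstract reports all results as WER, a minimal sketch of how that metric is commonly computed may help in reading the numbers. WER is the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the placeholder tokens are illustrative assumptions; this is not the authors' evaluation code, and the paper may apply text normalization that is not shown here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Illustrative sketch only: whitespace tokenization, no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder tokens, not real transcripts.
    print(word_error_rate("w1 w2 w3 w4", "w1 wX w3"))
```

Under this definition, a WER near 0.97 in the cross-gender tests means that nearly every reference word was recognized incorrectly, and WER can exceed 1 when the output contains many inserted words.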

Item Type: Article
Identification Number: https://doi.org/10.1007/s10772-022-09988-3
Dates:
Date: 22 June 2022, Event: Accepted
Date: 5 August 2022, Event: Published Online
Uncontrolled Keywords: Quran, Speech, ASR, DeepSpeech, WER, Dataset
Subjects: CAH11 - computing > CAH11-01 - computing > CAH11-01-01 - computer science
Divisions: Faculty of Computing, Engineering and the Built Environment > School of Computing and Digital Technology
Depositing User: Nouh Elmitwally
Date Deposited: 31 Aug 2022 10:55
Last Modified: 31 Aug 2022 10:58
URI: https://www.open-access.bcu.ac.uk/id/eprint/13502
