Ghadeer-speech-crowd-corpus: Speech dataset (Mendeley Data)
The availability of raw data is a considerable challenge across most branches of science. Without data, experiments cannot be conducted and development cannot proceed. Despite their importance, raw data are still lacking in many scientific fields. A literature survey conducted...
Main Authors: | Ghadeer Qasim Ali, Husam Ali Abdulmohsin |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2025-02-01 |
Series: | Data in Brief |
Subjects: | Arabic phrase; English phrase; Speech recognition; Low-resource languages |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2352340924011636 |
_version_ | 1832576465596579840 |
---|---|
author | Ghadeer Qasim Ali; Husam Ali Abdulmohsin |
author_facet | Ghadeer Qasim Ali; Husam Ali Abdulmohsin |
author_sort | Ghadeer Qasim Ali |
collection | DOAJ |
description | The availability of raw data is a considerable challenge across most branches of science. Without data, experiments cannot be conducted and development cannot proceed. Despite their importance, raw data are still lacking in many scientific fields. A literature survey conducted at the beginning of our study indicated a significant lack of Arabic speech datasets. This study therefore addresses the problem by proposing a new Arabic and English dataset called Ghadeer-Speech-Crowd-Corpus. The dataset was designed to serve several speech-processing applications, such as crowd speaker identification, speech synthesis (text-to-speech), and speech recognition (speech-to-text). Speech samples were recorded over three months from 210 Iraqi Arab citizens living in different parts of Iraq, covering more than one accent. The dataset is fully balanced with respect to sex and recordings (the same number of Arabic and English recordings). It is a mono dataset containing 15,626 audio samples recorded at a sampling rate of 44,100 Hz, with 16-bit depth and a bit rate of 705.6 kb/s. The recordings were made at the Academy for Media Training of the College of Media, University of Baghdad. |
format | Article |
id | doaj-art-41263c4f93e34804acdc1726848b28e7 |
institution | Kabale University |
issn | 2352-3409 |
language | English |
publishDate | 2025-02-01 |
publisher | Elsevier |
record_format | Article |
series | Data in Brief |
spelling | doaj-art-41263c4f93e34804acdc1726848b28e7; 2025-01-31T05:11:29Z; eng; Elsevier; Data in Brief; 2352-3409; 2025-02-01; 58; 111201; Ghadeer-speech-crowd-corpus: Speech dataset (Mendeley Data); Ghadeer Qasim Ali (0); Husam Ali Abdulmohsin (1); Computer Science Department, College of Science, University of Baghdad, Iraq; Corresponding author.; Computer Science Department, College of Science, University of Baghdad, Iraq; http://www.sciencedirect.com/science/article/pii/S2352340924011636; Arabic phrase; English phrase; Speech recognition; Low-resource languages |
spellingShingle | Ghadeer Qasim Ali; Husam Ali Abdulmohsin; Ghadeer-speech-crowd-corpus: Speech dataset (Mendeley Data); Data in Brief; Arabic phrase; English phrase; Speech recognition; Low-resource languages |
title | Ghadeer-speech-crowd-corpus: Speech dataset (Mendeley Data) |
title_full | Ghadeer-speech-crowd-corpus: Speech dataset (Mendeley Data) |
title_fullStr | Ghadeer-speech-crowd-corpus: Speech dataset (Mendeley Data) |
title_full_unstemmed | Ghadeer-speech-crowd-corpus: Speech dataset (Mendeley Data) |
title_short | Ghadeer-speech-crowd-corpus: Speech dataset (Mendeley Data) |
title_sort | ghadeer speech crowd corpus speech dataset mendeley data |
topic | Arabic phrase; English phrase; Speech recognition; Low-resource languages |
url | http://www.sciencedirect.com/science/article/pii/S2352340924011636 |
work_keys_str_mv | AT ghadeerqasimali ghadeerspeechcrowdcorpusspeechdatasetmendeleydata AT husamaliabdulmohsin ghadeerspeechcrowdcorpusspeechdatasetmendeleydata |
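
The audio parameters given in the description can be checked against each other. If the files are uncompressed PCM (an assumption; the record does not name the encoding), the bit rate is the product of sampling rate, bit depth, and channel count: 44,100 samples/s × 16 bits × 1 channel = 705,600 bit/s, or 705.6 kb/s, which matches the stated figure. A minimal Python sketch of that arithmetic follows, together with a hypothetical helper that compares a WAV file's header with the stated format. The function names are illustrative and are not part of the dataset or its documentation.

```python
import wave


def pcm_bit_rate_kbps(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Bit rate of uncompressed PCM audio in kb/s (1 kb = 1000 bits)."""
    return sample_rate_hz * bit_depth * channels / 1000


def wav_matches_spec(wav_file, sample_rate_hz: int = 44_100,
                     bit_depth: int = 16, channels: int = 1) -> bool:
    """Return True if a WAV file's header matches the expected format.

    `wav_file` may be a path or a binary file-like object. The defaults are
    the values quoted in the record's description; treating the corpus files
    as plain PCM WAV is an assumption made for this sketch.
    """
    with wave.open(wav_file, "rb") as w:
        return (
            w.getframerate() == sample_rate_hz
            and w.getsampwidth() * 8 == bit_depth  # sample width is in bytes
            and w.getnchannels() == channels
        )


if __name__ == "__main__":
    # Mono, 16-bit, 44,100 Hz, as stated in the description.
    print(pcm_bit_rate_kbps(44_100, 16, 1))  # 705.6
```

Calling `wav_matches_spec("sample.wav")` on a downloaded recording (the file name here is a placeholder) would show whether that file follows the stated format.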