Nonspeech7k dataset: Classification and analysis of human non‐speech sound
Abstract Human non‐speech sounds occur during expressions in a real‐life environment. Realising a person's incapability to prompt confident expressions by non‐speech sounds may assist in identifying premature disorder in medical applications. A novel dataset named Nonspeech7k is introduced that...
| Main Authors: | Muhammad Mamunur Rashid, Guiqing Li, Chengrui Du |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Wiley 2023-06-01 |
| Series: | IET Signal Processing |
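| Subjects: | audio dataset; audio expression audio signal processing; human non-speech sound; signal classification; speech intelligibility |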
| Online Access: | https://doi.org/10.1049/sil2.12233 |
| _version_ | 1849404499152601088 |
|---|---|
| author | Muhammad Mamunur Rashid; Guiqing Li; Chengrui Du |
| author_facet | Muhammad Mamunur Rashid; Guiqing Li; Chengrui Du |
| author_sort | Muhammad Mamunur Rashid |
| collection | DOAJ |
| description | Abstract Human non‐speech sounds occur during expressions in a real‐life environment. Realising a person's incapability to prompt confident expressions by non‐speech sounds may assist in identifying premature disorder in medical applications. A novel dataset named Nonspeech7k is introduced that contains a diverse set of human non‐speech sounds, such as the sounds of breathing, coughing, crying, laughing, screaming, sneezing, and yawning. The authors then conduct a variety of classification experiments with end‐to‐end deep convolutional neural networks (CNN) to show the performance of the dataset. First, a set of typical deep classifiers are used to verify the reliability and validity of Nonspeech7k. Involved CNN models include 1D‐2D deep CNN EnvNet, deep stack CNN M11, deep stack CNN M18, intense residual block CNN ResNet34, modified M11 named M12, and the authors’ baseline model. Among these, M12 achieves the highest accuracy of 79%. Second, to verify the heterogeneity of Nonspeech7k with respect to two typical datasets, FSD50K and VocalSound, the authors design a series of experiments to analyse the classification performance of deep neural network classifier M12 by using FSD50K, FSD50K + Nonspeech7k, VocalSound, VocalSound + Nonspeech7k as training data, respectively. Experimental results show that the classifier trained with existing datasets mixed with Nonspeech7k achieves the highest accuracy improvement of 15.7% compared to that without Nonspeech7k mixed. Nonspeech7k is 100% annotated, completely checked, and free of noise. It is available at https://doi.org/10.5281/zenodo.6967442. |
| format | Article |
| id | doaj-art-906206f71b99408cb214534ca09b6bd8 |
| institution | Kabale University |
| issn | 1751-9675 1751-9683 |
| language | English |
| publishDate | 2023-06-01 |
| publisher | Wiley |
| record_format | Article |
| series | IET Signal Processing |
| spelling | doaj-art-906206f71b99408cb214534ca09b6bd8; 2025-08-20T03:36:58Z; eng; Wiley; IET Signal Processing; 1751-9675; 1751-9683; 2023-06-01; 17; 6; n/a; n/a; 10.1049/sil2.12233; Nonspeech7k dataset: Classification and analysis of human non‐speech sound; Muhammad Mamunur Rashid (0); Guiqing Li (1); Chengrui Du (2); School of Computer Science and Engineering South China University of Technology Guangzhou China; School of Computer Science and Engineering South China University of Technology Guangzhou China; School of Foreign Languages South China University of Technology Guangzhou China; Abstract Human non‐speech sounds occur during expressions in a real‐life environment. Realising a person's incapability to prompt confident expressions by non‐speech sounds may assist in identifying premature disorder in medical applications. A novel dataset named Nonspeech7k is introduced that contains a diverse set of human non‐speech sounds, such as the sounds of breathing, coughing, crying, laughing, screaming, sneezing, and yawning. The authors then conduct a variety of classification experiments with end‐to‐end deep convolutional neural networks (CNN) to show the performance of the dataset. First, a set of typical deep classifiers are used to verify the reliability and validity of Nonspeech7k. Involved CNN models include 1D‐2D deep CNN EnvNet, deep stack CNN M11, deep stack CNN M18, intense residual block CNN ResNet34, modified M11 named M12, and the authors’ baseline model. Among these, M12 achieves the highest accuracy of 79%. Second, to verify the heterogeneity of Nonspeech7k with respect to two typical datasets, FSD50K and VocalSound, the authors design a series of experiments to analyse the classification performance of deep neural network classifier M12 by using FSD50K, FSD50K + Nonspeech7k, VocalSound, VocalSound + Nonspeech7k as training data, respectively. Experimental results show that the classifier trained with existing datasets mixed with Nonspeech7k achieves the highest accuracy improvement of 15.7% compared to that without Nonspeech7k mixed. Nonspeech7k is 100% annotated, completely checked, and free of noise. It is available at https://doi.org/10.5281/zenodo.6967442.; https://doi.org/10.1049/sil2.12233; audio dataset; audio expression audio signal processing; human non-speech sound; signal classification; speech intelligibility |
| spellingShingle | Muhammad Mamunur Rashid Guiqing Li Chengrui Du Nonspeech7k dataset: Classification and analysis of human non‐speech sound IET Signal Processing audio dataset audio expression audio signal processing human non-speech sound signal classification speech intelligibility |
| title | Nonspeech7k dataset: Classification and analysis of human non‐speech sound |
| title_full | Nonspeech7k dataset: Classification and analysis of human non‐speech sound |
| title_fullStr | Nonspeech7k dataset: Classification and analysis of human non‐speech sound |
| title_full_unstemmed | Nonspeech7k dataset: Classification and analysis of human non‐speech sound |
| title_short | Nonspeech7k dataset: Classification and analysis of human non‐speech sound |
| title_sort | nonspeech7k dataset classification and analysis of human non speech sound |
| topic | audio dataset audio expression audio signal processing human non-speech sound signal classification speech intelligibility |
| url | https://doi.org/10.1049/sil2.12233 |
| work_keys_str_mv | AT muhammadmamunurrashid nonspeech7kdatasetclassificationandanalysisofhumannonspeechsound AT guiqingli nonspeech7kdatasetclassificationandanalysisofhumannonspeechsound AT chengruidu nonspeech7kdatasetclassificationandanalysisofhumannonspeechsound |