Infant cry classification using an efficient graph structure and attention-based model
Crying serves as the primary means through which infants communicate, presenting a significant challenge for new parents in understanding its underlying causes. This study aims to classify infant cries to ascertain the reasons behind their distress. In this paper, an efficient graph structure based...
| Main Authors: | Qiao X., Jiao S., Li H. |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2024-07-01 |
| Series: | Kuwait Journal of Science |
| Subjects: | audio classification; infant cry; multi-head attention; neural network |
| Online Access: | https://www.sciencedirect.com/science/article/pii/S2307410824000464 |

| Field | Value |
|---|---|
| _version_ | 1850209076311490560 |
| author | Qiao X.; Jiao S.; Li H. |
| collection | DOAJ |
| description | Crying serves as the primary means through which infants communicate, presenting a significant challenge for new parents in understanding its underlying causes. This study aims to classify infant cries to ascertain the reasons behind their distress. In this paper, an efficient graph structure based on multi-dimensional hybrid features is proposed. Firstly, infant cries are processed to extract various speech features, such as spectrogram, mel-scaled spectrogram, MFCC, and others. These speech features are then combined across multiple dimensions to better utilize the information in the cries. Additionally, in order to better classify the efficient graph structure, a local-to-global convolutional neural network (AlgNet) based on convolutional neural networks and attention mechanisms is proposed. The experimental results demonstrate that the use of the efficient graph structure improved the accuracy by an average of 8.01% compared to using standalone speech features, and the AlgNet model achieved an average accuracy improvement of 5.62% compared to traditional deep learning models. Experiments were conducted using the Dunstan baby language, Donate a cry, and baby cry datasets with accuracy rates of 87.78%, 93.83%, and 93.14% respectively. © 2024 The Authors |
| format | Article |
| id | doaj-art-0a49fdea24b342febf6826bbb9308758 |
| institution | OA Journals |
| issn | 2307-4108; 2307-4116 |
| language | English |
| publishDate | 2024-07-01 |
| publisher | Elsevier |
| record_format | Article |
| series | Kuwait Journal of Science |
| spelling | doaj-art-0a49fdea24b342febf6826bbb9308758; 2025-08-20T02:10:06Z; eng; Elsevier; Kuwait Journal of Science; 2307-4108; 2307-4116; 2024-07-01; 51; 3; 100221; 10.1016/j.kjs.2024.100221; Infant cry classification using an efficient graph structure and attention-based model; Qiao X.; Jiao S.; Li H.; https://www.sciencedirect.com/science/article/pii/S2307410824000464; audio classification; infant cry; multi-head attention; neural network |
| title | Infant cry classification using an efficient graph structure and attention-based model |
| topic | audio classification; infant cry; multi-head attention; neural network |
| url | https://www.sciencedirect.com/science/article/pii/S2307410824000464 |
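
The description in this record mentions combining several speech features "across multiple dimensions" before classification. The record does not say how the authors do this, so the following is only a minimal sketch of one plausible reading: several time-frequency views of the same signal are stacked as image channels. The function names, frame sizes, and the three channels used here (magnitude, log-magnitude, and a frame-to-frame delta) are illustrative assumptions, not the paper's features (spectrogram, mel-scaled spectrogram, MFCC) or its method.

```python
import numpy as np

def stft_magnitude(signal, frame_len=256, hop=128):
    """Magnitude spectrogram with a Hann window; shape (freq_bins, frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def normalize(x):
    """Scale a feature map to [0, 1] so the stacked channels are comparable."""
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

def hybrid_feature_image(signal):
    """Stack several views of one signal as channels: (channels, freq, time).

    Assumption: the channels below are simple stand-ins for the features
    named in the abstract, chosen so the sketch needs only NumPy.
    """
    mag = stft_magnitude(signal)
    log_mag = np.log1p(mag)
    # Frame-to-frame change of the log spectrum; first frame gets zeros.
    delta = np.diff(log_mag, axis=1, prepend=log_mag[:, :1])
    return np.stack([normalize(mag), normalize(log_mag), normalize(delta)])

# Synthetic one-second test tone with slow amplitude modulation (not real cry data).
sr = 8000
t = np.arange(sr) / sr
cry_like = np.sin(2 * np.pi * 450 * t) * (1 + 0.5 * np.sin(2 * np.pi * 5 * t))
features = hybrid_feature_image(cry_like)
print(features.shape)
```

An array shaped like this could be passed to a convolutional network as a multi-channel image; how the paper's AlgNet model actually consumes its input is not described in this record.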