Infant cry classification using an efficient graph structure and attention-based model

Crying serves as the primary means through which infants communicate, presenting a significant challenge for new parents in understanding its underlying causes. This study aims to classify infant cries to ascertain the reasons behind their distress. In this paper, an efficient graph structure based...

Bibliographic Details
Main Authors: Qiao X., Jiao S., Li H.
Format: Article
Language: English
Published: Elsevier 2024-07-01
Series: Kuwait Journal of Science
Subjects: audio classification; infant cry; multi-head attention; neural network
Online Access: https://www.sciencedirect.com/science/article/pii/S2307410824000464
author Qiao X.
Jiao S.
Li H.
collection DOAJ
description Crying is the primary means by which infants communicate, and understanding its underlying causes is a significant challenge for new parents. This study classifies infant cries to determine the reasons behind the infant's distress. An efficient graph structure based on multi-dimensional hybrid features is proposed. First, infant cries are processed to extract several speech features, including the spectrogram, the mel-scaled spectrogram, and MFCCs. These features are then combined across multiple dimensions to make better use of the information in the cries. To classify the efficient graph structure, a local-to-global convolutional neural network (AlgNet) that combines convolutional neural networks with attention mechanisms is also proposed. Experimental results show that the efficient graph structure improved accuracy by an average of 8.01% over standalone speech features, and that AlgNet improved accuracy by an average of 5.62% over traditional deep learning models. Experiments on the Dunstan baby language, Donate a cry, and baby cry datasets yielded accuracies of 87.78%, 93.83%, and 93.14%, respectively. © 2024 The Authors
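The feature-combination step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the features are combined, so stacking them as channels of one array, pooling the linear spectrogram to the mel band count, keeping as many MFCCs as mel bands, and every function name and parameter value below are assumptions made for illustration only.

```python
import numpy as np

def stft_power(signal, n_fft=512, hop=256):
    """Power spectrogram, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return (np.abs(np.fft.rfft(frames, axis=1)) ** 2).T

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fft_hz = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_mels, fft_hz.size))
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m:m + 3]
        rising = (fft_hz - lo) / (mid - lo)
        falling = (hi - fft_hz) / (hi - mid)
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def dct2(x, n_out):
    """Unnormalised DCT-II along axis 0, keeping the first n_out rows."""
    n = x.shape[0]
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n)[None, :] + 1) / (2 * n))
    return basis @ x

def hybrid_features(signal, sr=16000, n_fft=512, hop=256, n_bands=40):
    """Stack three views of one cry as channels: (3, n_bands, n_frames)."""
    power = stft_power(signal, n_fft, hop)
    log_spec = np.log(power + 1e-10)
    log_mel = np.log(mel_filterbank(sr, n_fft, n_bands) @ power + 1e-10)
    mfcc = dct2(log_mel, n_bands)
    # Assumption: average groups of FFT bins so the linear spectrogram
    # has the same number of rows as the other two features.
    spec_pooled = np.stack([group.mean(axis=0) for group in
                            np.array_split(log_spec, n_bands, axis=0)])
    stacked = np.stack([spec_pooled, log_mel, mfcc])
    # Standardise each channel so no single feature dominates by scale.
    mean = stacked.mean(axis=(1, 2), keepdims=True)
    std = stacked.std(axis=(1, 2), keepdims=True)
    return (stacked - mean) / (std + 1e-8)
```

Calling `hybrid_features` on a mono waveform returns a three-channel array that an image-style convolutional classifier could take as input. In practice an audio library would normally replace the hand-written STFT, mel filterbank, and DCT; they are spelled out here only to keep the sketch self-contained.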
format Article
id doaj-art-0a49fdea24b342febf6826bbb9308758
institution OA Journals
issn 2307-4108
2307-4116
language English
publishDate 2024-07-01
publisher Elsevier
record_format Article
series Kuwait Journal of Science
spelling doaj-art-0a49fdea24b342febf6826bbb9308758; 2025-08-20T02:10:06Z; eng; Elsevier; Kuwait Journal of Science; 2307-4108; 2307-4116; 2024-07-01; 51; 3; 100221; 10.1016/j.kjs.2024.100221; Infant cry classification using an efficient graph structure and attention-based model; Qiao X.; Jiao S.; Li H.; https://www.sciencedirect.com/science/article/pii/S2307410824000464; audio classification; infant cry; multi-head attention; neural network
title Infant cry classification using an efficient graph structure and attention-based model
topic audio classification
infant cry
multi-head attention
neural network
url https://www.sciencedirect.com/science/article/pii/S2307410824000464