An Ensemble of Convolutional Neural Networks for Sound Event Detection

Bibliographic Details
Main Authors: Abdinabi Mukhamadiyev, Ilyos Khujayarov, Dilorom Nabieva, Jinsoo Cho
Format: Article
Language: English
Published: MDPI AG 2025-05-01
Series: Mathematics
Volume/Issue: Vol. 13, No. 9, Article 1502
ISSN: 2227-7390
DOI: 10.3390/math13091502
Subjects: smart city; sound event detection; audio signal; data augmentation; ensemble of classifiers; pattern recognition
Online Access: https://www.mdpi.com/2227-7390/13/9/1502
Collection: DOAJ

Affiliations
Abdinabi Mukhamadiyev, Jinsoo Cho: Department of Computer Engineering, Gachon University, Sujeong-gu, Seongnam-si 13120, Republic of Korea
Ilyos Khujayarov, Dilorom Nabieva: Department of Information Technologies, Samarkand Branch of Tashkent University of Information Technologies Named After Muhammad al-Khwarizmi, Tashkent 100084, Uzbekistan

Description
Sound event detection (SED) is advancing rapidly within pattern recognition, and deep learning methods are particularly well suited to it. One important application is detecting the sounds of emotionally charged events around residential buildings in smart cities so that the situation can be assessed quickly for security purposes. This research presents a comprehensive study of an ensemble convolutional recurrent neural network (CRNN) model designed for SED in residential and public safety contexts. The work focuses on extracting meaningful features from audio signals using image-based representations such as Discrete Cosine Transform (DCT) spectrograms, cochleagrams, and Mel spectrograms, to improve feature extraction and robustness against noise. In collaboration with police officers, a two-hour dataset of 112 clips covering four classes of emotional sounds (harassment, quarrels, screams, and breaking sounds) was prepared. In addition to this crowdsourced dataset, publicly available datasets were used to broaden the study's applicability. The combined, strongly labeled dataset contains 5055 audio files of varying lengths, totaling 14.14 h, across 13 sound categories. The proposed CRNN model integrates spatial and temporal feature extraction by processing these spectrograms through convolutional and bidirectional gated recurrent unit (GRU) layers. An ensemble approach combines the predictions of three models, achieving an F1 score of 71.5% on segment-based metrics and 46% on event-based metrics. The results demonstrate the model's effectiveness in detecting sound events under noisy conditions, even with a small, imbalanced dataset. This research highlights the model's potential for real-time audio surveillance on mini-computers, offering cost-effective and accurate solutions for maintaining public order.
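The record gives no implementation details for how the three models' predictions are combined or scored. As a minimal sketch, assuming each CRNN emits frame-level class probabilities shaped (n_frames, n_classes), the code below averages them (mean averaging is an assumption, not confirmed by the record), binarizes the result with a hypothetical threshold of 0.5, and computes a micro-averaged segment-based F1 score of the kind reported in the description. The function names, threshold, segment length, and toy arrays are all hypothetical and illustrative only.

```python
import numpy as np


def ensemble_average(prob_list):
    """Mean of frame-level class probabilities from several models.

    Assumption for this sketch: every array is shaped (n_frames, n_classes)
    and all models share the same frame rate and class order.
    """
    stacked = np.stack(prob_list, axis=0)  # (n_models, n_frames, n_classes)
    return stacked.mean(axis=0)


def frames_to_segments(active, frames_per_segment):
    """A class counts as active in a segment if any of its frames is active."""
    n_frames, n_classes = active.shape
    n_segments = int(np.ceil(n_frames / frames_per_segment))
    segments = np.zeros((n_segments, n_classes), dtype=bool)
    for s in range(n_segments):
        start = s * frames_per_segment
        segments[s] = active[start:start + frames_per_segment].any(axis=0)
    return segments


def segment_f1(pred_segments, ref_segments):
    """Micro-averaged segment-based F1, i.e. 2TP / (2TP + FP + FN)."""
    tp = np.logical_and(pred_segments, ref_segments).sum()
    fp = np.logical_and(pred_segments, ~ref_segments).sum()
    fn = np.logical_and(~pred_segments, ref_segments).sum()
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0


# Toy example: 3 hypothetical models, 4 frames, 2 classes.
p1 = np.array([[0.9, 0.1], [0.2, 0.7], [0.6, 0.3], [0.1, 0.2]])
p2 = np.array([[0.7, 0.3], [0.4, 0.9], [0.2, 0.5], [0.3, 0.1]])
p3 = np.array([[0.8, 0.2], [0.3, 0.5], [0.4, 0.4], [0.2, 0.6]])

avg = ensemble_average([p1, p2, p3])
THRESHOLD = 0.5  # hypothetical decision threshold
pred_segments = frames_to_segments(avg > THRESHOLD, frames_per_segment=2)
ref_segments = np.array([[True, False], [False, True]])  # made-up reference labels
f1 = segment_f1(pred_segments, ref_segments)
print(f1)  # 0.5
```

Event-based F1 (the 46% figure) is computed differently: predicted events are matched to reference events by onset and offset within a time tolerance, which this sketch does not cover.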