Spectrogram Features-Based Automatic Speaker Identification For Smart Services

Automatic speaker identification (ASI) is an exciting area of research with numerous applications such as surveillance, voice authentication, identity verification, and electronic voice eavesdropping. This study investigates ASI based on features derived from spectrogram images through a convolution...

Bibliographic Details
Main Authors: Rashid Jahangir, Mohammed Alreshoodi, Fawaz Khaled Alarfaj
Format: Article
Language: English
Published: Taylor & Francis Group 2025-12-01
Series: Applied Artificial Intelligence
Online Access: https://www.tandfonline.com/doi/10.1080/08839514.2025.2459476
author Rashid Jahangir
Mohammed Alreshoodi
Fawaz Khaled Alarfaj
collection DOAJ
description Automatic speaker identification (ASI) is an exciting area of research with numerous applications, such as surveillance, voice authentication, identity verification, and electronic voice eavesdropping. This study investigates ASI based on features derived from spectrogram images through a convolutional neural network (CNN) with rectangular-shaped kernels. Traditionally, CNNs employ square-shaped kernels and max-pooling operations at different layers, a design optimized to handle 2D data. Nevertheless, information is encoded slightly differently in spectrograms: frequency is displayed along the y-axis, time along the x-axis, and amplitude is denoted by the intensity of the spectrogram image at a given point. The main contributions of this study are: (1) To analyze audio signals effectively using spectrograms, this study proposes spectrogram features with rectangular kernels of different sizes and shapes to derive distinctive features and thereby improve the recognition accuracy of the speaker identification system. (2) The extracted spectrogram-based features and models are evaluated on the ELSDSR, TSP, and LibriSpeech datasets and achieve weighted accuracies of 96.0%, 99.2%, and 97.6%, respectively. (3) The proposed rectangular-shaped CNN approach effectively derives suitable features from spectrogram images and outperforms several baseline techniques on the ELSDSR, TSP, and LibriSpeech datasets.
format Article
id doaj-art-2d2b9a0530174bf5ad4a37bf56a30c0e
institution OA Journals
issn 0883-9514
1087-6545
language English
publishDate 2025-12-01
publisher Taylor & Francis Group
record_format Article
series Applied Artificial Intelligence
spelling doaj-art-2d2b9a0530174bf5ad4a37bf56a30c0e; 2025-08-20T02:29:58Z; eng; Taylor & Francis Group; Applied Artificial Intelligence; ISSN 0883-9514, 1087-6545; 2025-12-01; volume 39, issue 1; DOI 10.1080/08839514.2025.2459476
Rashid Jahangir: Department of Computer Science, COMSATS University Islamabad, Vehari, Pakistan
Mohammed Alreshoodi: Unit of Scientific Research, Applied College, Qassim University, Qassim, Saudi Arabia
Fawaz Khaled Alarfaj: Department of Management Information Systems (MIS), School of Business, King Faisal University (KFU), Al-Ahsa, Saudi Arabia
title Spectrogram Features-Based Automatic Speaker Identification For Smart Services
url https://www.tandfonline.com/doi/10.1080/08839514.2025.2459476
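
The description above contrasts square CNN kernels with rectangular ones applied to spectrograms, where the y-axis is frequency and the x-axis is time. The following is a minimal NumPy sketch of that idea only, not the authors' model: the function names, the frame length and hop size, the 9-element averaging kernels, and the synthetic 440 Hz test tone are all assumptions chosen for illustration.

```python
import numpy as np


def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: rows are frequency bins, columns are time frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # FFT of each windowed frame, transposed so frequency runs along the y-axis.
    return np.abs(np.fft.rfft(frames, axis=1)).T


def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation, the operation a CNN layer applies."""
    windows = np.lib.stride_tricks.sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)


# Synthetic one-second 440 Hz tone at 8 kHz (a stand-in for real speech audio).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
spec = spectrogram(np.sin(2 * np.pi * 440.0 * t))

# Two rectangular kernels: one spans 9 frequency bins at a single time step,
# the other spans 9 time frames at a single frequency bin.
tall_kernel = np.full((9, 1), 1.0 / 9.0)
wide_kernel = np.full((1, 9), 1.0 / 9.0)

freq_features = conv2d_valid(spec, tall_kernel)
time_features = conv2d_valid(spec, wide_kernel)

print(spec.shape, freq_features.shape, time_features.shape)
```

Because the two axes of a spectrogram mean different things, a tall kernel summarizes a frequency band at one instant while a wide kernel summarizes how one frequency evolves over time; a square kernel would mix both in equal proportion.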