The Speaker Identification Model for Air-Ground Communication Based on a Parallel Branch Architecture

This study addresses the challenges of complex noise and short speech in civil aviation air-ground communication scenarios and proposes a novel speaker identification model, Chrono-ECAPA-TDNN (CET). The aim of the study is to enhance the accuracy and robustness of speaker identification in these env...

Bibliographic Details
Main Authors:	Weijun Pan, Shenhao Chen, Yidi Wang, Sheng Chen, Xuan Wang
Format:	Article
Language:	English
Published:	MDPI AG 2025-03-01
Series:	Applied Sciences
Subjects:	speaker identification; parallel branch architecture; air-ground communication
Online Access:	https://www.mdpi.com/2076-3417/15/6/2994

_version_	1850089591512498176
author	Weijun Pan, Shenhao Chen, Yidi Wang, Sheng Chen, Xuan Wang
author_sort	Weijun Pan
collection	DOAJ
description	This study addresses the challenges of complex noise and short speech in civil aviation air-ground communication scenarios and proposes a novel speaker identification model, Chrono-ECAPA-TDNN (CET). The aim of the study is to enhance the accuracy and robustness of speaker identification in these environments. The CET model has three key components: the Chrono Block module, the speaker embedding extraction module, and the optimized loss function module. The Chrono Block module combines a parallel branch architecture, Bi-LSTM, and multi-head attention to extract both global and local features, which addresses the challenge of short speech. The speaker embedding extraction module aggregates features from the Chrono Block and applies self-attention statistical pooling to generate robust speaker embeddings. The loss function module uses the Sub-center AAM-Softmax loss, which improves feature compactness and class separation. To further improve robustness, data augmentation techniques such as speed perturbation, spectral masking, and random noise suppression are applied. Pretrained on the VoxCeleb2 dataset and tested on the air-ground communication dataset, the CET model achieves 9.81% EER and 88.62% accuracy, outperforming the baseline ECAPA-TDNN model by 1.53% in EER and 2.19% in accuracy. The model also performs well on four cross-domain datasets, which suggests broad potential for real-time applications.
format	Article
id	doaj-art-03d91ea8ff7543a2ae08f9845cece743
institution	DOAJ
issn	2076-3417
language	English
publishDate	2025-03-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj-art-03d91ea8ff7543a2ae08f9845cece743; 2025-08-20T02:42:45Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2025-03-01; vol. 15, no. 6, art. 2994; 10.3390/app15062994; The Speaker Identification Model for Air-Ground Communication Based on a Parallel Branch Architecture; Weijun Pan, Shenhao Chen, Yidi Wang, Sheng Chen, Xuan Wang (all: College of Air Traffic Management, Civil Aviation Flight University of China, Guanghan 618307, China); https://www.mdpi.com/2076-3417/15/6/2994; speaker identification; parallel branch architecture; air-ground communication
title	The Speaker Identification Model for Air-Ground Communication Based on a Parallel Branch Architecture
topic	speaker identification; parallel branch architecture; air-ground communication
url	https://www.mdpi.com/2076-3417/15/6/2994
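
The description in this record reports results as an equal error rate (EER). As a minimal sketch of how an EER is commonly computed from trial scores, the following pure-Python function sweeps a decision threshold and returns the point where the false acceptance rate and false rejection rate are closest. The function name, the scoring convention (higher score means same speaker), and the sample scores are illustrative assumptions and are not taken from the article.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping thresholds over the observed scores.

    Hypothetical helper for illustration only; assumes a trial is accepted
    when its score is >= the threshold.
    Returns (eer, threshold) at the smallest |FAR - FRR| gap.
    """
    if not genuine_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")

    best = None  # (gap, eer estimate, threshold)
    for t in sorted(set(genuine_scores) | set(impostor_scores)):
        # False acceptance rate: impostor trials wrongly accepted.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        # False rejection rate: genuine trials wrongly rejected.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Made-up scores, not data from the article.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With only a handful of trials the sweep is coarse; evaluation toolkits typically interpolate between neighbouring thresholds on the ROC curve instead of picking the nearest observed score.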

Similar Items