The Speaker Identification Model for Air-Ground Communication Based on a Parallel Branch Architecture

This study addresses the challenges of complex noise and short speech in civil aviation air-ground communication scenarios and proposes a novel speaker identification model, Chrono-ECAPA-TDNN (CET). The aim of the study is to enhance the accuracy and robustness of speaker identification in these env...

Bibliographic Details
Main Authors: Weijun Pan, Shenhao Chen, Yidi Wang, Sheng Chen, Xuan Wang
Format: Article
Language: English
Published: MDPI AG 2025-03-01
Series: Applied Sciences
Subjects: speaker identification; parallel branch architecture; air-ground communication
Online Access: https://www.mdpi.com/2076-3417/15/6/2994
_version_ 1850089591512498176
author Weijun Pan
Shenhao Chen
Yidi Wang
Sheng Chen
Xuan Wang
collection DOAJ
description This study addresses the challenges of complex noise and short speech in civil aviation air-ground communication and proposes a speaker identification model, Chrono-ECAPA-TDNN (CET), that aims to improve the accuracy and robustness of speaker identification in these environments. The CET model comprises three components: the Chrono Block module, the speaker embedding extraction module, and the optimized loss function module. The Chrono Block module combines a parallel branch architecture, Bi-LSTM, and multi-head attention to extract both global and local features, which addresses the challenge of short speech. The speaker embedding extraction module aggregates features from the Chrono Block and applies self-attention statistical pooling to generate robust speaker embeddings. The loss function module uses the Sub-center AAM-Softmax loss, which improves feature compactness and class separation. To further improve robustness, data augmentation techniques such as speed perturbation, spectral masking, and random noise suppression are applied. Pretrained on the VoxCeleb2 dataset and tested on the air-ground communication dataset, the CET model achieves an EER of 9.81% and an accuracy of 88.62%, outperforming the baseline ECAPA-TDNN model by 1.53% in EER and 2.19% in accuracy. The model also performs strongly on four cross-domain datasets, highlighting its broad potential for real-time applications.
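The abstract names the Sub-center AAM-Softmax loss without giving its form. The following is a minimal, dependency-free sketch of how such a loss is commonly computed for a single embedding; it is not the authors' implementation. The function name, the `margin` and `scale` defaults, and the toy vectors are all assumptions chosen for illustration, and the sketch omits the handling that practical implementations add for angles where theta + margin exceeds pi.

```python
import math


def _normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(_normalize(a), _normalize(b)))


def subcenter_aam_softmax_loss(embedding, sub_centers, target,
                               margin=0.2, scale=30.0):
    """Illustrative Sub-center AAM-Softmax loss for one embedding.

    sub_centers[c][k] is the k-th sub-center vector of class c.
    margin and scale are hypothetical defaults, not values from the paper.
    """
    # Sub-center step: each class is scored by its best-matching sub-center.
    cosines = [max(_cosine(embedding, w) for w in centers)
               for centers in sub_centers]

    # Additive angular margin: penalize the target class by widening its angle.
    theta = math.acos(max(-1.0, min(1.0, cosines[target])))
    logits = [scale * c for c in cosines]
    logits[target] = scale * math.cos(theta + margin)

    # Numerically stable softmax cross-entropy on the scaled logits.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


if __name__ == "__main__":
    # Two classes, two sub-centers each (toy 2-D vectors).
    centers = [
        [[1.0, 0.0], [0.0, 1.0]],    # class 0
        [[-1.0, 0.0], [0.0, -1.0]],  # class 1
    ]
    emb = [1.0, 0.0]
    print("loss, correct label:", subcenter_aam_softmax_loss(emb, centers, 0))
    print("loss, wrong label:  ", subcenter_aam_softmax_loss(emb, centers, 1))
```

Taking the maximum over sub-centers lets noisy or atypical samples of a speaker attach to a secondary center instead of distorting the main one, while the margin forces the target-class angle to be smaller than a plain softmax would require.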
format Article
id doaj-art-03d91ea8ff7543a2ae08f9845cece743
institution DOAJ
issn 2076-3417
language English
publishDate 2025-03-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj-art-03d91ea8ff7543a2ae08f9845cece743 | 2025-08-20T02:42:45Z | eng | MDPI AG | Applied Sciences | 2076-3417 | 2025-03-01 | 15 | 6 | 2994 | 10.3390/app15062994 | The Speaker Identification Model for Air-Ground Communication Based on a Parallel Branch Architecture | Weijun Pan (0), Shenhao Chen (1), Yidi Wang (2), Sheng Chen (3), Xuan Wang (4); all authors: College of Air Traffic Management, Civil Aviation Flight University of China, Guanghan 618307, China | https://www.mdpi.com/2076-3417/15/6/2994 | speaker identification; parallel branch architecture; air-ground communication
title The Speaker Identification Model for Air-Ground Communication Based on a Parallel Branch Architecture
topic speaker identification
parallel branch architecture
air-ground communication
url https://www.mdpi.com/2076-3417/15/6/2994