The Speaker Identification Model for Air-Ground Communication Based on a Parallel Branch Architecture
This study addresses the challenges of complex noise and short speech in civil aviation air-ground communication scenarios and proposes a novel speaker identification model, Chrono-ECAPA-TDNN (CET). The aim of the study is to enhance the accuracy and robustness of speaker identification in these env...
| Main Authors: | Weijun Pan, Shenhao Chen, Yidi Wang, Sheng Chen, Xuan Wang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-03-01 |
| Series: | Applied Sciences |
| Subjects: | speaker identification; parallel branch architecture; air-ground communication |
| Online Access: | https://www.mdpi.com/2076-3417/15/6/2994 |

| Field | Value |
|---|---|
| _version_ | 1850089591512498176 |
| author | Weijun Pan; Shenhao Chen; Yidi Wang; Sheng Chen; Xuan Wang |
| author_facet | Weijun Pan; Shenhao Chen; Yidi Wang; Sheng Chen; Xuan Wang |
| author_sort | Weijun Pan |
| collection | DOAJ |
| description | This study addresses the challenges of complex noise and short utterances in civil aviation air-ground communication and proposes a novel speaker identification model, Chrono-ECAPA-TDNN (CET), with the aim of improving the accuracy and robustness of speaker identification in these environments. The CET model has three key components: the Chrono Block module, the speaker embedding extraction module, and the optimized loss function module. The Chrono Block module uses a parallel branch architecture, Bi-LSTM, and multi-head attention to extract both global and local features, addressing the challenge of short speech. The speaker embedding extraction module aggregates features from the Chrono Block and applies self-attention statistical pooling to generate robust speaker embeddings. The loss function module uses the Sub-center AAM-Softmax loss, which improves feature compactness and class separation. To further improve robustness, data augmentation techniques such as speed perturbation, spectral masking, and random noise suppression are applied. Pretrained on the VoxCeleb2 dataset and tested on the air-ground communication dataset, the CET model achieves 9.81% EER and 88.62% accuracy, outperforming the baseline ECAPA-TDNN model by 1.53% in EER and 2.19% in accuracy. The model also demonstrates strong performance on four cross-domain datasets, highlighting its broad potential for real-time applications. |
| format | Article |
| id | doaj-art-03d91ea8ff7543a2ae08f9845cece743 |
| institution | DOAJ |
| issn | 2076-3417 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Applied Sciences |
| spelling | doaj-art-03d91ea8ff7543a2ae08f9845cece743; 2025-08-20T02:42:45Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2025-03-01; vol. 15, no. 6, art. 2994; DOI 10.3390/app15062994; The Speaker Identification Model for Air-Ground Communication Based on a Parallel Branch Architecture; Weijun Pan, Shenhao Chen, Yidi Wang, Sheng Chen, Xuan Wang (all: College of Air Traffic Management, Civil Aviation Flight University of China, Guanghan 618307, China); https://www.mdpi.com/2076-3417/15/6/2994; speaker identification; parallel branch architecture; air-ground communication |
| title | The Speaker Identification Model for Air-Ground Communication Based on a Parallel Branch Architecture |
| title_sort | speaker identification model for air ground communication based on a parallel branch architecture |
| topic | speaker identification; parallel branch architecture; air-ground communication |
| url | https://www.mdpi.com/2076-3417/15/6/2994 |
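
The description above reports results as equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. As a reminder of what that metric measures, the following is a minimal sketch and not the authors' evaluation code: the function name `equal_error_rate`, the threshold sweep, and the toy scores are assumptions made for illustration only.

```python
import numpy as np


def equal_error_rate(scores, labels):
    """Approximate the EER for a set of verification trials.

    scores: similarity scores, where higher means "same speaker".
    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    Returns (eer, threshold) at the threshold where FAR and FRR are closest.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    target = scores[labels == 1]
    nontarget = scores[labels == 0]

    # Sweep every distinct score as a candidate threshold (ascending order).
    thresholds = np.unique(scores)
    # A trial is accepted when its score is >= the threshold.
    far = np.array([(nontarget >= t).mean() for t in thresholds])  # false accepts
    frr = np.array([(target < t).mean() for t in thresholds])      # false rejects

    # Pick the threshold where the two error rates are closest and average them.
    i = int(np.argmin(np.abs(far - frr)))
    return (far[i] + frr[i]) / 2.0, thresholds[i]


if __name__ == "__main__":
    # Hypothetical toy trials, not data from the paper.
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    toy_labels = [1, 1, 0, 1, 1, 0, 0, 0]
    eer, thr = equal_error_rate(toy_scores, toy_labels)
    print(f"EER = {eer:.2%} at threshold {thr}")
```

With real trial lists, an interpolated ROC-based estimate is usually preferred over this discrete sweep; the sketch only shows how false acceptances and false rejections are traded off against each other.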