P-GELU: A Novel Activation Function to Optimize Whisper for Darija Speech Translation
Activation functions play a critical role in optimizing deep learning models, directly influencing gradient flow, convergence stability, and overall translation accuracy. In this work, we investigate their impact within the Whisper-Turbo model, a speech-to-text Transformer trained from scratch on the Darija-C dataset for Moroccan Darija speech translation.
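The full abstract in the description field below introduces Parameterized GELU (P-GELU), which extends GELU with trainable parameters $\alpha$ and $\beta$ so the non-linearity can adapt across layers and training phases. This record does not give the exact formulation, so the sketch below is only a plausible PyTorch illustration of the idea: the functional form, the module name `PGELU`, and the per-layer scalar parameters are assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn


class PGELU(nn.Module):
    """Hypothetical parameterized GELU with trainable alpha (scale) and beta (shift).

    This is a sketch of the idea described in the abstract, not the paper's
    actual P-GELU definition.
    """

    def __init__(self, alpha_init: float = 1.0, beta_init: float = 0.0):
        super().__init__()
        # Trainable scalars; with alpha=1 and beta=0 this reduces to the
        # standard tanh-approximated GELU.
        self.alpha = nn.Parameter(torch.tensor(alpha_init))
        self.beta = nn.Parameter(torch.tensor(beta_init))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Re-parameterize the input to the GELU gate, then gate the original x.
        z = self.alpha * x + self.beta
        return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (z + 0.044715 * z**3)))


# Usage: swap the activation inside a Transformer feed-forward block.
ffn = nn.Sequential(nn.Linear(512, 2048), PGELU(), nn.Linear(2048, 512))
out = ffn(torch.randn(4, 100, 512))  # (batch, frames, d_model)
```

Because the sketch reduces to standard GELU at initialization, it could serve as a drop-in replacement for `nn.GELU` in a Whisper-style feed-forward block while letting each layer learn its own $\alpha$ and $\beta$.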
| Main Authors: | Maria Labied, Abdessamad Belangour, Mouad Banane |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Speech translation; Moroccan Darija; Whisper-Turbo; activation functions; parameterized GELU (P-GELU); low-resource speech processing |
| Online Access: | https://ieeexplore.ieee.org/document/11016691/ |
| author | Maria Labied, Abdessamad Belangour, Mouad Banane |
|---|---|
| collection | DOAJ |
| description | Activation functions play a critical role in optimizing deep learning models, directly influencing gradient flow, convergence stability, and overall translation accuracy. In this work, we investigate their impact within the Whisper-Turbo model, a speech-to-text Transformer trained from scratch on the Darija-C dataset for Moroccan Darija speech translation. Our study begins by evaluating baseline activation functions—GELU, Swish, and Mish—demonstrating that while GELU is widely used in Transformer-based architectures, it may not be optimal for dialectal speech translation, particularly in low-resource settings. To address this limitation, we introduce Parameterized GELU (P-GELU), a novel activation function that extends GELU by incorporating trainable parameters ($\alpha$, $\beta$), allowing the model to dynamically adjust its non-linearity across layers and training phases. Through extensive experiments, we demonstrate that P-GELU outperforms standard GELU, with superior convergence speed and generalization. Furthermore, P-GELU reduces training loss, improves feature retention, and enhances linguistic adaptability, making it a more effective alternative for speech translation tasks involving phonetic variability, code-switching, and limited training data. The proposed P-GELU offers a promising balance between computational efficiency and performance gains, presenting a viable solution for enhancing Transformer-based speech models in low-resource language scenarios. |
| format | Article |
| id | doaj-art-38ce2de02e29477aaa898b7477fb431d |
| institution | Kabale University |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| doi | 10.1109/ACCESS.2025.3574398 |
| citation | IEEE Access, vol. 13, pp. 100198-100218, 2025 |
| author_affiliations | Maria Labied (ORCID: https://orcid.org/0000-0002-1795-4965) and Abdessamad Belangour: Laboratory of Information Technology and Modeling (LTIM), Faculty of Sciences Ben M’sik, Hassan II University, Casablanca, Morocco; Mouad Banane: Laboratory of Artificial Intelligence and Complex Systems Engineering, Faculty of Legal, Economic, and Social Sciences, Hassan II University, Casablanca, Morocco |
| title | P-GELU: A Novel Activation Function to Optimize Whisper for Darija Speech Translation |
| topic | Speech translation; Moroccan Darija; Whisper-Turbo; activation functions; parameterized GELU (P-GELU); low-resource speech processing |
| url | https://ieeexplore.ieee.org/document/11016691/ |