P-GELU: A Novel Activation Function to Optimize Whisper for Darija Speech Translation

Activation functions play a critical role in optimizing deep learning models, directly influencing gradient flow, convergence stability, and overall translation accuracy. In this work, we investigate their impact within the Whisper-Turbo model, a speech-to-text Transformer trained from scratch on the Darija-C dataset for Moroccan Darija speech translation. Our study begins by evaluating the baseline activation functions GELU, Swish, and Mish, demonstrating that while GELU is widely used in Transformer-based architectures, it may not be optimal for dialectal speech translation, particularly in low-resource settings. To address this limitation, we introduce Parameterized GELU (P-GELU), a novel activation function that extends GELU by incorporating trainable parameters (α, β), allowing the model to dynamically adjust its non-linearity across layers and training phases. Through extensive experiments, we demonstrate that P-GELU outperforms standard GELU, with superior convergence speed and generalization. Furthermore, P-GELU reduces training loss, improves feature retention, and enhances linguistic adaptability, making it a more effective alternative for speech translation tasks involving phonetic variability, code-switching, and limited training data. The proposed P-GELU offers a promising balance between computational efficiency and performance gains, presenting a viable solution for enhancing Transformer-based speech models in low-resource language scenarios.

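The abstract describes the core idea of P-GELU (a GELU whose shape is controlled by trainable parameters α and β learned jointly with the network) but, being a catalog record, gives no formula. The snippet below is a minimal PyTorch sketch under that description only: the class name PGELU, the assumed form α·x·Φ(β·x), and the initialization values are illustrative assumptions, not taken from the paper.

import math

import torch
import torch.nn as nn


class PGELU(nn.Module):
    """Illustrative sketch of a parameterized GELU with trainable alpha and beta.

    The record only states that P-GELU adds trainable parameters (alpha, beta)
    to GELU; the exact formula is not given, so this assumes
    P-GELU(x) = alpha * x * Phi(beta * x), which reduces to standard GELU
    when alpha = beta = 1.
    """

    def __init__(self, alpha_init: float = 1.0, beta_init: float = 1.0):
        super().__init__()
        # Trainable scalars, optimized together with the rest of the network.
        self.alpha = nn.Parameter(torch.tensor(alpha_init))
        self.beta = nn.Parameter(torch.tensor(beta_init))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Phi(z) = 0.5 * (1 + erf(z / sqrt(2))) is the standard normal CDF used by exact GELU.
        phi = 0.5 * (1.0 + torch.erf(self.beta * x / math.sqrt(2.0)))
        return self.alpha * x * phi


if __name__ == "__main__":
    # With the default parameters the output matches torch.nn.functional.gelu(x).
    act = PGELU()
    x = torch.randn(2, 4)
    print(act(x).shape)  # torch.Size([2, 4])

Such a module would drop in wherever nn.GELU() is normally used in a Transformer feed-forward block; whether the paper learns one (α, β) pair per layer or shares them across the model is not stated in this record.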

Bibliographic Details
Main Authors: Maria Labied, Abdessamad Belangour, Mouad Banane
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Subjects: Speech translation; Moroccan Darija; Whisper-Turbo; activation functions; parameterized GELU (P-GELU); low-resource speech processing
Online Access: https://ieeexplore.ieee.org/document/11016691/
collection DOAJ
id doaj-art-38ce2de02e29477aaa898b7477fb431d
institution Kabale University
issn 2169-3536
doi 10.1109/ACCESS.2025.3574398
citation IEEE Access, vol. 13, pp. 100198-100218, 2025, article 11016691
affiliations Maria Labied (ORCID: https://orcid.org/0000-0002-1795-4965), Laboratory of Information Technology and Modeling (LTIM), Faculty of Sciences Ben M'sik, Hassan II University, Casablanca, Morocco
Abdessamad Belangour, Laboratory of Information Technology and Modeling (LTIM), Faculty of Sciences Ben M'sik, Hassan II University, Casablanca, Morocco
Mouad Banane, Laboratory of Artificial Intelligence and Complex Systems Engineering, Faculty of Legal, Economic, and Social Sciences, Hassan II University, Casablanca, Morocco