End-to-end neuromorphic speech enhancement with PDM microphones

Enhancing speech in noisy environments is essential for applications like automatic speech recognition, hearing aids, and real-time voice interfaces, but remains challenging on low-power, always-on edge devices. Conventional systems rely on pulse code modulation (PCM) signals and artificial neural n...

Bibliographic Details
Main Authors: Sidi Yaya Arnaud Yarga, Sean U N Wood
Format: Article
Language: English
Published: IOP Publishing 2025-01-01
Series: Neuromorphic Computing and Engineering
Subjects: speech denoising; spiking neural networks; pulse density modulation
Online Access: https://doi.org/10.1088/2634-4386/adf2d4
_version_ 1849728604279144448
author Sidi Yaya Arnaud Yarga
Sean U N Wood
collection DOAJ
description Enhancing speech in noisy environments is essential for applications like automatic speech recognition, hearing aids, and real-time voice interfaces, but remains challenging on low-power, always-on edge devices. Conventional systems rely on pulse code modulation (PCM) signals and artificial neural networks, both of which introduce significant preprocessing and computational overhead. In this work, we present PDMDNS, a novel end-to-end neuromorphic framework for real-time speech denoising that directly processes binary pulse density modulation (PDM) microphone output using a spiking neural network, entirely bypassing the conventional PDM-to-PCM conversion and preprocessing stages. PDMDNS simultaneously performs speech enhancement and signal format conversion, leveraging stateless spiking neurons to reduce computational cost while maintaining temporal modeling capabilities. When evaluated on a dataset containing noisy signals with SNRs ranging from 20 dB to −5 dB, our system achieves an average improvement of 7 dB in scale-invariant signal-to-noise ratio (SI-SNR) and a 3% gain in short-time objective intelligibility (STOI). Although this performance is less than 1 dB below the current state of the art, PDMDNS requires only 33 M-Ops/s, nearly 3× fewer operations than the best-performing spiking models. PDM signals impose a trade-off between maximizing precision through high sampling rates and minimizing energy consumption with lower rates; PDMDNS nevertheless generalizes robustly across varying input sampling rates (−12.5% to +37.5%) without retraining. This flexibility makes it a compelling solution for energy-efficient, low-latency speech processing in embedded and neuromorphic systems.
format Article
id doaj-art-0c36daffda734c4789de9d986cfcfb07
institution DOAJ
issn 2634-4386
language English
publishDate 2025-01-01
publisher IOP Publishing
record_format Article
series Neuromorphic Computing and Engineering
spelling doaj-art-0c36daffda734c4789de9d986cfcfb07 2025-08-20T03:09:31Z eng
IOP Publishing
Neuromorphic Computing and Engineering
2634-4386
2025-01-01
5 3 034009
10.1088/2634-4386/adf2d4
End-to-end neuromorphic speech enhancement with PDM microphones
Sidi Yaya Arnaud Yarga (https://orcid.org/0000-0003-4727-6437), Department of Electrical and Computer Engineering, Université de Sherbrooke, Sherbrooke, QC, Canada
Sean U N Wood (https://orcid.org/0000-0002-6821-1619), Department of Electrical and Computer Engineering, Université de Sherbrooke, Sherbrooke, QC, Canada
https://doi.org/10.1088/2634-4386/adf2d4
speech denoising
spiking neural networks
pulse density modulation
title End-to-end neuromorphic speech enhancement with PDM microphones
topic speech denoising
spiking neural networks
pulse density modulation
url https://doi.org/10.1088/2634-4386/adf2d4
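
Illustrative note on the description field. The abstract contrasts PDMDNS with the conventional PDM-to-PCM conversion stage and reports results in SI-SNR. The Python sketch below is not the authors' method or code. It only illustrates, under stated assumptions, what a 1-bit PDM stream is, what a naive conversion stage does, and how SI-SNR is commonly computed. The first-order error-feedback modulator, the boxcar decimator, the 1.024 MHz PDM rate, the decimation factor of 64, the 440 Hz test tone, and all function names are assumptions chosen for the example; none of them come from the record.

```python
import math


def pdm_encode(samples):
    """First-order error-feedback sigma-delta modulator (illustrative).

    Maps floats in [-1, 1] to a list of 0/1 bits whose local density
    tracks the input amplitude.
    """
    bits = []
    err = 0.0  # quantization error carried over to the next sample
    for x in samples:
        v = x + err
        bit = 1 if v >= 0.0 else 0
        y = 1.0 if bit else -1.0  # value the 1-bit output represents
        err = v - y
        bits.append(bit)
    return bits


def pdm_to_pcm(bits, decimation):
    """Naive conversion: average each block of `decimation` bits."""
    pcm = []
    for start in range(0, len(bits) - decimation + 1, decimation):
        block = bits[start:start + decimation]
        pcm.append(sum(2 * b - 1 for b in block) / decimation)
    return pcm


def si_snr(estimate, reference, eps=1e-12):
    """Scale-invariant SNR in dB between an estimate and a reference."""
    n = len(reference)
    if n == 0 or len(estimate) != n:
        raise ValueError("signals must be non-empty and of equal length")
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [e - est_mean for e in estimate]
    ref = [r - ref_mean for r in reference]
    # Project the estimate onto the reference to get the target component.
    scale = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    target = [scale * r for r in ref]
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(v * v for v in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


# Assumed demo parameters (not taken from the paper).
PDM_RATE = 1_024_000   # Hz, oversampled 1-bit stream
DECIMATION = 64        # gives a 16 kHz PCM stream
TONE_HZ = 440.0
NUM_SAMPLES = 10_240   # 10 ms of signal at the PDM rate

clean = [0.5 * math.sin(2.0 * math.pi * TONE_HZ * i / PDM_RATE)
         for i in range(NUM_SAMPLES)]
bits = pdm_encode(clean)
pcm = pdm_to_pcm(bits, DECIMATION)

# Reference at the PCM rate: block means of the oversampled clean tone.
reference = [sum(clean[k:k + DECIMATION]) / DECIMATION
             for k in range(0, NUM_SAMPLES, DECIMATION)]

print(f"PDM bits: {len(bits)}, PCM samples: {len(pcm)}")
print(f"SI-SNR of naive PDM-to-PCM output: {si_snr(pcm, reference):.1f} dB")
```

An SI-SNR improvement such as the one quoted in the abstract would be the SI-SNR of the enhanced signal minus the SI-SNR of the noisy input, both measured against the same clean reference. Practical conversion stages generally use stronger low-pass filtering than the boxcar average assumed here.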