End-to-end neuromorphic speech enhancement with PDM microphones
Enhancing speech in noisy environments is essential for applications like automatic speech recognition, hearing aids, and real-time voice interfaces, but remains challenging on low-power, always-on edge devices. Conventional systems rely on pulse code modulation (PCM) signals and artificial neural n...
| Main Authors: | Sidi Yaya Arnaud Yarga, Sean U N Wood |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IOP Publishing, 2025-01-01 |
| Series: | Neuromorphic Computing and Engineering |
| Subjects: | speech denoising; spiking neural networks; pulse density modulation |
| Online Access: | https://doi.org/10.1088/2634-4386/adf2d4 |
| _version_ | 1849728604279144448 |
|---|---|
| author | Sidi Yaya Arnaud Yarga; Sean U N Wood |
| author_facet | Sidi Yaya Arnaud Yarga; Sean U N Wood |
| author_sort | Sidi Yaya Arnaud Yarga |
| collection | DOAJ |
| description | Enhancing speech in noisy environments is essential for applications like automatic speech recognition, hearing aids, and real-time voice interfaces, but remains challenging on low-power, always-on edge devices. Conventional systems rely on pulse code modulation (PCM) signals and artificial neural networks, both of which introduce significant preprocessing and computational overhead. In this work, we present PDMDNS, a novel end-to-end neuromorphic framework for real-time speech denoising that directly processes binary pulse density modulation (PDM) microphone output using a spiking neural network, entirely bypassing the conventional PDM-to-PCM conversion and preprocessing stages. PDMDNS simultaneously performs speech enhancement and signal format conversion, leveraging stateless spiking neurons to reduce computational cost while maintaining temporal modeling capabilities. Moreover, when evaluated on a dataset containing noisy signals with SNRs ranging from 20 dB to −5 dB, our system achieves an average improvement of +7 dB in SI-SNR and a +3% gain in STOI. Although this performance is slightly below the current state-of-the-art by less than 1 dB, PDMDNS requires only 33 M-Ops/s, which is nearly 3× fewer operations than the best-performing spiking models. While PDM signals require a trade-off between maximizing precision through high sampling rates and minimizing energy consumption with lower rates, PDMDNS demonstrates robust generalization across varying input sampling rates (−12.5% to +37.5%) without the need for retraining. This flexibility makes it a compelling solution for energy-efficient, low-latency speech processing in embedded and neuromorphic systems. |
| format | Article |
| id | doaj-art-0c36daffda734c4789de9d986cfcfb07 |
| institution | DOAJ |
| issn | 2634-4386 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IOP Publishing |
| record_format | Article |
| series | Neuromorphic Computing and Engineering |
| spelling | doaj-art-0c36daffda734c4789de9d986cfcfb07; 2025-08-20T03:09:31Z; eng; IOP Publishing; Neuromorphic Computing and Engineering; 2634-4386; 2025-01-01; 5; 3; 034009; 10.1088/2634-4386/adf2d4; End-to-end neuromorphic speech enhancement with PDM microphones; Sidi Yaya Arnaud Yarga (https://orcid.org/0000-0003-4727-6437), Sean U N Wood (https://orcid.org/0000-0002-6821-1619); Department of Electrical and Computer Engineering, Université de Sherbrooke, Sherbrooke, QC, Canada; https://doi.org/10.1088/2634-4386/adf2d4; speech denoising; spiking neural networks; pulse density modulation |
| title | End-to-end neuromorphic speech enhancement with PDM microphones |
| topic | speech denoising; spiking neural networks; pulse density modulation |
| url | https://doi.org/10.1088/2634-4386/adf2d4 |