Auditory attention decoding based on neural-network for binaural beamforming applications
Individuals have the remarkable ability to differentiate between speakers and focus on a particular speaker, even amidst complex acoustic environments with multiple speakers, background noise and reverberations. This selective auditory attention, often illustrated by the cocktail party problem, has...
Main Authors: | Roy Gueta, Elana Zion-Golumbic, Jacob Goldberger, Sharon Gannot |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2025-01-01 |
Series: | Frontiers in Signal Processing |
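Subjects: | audio attention decoding; EEG signals; multi-microphone processing; binaural LCMV beamformer; neural network based AAD |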
Online Access: | https://www.frontiersin.org/articles/10.3389/frsip.2024.1432298/full |
_version_ | 1832582935665967104 |
---|---|
author | Roy Gueta; Elana Zion-Golumbic; Jacob Goldberger; Sharon Gannot |
author_sort | Roy Gueta |
collection | DOAJ |
description | Individuals have the remarkable ability to differentiate between speakers and focus on a particular speaker, even amidst complex acoustic environments with multiple speakers, background noise, and reverberation. This selective auditory attention, often illustrated by the cocktail party problem, has been extensively researched. With a considerable portion of the population experiencing hearing impairment and requiring hearing aids, there arises a necessity to separate and decode auditory signals artificially. The linearly constrained minimum variance (LCMV) beamforming design criterion has proven effective in isolating the desired source by steering a beam toward the target speaker while creating a null toward the interfering source. Preserving the binaural cues, e.g., interaural time difference (ITD) and interaural level difference (ILD), is a prerequisite for producing a beamformer output suitable for hearing aid applications. To that end, the binaural linearly constrained minimum variance (BLCMV) beamformer generates two outputs that satisfy the standard LCMV criterion while preserving the binaural cues between the left-ear and right-ear outputs. Identifying the attended speaker among the separated speakers and distinguishing it from the unattended speaker poses a fundamental challenge in the beamformer design. Several studies have shown that essential features of the attended speech can be decoded from the cortical neural response, as recorded by electroencephalography (EEG) signals. This led to the development of several algorithms addressing the auditory attention decoding (AAD) task. This paper investigates two neural network architectures for the AAD task. The first architecture leverages transfer learning. It is evaluated using both same-trial and cross-trial experiments. The second architecture employs an attention mechanism between the speech signal represented in the short-time Fourier transform (STFT) domain and a multi-band filtered EEG signal. With the goal of alleviating the problem of same-trial overfitting, this architecture employs a new data organization structure that presents the neural network (NN) with a single speaker’s speech and the corresponding EEG signal as inputs. Finally, posterior probability post-processing is applied to the outputs of the NN to improve detection accuracy. The experimental study validates the applicability of the proposed scheme as an AAD method. Strategies for incorporating the AAD into the BLCMV beamformer are discussed. |
format | Article |
id | doaj-art-30517d58c808474fad2d6d5fe99b174c |
institution | Kabale University |
issn | 2673-8198 |
language | English |
publishDate | 2025-01-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Signal Processing |
spelling | doaj-art-30517d58c808474fad2d6d5fe99b174c; 2025-01-29T06:45:40Z; eng; Frontiers Media S.A.; Frontiers in Signal Processing; 2673-8198; 2025-01-01; 4; 10.3389/frsip.2024.1432298; 1432298; Roy Gueta (Faculty of Engineering, Bar-Ilan University, Ramat-Gan, Israel); Elana Zion-Golumbic (Multidisciplinary Brain Research Center, Bar-Ilan University, Ramat-Gan, Israel); Jacob Goldberger (Faculty of Engineering, Bar-Ilan University, Ramat-Gan, Israel); Sharon Gannot (Faculty of Engineering, Bar-Ilan University, Ramat-Gan, Israel) |
title | Auditory attention decoding based on neural-network for binaural beamforming applications |
topic | audio attention decoding; EEG signals; multi-microphone processing; binaural LCMV beamformer; neural network based AAD |
url | https://www.frontiersin.org/articles/10.3389/frsip.2024.1432298/full |
work_keys_str_mv | AT roygueta auditoryattentiondecodingbasedonneuralnetworkforbinauralbeamformingapplications AT elanaziongolumbic auditoryattentiondecodingbasedonneuralnetworkforbinauralbeamformingapplications AT jacobgoldberger auditoryattentiondecodingbasedonneuralnetworkforbinauralbeamformingapplications AT sharongannot auditoryattentiondecodingbasedonneuralnetworkforbinauralbeamformingapplications |
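
The description above says that the LCMV criterion steers a beam toward the target speaker while placing a null toward the interferer. The following is a minimal NumPy sketch of the textbook closed-form LCMV solution, w = R^{-1} C (C^H R^{-1} C)^{-1} g, and is not taken from the article. The function name `lcmv_weights`, the four-microphone setup, and the random steering vectors and covariance are all illustrative assumptions.

```python
import numpy as np


def lcmv_weights(noise_cov, constraints, responses):
    """Textbook LCMV weights: minimise w^H R w subject to C^H w = g.

    noise_cov   : (M, M) Hermitian positive-definite covariance R
    constraints : (M, K) matrix C whose columns are steering vectors
    responses   : (K,) desired responses g, one per constraint
    """
    r_inv_c = np.linalg.solve(noise_cov, constraints)   # R^{-1} C
    gram = constraints.conj().T @ r_inv_c               # C^H R^{-1} C
    return r_inv_c @ np.linalg.solve(gram, responses)   # R^{-1} C (C^H R^{-1} C)^{-1} g


# Hypothetical scene: 4 microphones, one target and one interferer.
rng = np.random.default_rng(0)
num_mics = 4
target = rng.standard_normal(num_mics) + 1j * rng.standard_normal(num_mics)
interferer = rng.standard_normal(num_mics) + 1j * rng.standard_normal(num_mics)

C = np.column_stack([target, interferer])
g = np.array([1.0, 0.0], dtype=complex)  # pass the target, null the interferer

# Random Hermitian positive-definite covariance (assumed, for illustration only).
A = rng.standard_normal((num_mics, num_mics)) + 1j * rng.standard_normal((num_mics, num_mics))
R = A @ A.conj().T + num_mics * np.eye(num_mics)

w = lcmv_weights(R, C, g)

# Magnitude of the beamformer response toward each constrained direction.
print(np.round(np.abs(C.conj().T @ w), 6))
```

In this sketch the desired response vector `g` is where an attention decoder's decision would enter: swapping its two entries makes the other speaker the target. A binaural variant would, per the description, compute one such weight vector per ear.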