Auditory attention decoding based on neural-network for binaural beamforming applications

Bibliographic Details
Main Authors: Roy Gueta, Elana Zion-Golumbic, Jacob Goldberger, Sharon Gannot
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-01-01
Series: Frontiers in Signal Processing
Online Access: https://www.frontiersin.org/articles/10.3389/frsip.2024.1432298/full
collection DOAJ
description Individuals have a remarkable ability to distinguish between speakers and focus on one of them, even in complex acoustic environments with multiple speakers, background noise, and reverberation. This selective auditory attention, often illustrated by the cocktail party problem, has been extensively researched. Because a considerable portion of the population has a hearing impairment and requires hearing aids, there is a need to separate and decode auditory signals artificially. The linearly constrained minimum variance (LCMV) beamforming design criterion has proven effective in isolating the desired source by steering a beam toward the target speaker while placing a null toward the interfering source. Preserving the binaural cues, e.g., the interaural time difference (ITD) and the interaural level difference (ILD), is a prerequisite for producing a beamformer output suitable for hearing aid applications. To that end, the binaural linearly constrained minimum variance (BLCMV) beamformer generates two outputs that satisfy the standard LCMV criterion while preserving the binaural cues between the left-ear and right-ear outputs. Identifying which of the separated speakers is attended and which is unattended poses a fundamental challenge in the beamformer design. Several studies have shown that essential features of the attended speech can be decoded from the cortical neural response, as recorded by electroencephalography (EEG). This led to the development of several algorithms addressing the auditory attention decoding (AAD) task. This paper investigates two neural network architectures for the AAD task. The first architecture leverages transfer learning and is evaluated in both same-trial and cross-trial experiments. The second architecture employs an attention mechanism between the speech signal, represented in the short-time Fourier transform (STFT) domain, and a multi-band filtered EEG signal. To alleviate same-trial overfitting, this architecture employs a new data organization structure that presents the neural network (NN) with a single speaker's speech and the corresponding EEG signal as inputs. Finally, posterior probability post-processing is applied to the outputs of the NN to improve detection accuracy. The experimental study validates the applicability of the proposed scheme as an AAD method. Strategies for incorporating the AAD into the BLCMV beamformer are discussed.
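The LCMV criterion mentioned in the description can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the four-microphone array, the random steering vectors `a_target` and `a_interf`, and the noise covariance `R` are hypothetical placeholders, and the closed-form weights w = R^{-1} C (C^H R^{-1} C)^{-1} g are the textbook LCMV solution rather than anything specific to this article.

```python
import numpy as np


def lcmv_weights(R, C, g):
    """Textbook LCMV solution: minimize w^H R w subject to C^H w = g."""
    Rinv_C = np.linalg.solve(R, C)          # R^{-1} C without forming the inverse
    gram = C.conj().T @ Rinv_C              # C^H R^{-1} C
    return Rinv_C @ np.linalg.solve(gram, g)


rng = np.random.default_rng(0)
M = 4  # number of microphones (assumed for illustration)

# Hypothetical steering vectors for the target and the interfering speaker.
a_target = rng.standard_normal(M) + 1j * rng.standard_normal(M)
a_interf = rng.standard_normal(M) + 1j * rng.standard_normal(M)
C = np.column_stack([a_target, a_interf])

# Hypothetical noise covariance, built to be Hermitian positive definite.
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R = A @ A.conj().T + M * np.eye(M)

# Desired responses: pass the target undistorted, null the interferer.
g = np.array([1.0, 0.0])

w = lcmv_weights(R, C, g)
response = C.conj().T @ w  # satisfies the constraints up to rounding error
```

In the binaural setting described above, two such weight vectors would be computed, one for the left-ear output and one for the right-ear output.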
id doaj-art-30517d58c808474fad2d6d5fe99b174c
institution Kabale University
issn 2673-8198
Record timestamp: 2025-01-29T06:45:40Z
Volume: 4
DOI: 10.3389/frsip.2024.1432298
Article number: 1432298
Affiliations:
Roy Gueta: Faculty of Engineering, Bar-Ilan University, Ramat-Gan, Israel
Elana Zion-Golumbic: Multidisciplinary Brain Research Center, Bar-Ilan University, Ramat-Gan, Israel
Jacob Goldberger: Faculty of Engineering, Bar-Ilan University, Ramat-Gan, Israel
Sharon Gannot: Faculty of Engineering, Bar-Ilan University, Ramat-Gan, Israel
topic audio attention decoding
EEG signals
multi-microphone processing
binaural LCMV beamformer
neural network based AAD