Auditory attention decoding based on neural-network for binaural beamforming applications

Individuals have the remarkable ability to differentiate between speakers and focus on a particular speaker, even amidst complex acoustic environments with multiple speakers, background noise and reverberations. This selective auditory attention, often illustrated by the cocktail party problem, has...

Bibliographic Details
Main Authors:	Roy Gueta, Elana Zion-Golumbic, Jacob Goldberger, Sharon Gannot
Format:	Article
Language:	English
Published:	Frontiers Media S.A. 2025-01-01
Series:	Frontiers in Signal Processing
Subjects:	audio attention decoding; EEG signals; multi-microphone processing; binaural LCMV beamformer; neural network based AAD
Online Access:	https://www.frontiersin.org/articles/10.3389/frsip.2024.1432298/full

collection	DOAJ
description	Individuals have the remarkable ability to differentiate between speakers and focus on a particular speaker, even in complex acoustic environments with multiple speakers, background noise, and reverberation. This selective auditory attention, often illustrated by the cocktail party problem, has been extensively researched. With a considerable portion of the population experiencing hearing impairment and requiring hearing aids, there is a need to separate and decode auditory signals artificially. The linearly constrained minimum variance (LCMV) beamforming design criterion has proven effective in isolating the desired source by steering a beam toward the target speaker while creating a null toward the interfering source. Preserving the binaural cues, e.g., the interaural time difference (ITD) and interaural level difference (ILD), is a prerequisite for producing a beamformer output suitable for hearing aid applications. To that end, the binaural linearly constrained minimum variance (BLCMV) beamformer generates two outputs that satisfy the standard LCMV criterion while preserving the binaural cues between the left-ear and right-ear outputs. Identifying the attended speaker among the separated speakers and distinguishing it from the unattended speaker poses a fundamental challenge in the beamformer design. Several studies have shown that essential features of the attended speech can be decoded from the cortical neural response, as recorded by electroencephalography (EEG) signals. This led to the development of several algorithms addressing the auditory attention decoding (AAD) task. This paper investigates two neural network architectures for the AAD task. The first architecture leverages transfer learning. It is evaluated using both same-trial and cross-trial experiments. The second architecture employs an attention mechanism between the speech signal represented in the short-time Fourier transform (STFT) domain and a multi-band filtered EEG signal. To alleviate the problem of same-trial overfitting, this architecture employs a new data organization structure that presents the neural network (NN) with a single speaker's speech and the corresponding EEG signal as inputs. Finally, posterior probability post-processing is applied to the outputs of the NN to improve detection accuracy. The experimental study validates the applicability of the proposed scheme as an AAD method. Strategies for incorporating the AAD into the BLCMV beamformer are discussed.
format	Article
id	doaj-art-30517d58c808474fad2d6d5fe99b174c
institution	Kabale University
issn	2673-8198
language	English
publishDate	2025-01-01
publisher	Frontiers Media S.A.
record_format	Article
series	Frontiers in Signal Processing
spelling	doaj-art-30517d58c808474fad2d6d5fe99b174c; 2025-01-29T06:45:40Z; eng; Frontiers Media S.A.; Frontiers in Signal Processing; 2673-8198; 2025-01-01; 4; 10.3389/frsip.2024.1432298; 1432298; Roy Gueta (Faculty of Engineering, Bar-Ilan University, Ramat-Gan, Israel); Elana Zion-Golumbic (Multidisciplinary Brain Research Center, Bar-Ilan University, Ramat-Gan, Israel); Jacob Goldberger (Faculty of Engineering, Bar-Ilan University, Ramat-Gan, Israel); Sharon Gannot (Faculty of Engineering, Bar-Ilan University, Ramat-Gan, Israel)
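The LCMV criterion mentioned in the description (a distortionless response toward the target speaker and a null toward the interfering source) can be illustrated with a minimal single-output sketch. This is not the authors' BLCMV implementation: the uniform linear array model, the noise covariance, the parameter values, and all function names below are assumptions made for illustration. It uses the textbook closed form w = R^{-1} C (C^H R^{-1} C)^{-1} g, where the columns of C are the steering vectors and g holds the desired responses.

```python
import cmath
import math


def steering_vector(num_mics, spacing_m, angle_deg, freq_hz, speed=343.0):
    """Far-field steering vector of a uniform linear array (illustrative model)."""
    delay = spacing_m * math.sin(math.radians(angle_deg)) / speed
    return [cmath.exp(-2j * math.pi * freq_hz * m * delay) for m in range(num_mics)]


def noise_covariance(num_mics, rho=0.9, loading=0.01):
    """Hypothetical positive-definite noise covariance (an assumption, not measured data)."""
    return [[rho ** abs(m - n) + (loading if m == n else 0.0)
             for n in range(num_mics)] for m in range(num_mics)]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [[complex(v) for v in A[i]] + [complex(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def inner(a, b):
    """Hermitian inner product a^H b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def lcmv_weights(R, constraints, gains):
    """Return w = R^{-1} C (C^H R^{-1} C)^{-1} g for steering vectors C and gains g."""
    cols = [solve(R, c) for c in constraints]                       # R^{-1} c_k
    gram = [[inner(cj, xk) for xk in cols] for cj in constraints]   # C^H R^{-1} C
    lam = solve(gram, gains)                                        # (C^H R^{-1} C)^{-1} g
    return [sum(l * x[m] for l, x in zip(lam, cols)) for m in range(len(R))]


def response(w, a):
    """Beamformer response w^H a toward a source with steering vector a."""
    return inner(w, a)


# Target at 0 degrees (gain 1), interferer at 60 degrees (gain 0); values are illustrative.
a_target = steering_vector(4, 0.02, 0.0, 2000.0)
a_interf = steering_vector(4, 0.02, 60.0, 2000.0)
w = lcmv_weights(noise_covariance(4), [a_target, a_interf], [1.0, 0.0])
print(abs(response(w, a_target)), abs(response(w, a_interf)))
```

The two printed magnitudes should be close to 1 and 0, which is what the two linear constraints enforce. A binaural variant would, under the description above, compute one such filter per ear so that the left and right outputs retain the binaural cues; the AAD output would then decide which speaker receives the unit-gain constraint.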

Similar Items