Multi-Channel Replay Speech Detection Using an Adaptive Learnable Beamformer

Bibliographic Details
Main Authors: Michael Neri, Tuomas Virtanen
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Open Journal of Signal Processing
Subjects: Replay attack; physical access; beamforming; spatial audio; voice anti-spoofing
Online Access: https://ieeexplore.ieee.org/document/10994395/
collection DOAJ
description Replay attacks are a severe threat to voice-controlled systems: an attacker records a legitimate user's speech and replays it to gain unauthorized access to sensitive data. In this work, we propose M-ALRAD, a multi-channel neural network architecture that detects replay attacks from spatial audio features. The approach integrates a learnable adaptive beamformer with a convolutional recurrent neural network, so that spatial filtering and classification are optimized jointly. Experiments were carried out on ReMASC, a state-of-the-art multi-channel replay speech detection dataset encompassing four microphones with diverse array configurations and four environments. Results on ReMASC show that the approach outperforms the state of the art, with substantial improvements in challenging acoustic environments. In addition, we demonstrate that the approach generalizes better to unseen environments than prior studies.
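The description above mentions a learnable adaptive beamformer without giving its form. As a rough illustration only, the sketch below shows one common formulation, a filter-and-sum beamformer in the STFT domain, where each microphone channel gets one complex weight per frequency bin. This is an assumption made for illustration and is not the M-ALRAD implementation; the names `filter_and_sum` and `uniform_weights` are hypothetical. In a learnable variant, the weights would be trainable parameters updated by backpropagation together with the classifier.

```python
# Hypothetical sketch of a filter-and-sum beamformer; NOT the paper's code.
from typing import List

Spectrogram = List[List[complex]]  # indexed as [frame][frequency bin]


def filter_and_sum(channels: List[Spectrogram],
                   weights: List[List[complex]]) -> Spectrogram:
    """Combine C channel spectrograms into a single-channel spectrogram.

    channels[c][t][f] is the STFT of microphone c at frame t and bin f.
    weights[c][f] is the complex weight of microphone c at bin f.
    Output: y[t][f] = sum over c of conj(weights[c][f]) * channels[c][t][f].
    """
    n_frames = len(channels[0])
    n_bins = len(channels[0][0])
    out = [[0j] * n_bins for _ in range(n_frames)]
    for c, spec in enumerate(channels):
        for t in range(n_frames):
            for f in range(n_bins):
                out[t][f] += weights[c][f].conjugate() * spec[t][f]
    return out


def uniform_weights(n_channels: int, n_bins: int) -> List[List[complex]]:
    """Equal weights 1/C: the beamformer reduces to a channel average."""
    return [[complex(1.0 / n_channels)] * n_bins for _ in range(n_channels)]


# Toy input: 2 microphones, 2 frames, 2 frequency bins (made-up values).
mics = [
    [[1 + 0j, 2 + 0j], [3 + 0j, 4 + 0j]],
    [[3 + 0j, 2 + 0j], [1 + 0j, 0 + 0j]],
]
beamformed = filter_and_sum(mics, uniform_weights(2, 2))
```

With uniform weights the output is simply the per-bin average across microphones; a trained beamformer would instead learn weights that emphasize the spatial cues useful to the downstream classifier.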
id doaj-art-a843309f1f2e4c4e89cf545a0bbafc3d
institution DOAJ
issn 2644-1322
spelling doaj-art-a843309f1f2e4c4e89cf545a0bbafc3d 2025-08-20T03:13:29Z
doi 10.1109/OJSP.2025.3568758
volume 6
pages 530-535
orcid Michael Neri https://orcid.org/0000-0002-6212-9139
orcid Tuomas Virtanen https://orcid.org/0000-0002-4604-9729
affiliation Faculty of Information Technology and Communication Sciences, Tampere University, Tampere, Finland (both authors)
topic Replay attack
physical access
beamforming
spatial audio
voice anti-spoofing