SARS-CoV-2 Detection From Voice

Bibliographic Details
Main Authors: Gadi Pinkas, Yarden Karny, Aviad Malachi, Galia Barkai, Gideon Bachar, Vered Aharonson
Format: Article
Language: English
Published: IEEE, 2020-01-01
Series: IEEE Open Journal of Engineering in Medicine and Biology
Subjects: COVID-19; audio embeddings; transformer; recurrent neural network; ensemble stacking; semi-supervised learning
Online Access: https://ieeexplore.ieee.org/document/9205643/
author Gadi Pinkas
Yarden Karny
Aviad Malachi
Galia Barkai
Gideon Bachar
Vered Aharonson
collection DOAJ
description Automated voice-based detection of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) could facilitate screening for COVID-19. A dataset of cellular-phone recordings from 88 subjects was recently collected. The dataset included vocal utterances, speech and coughs that the subjects self-recorded in hospitals or isolation sites. All subjects underwent nasopharyngeal swabbing at the time of recording and were labelled as SARS-CoV-2 positives or negative controls. The present study harnessed deep machine learning and speech processing to detect the SARS-CoV-2 positives. A three-stage architecture was implemented: a self-supervised attention-based transformer generated embeddings from the audio inputs, recurrent neural networks produced specialized sub-models for SARS-CoV-2 classification, and ensemble stacking fused the sub-models' predictions. Pre-training, bootstrapping and regularization techniques were used to prevent overfitting. A recall of 78% and a probability of false alarm (PFA) of 41% were measured on a test set of 57 recording sessions. Leave-one-speaker-out cross-validation on 292 recording sessions yielded a recall of 78% and a PFA of 30%. These preliminary results suggest the feasibility of COVID-19 screening using voice.
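The description above outlines a three-stage pipeline: transformer-based embeddings, recurrent sub-models, and stacked fusion. The sketch below is a minimal illustration of such a pipeline in PyTorch, assuming illustrative layer sizes and an 80-dimensional acoustic-feature front end; it is not the authors' implementation, and the module names (EmbeddingStage, RecurrentSubModel, StackingEnsemble) are hypothetical.

```python
# Hedged sketch (not the authors' code): a minimal three-stage pipeline mirroring the
# architecture described in the record -- (1) a self-attention encoder producing
# embeddings from audio features, (2) recurrent sub-models, one per vocal task,
# (3) a stacking layer fusing the sub-model predictions.
import torch
import torch.nn as nn

class EmbeddingStage(nn.Module):
    """Stage 1: transformer encoder over a sequence of acoustic feature frames."""
    def __init__(self, feat_dim=80, d_model=256, n_layers=4, n_heads=4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, feats):                      # feats: (batch, frames, feat_dim)
        return self.encoder(self.proj(feats))      # (batch, frames, d_model)

class RecurrentSubModel(nn.Module):
    """Stage 2: one GRU classifier per vocal task (e.g. cough, vowel, speech)."""
    def __init__(self, d_model=256, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(d_model, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, emb):                        # emb: (batch, frames, d_model)
        _, h = self.rnn(emb)                       # h: (1, batch, hidden)
        return torch.sigmoid(self.head(h[-1]))     # (batch, 1) probability

class StackingEnsemble(nn.Module):
    """Stage 3: fuse sub-model probabilities with a learned linear combiner."""
    def __init__(self, n_submodels=3):
        super().__init__()
        self.combiner = nn.Linear(n_submodels, 1)

    def forward(self, sub_probs):                  # sub_probs: (batch, n_submodels)
        return torch.sigmoid(self.combiner(sub_probs))

if __name__ == "__main__":
    # Toy forward pass: 2 recordings, 300 frames of 80-dim features per vocal task.
    embedder = EmbeddingStage()
    sub_models = [RecurrentSubModel() for _ in range(3)]   # one per task type
    ensemble = StackingEnsemble(n_submodels=3)
    tasks = [torch.randn(2, 300, 80) for _ in range(3)]    # placeholder audio features
    probs = torch.cat([m(embedder(x)) for m, x in zip(sub_models, tasks)], dim=1)
    print(ensemble(probs))                         # fused SARS-CoV-2 probability per recording
```

In a setup like the one the record describes, the embedding stage would be pre-trained on unlabelled speech and each sub-model trained on its vocal task before the stacking layer is fit, with bootstrapping and regularization to limit overfitting on the small cohort.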
format Article
id doaj-art-ceca39267f21438cbb216bfe599ccc0e
institution Kabale University
issn 2644-1276
language English
publishDate 2020-01-01
publisher IEEE
record_format Article
series IEEE Open Journal of Engineering in Medicine and Biology
spelling IEEE Open Journal of Engineering in Medicine and Biology, ISSN 2644-1276, vol. 1, pp. 268-274, 2020-01-01, IEEE. DOI: 10.1109/OJEMB.2020.3026468 (IEEE Xplore article 9205643). Record last updated 2025-08-20T03:32:47Z.
Author affiliations: Gadi Pinkas, Yarden Karny, Aviad Malachi and Vered Aharonson (https://orcid.org/0000-0002-4406-6525), Afeka Center of Language Processing, Afeka, Tel Aviv Academic College of Engineering, Tel Aviv-Yafo, Israel; Galia Barkai (https://orcid.org/0000-0002-0775-9015), Pediatric Infectious Diseases Unit, Safra Children's Hospital, Sheba Medical Center and Sackler School of Medicine, Tel-Aviv University, Tel Aviv-Yafo, Israel; Gideon Bachar, Department of Otorhinolaryngology, Rabin Medical Center and Sackler School of Medicine, Tel-Aviv University, Tel Aviv-Yafo, Israel.
title SARS-CoV-2 Detection From Voice
topic COVID19
audio embeddings
transformer
recurrent neural network
ensemble stacking
semi supervised learning
url https://ieeexplore.ieee.org/document/9205643/