Relative Applicability of Diverse Automatic Speech Recognition Platforms for Transcription of Psychiatric Treatment Sessions

Service delivery in mental healthcare involves documenting sensitive patient-clinician conversations, which demands serious caution. Conventionally, clinicians take handwritten notes, which are hard to read and leave no searchable database, hindering research. Having these conversations digitized vi...

Bibliographic Details
Main Authors: Rana Zeeshan, John Bogue, Mamoona Naveed Asghar
Format: Article
Language: English
Published: IEEE 2025-01-01
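Series: IEEE Access
Subjects: Automatic speech recognition tools; data privacy; mental health services; speaker diarization tools; psychiatry case notes; speech to text transcription tools
Online Access: https://ieeexplore.ieee.org/document/11063333/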
collection DOAJ
description Service delivery in mental healthcare involves documenting sensitive patient-clinician conversations, which demands serious caution. Conventionally, clinicians take handwritten notes, which are hard to read and leave no searchable database, hindering research. Digitizing these conversations through Automatic Speech Recognition (ASR) based Speech-to-Text (STT) transcription enables progressive analysis of mental health cases. ASR applications usually require the audio to be recorded before transcription so that speakers can be labeled (diarization). Although such models are adequate for most use cases, storing audio recordings in psychiatry complicates data handling and the adoption of ASR platforms in mental healthcare. This study followed a two-stage methodology. First, 32 well-regarded STT transcription tools were evaluated for their applicability in psychiatry. Second, experimental testing was carried out on nine audio clips derived from three psychiatric session recordings, varying in duration (1, 3, and 10 minutes) and in the speakers' gender. Inference time, Word Error Rate (WER), and Diarization Error Rate (DER) were analyzed. The results indicated that WER was low (0-7%), whereas DER varied widely (2-32%) and was influenced by audio length and speaker characteristics. DER was notably lower for clips whose speakers differed in gender or age, and higher for speakers of similar demographics. The study also compared synchronous and asynchronous diarization approaches, highlighting challenges in accuracy, privacy, and processing efficiency in psychiatry. These findings provide actionable insights for selecting ASR tools in mental healthcare and underscore the need for targeted improvements in ASR technology to address the unique demands of this field.
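The two accuracy metrics named in the description can be illustrated with a minimal sketch. It uses the standard definitions (WER as word-level edit distance divided by the number of reference words; DER as missed speech plus false alarm plus speaker confusion, divided by total reference speech time) and is not the authors' evaluation code. The function names, the lowercase whitespace tokenization, and the sample sentences are assumptions made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumption: words are compared after lowercasing and whitespace splitting;
    real evaluations usually apply richer text normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words (row-by-row Levenshtein).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def diarization_error_rate(missed: float, false_alarm: float,
                           confusion: float, total_speech: float) -> float:
    """DER from error durations (seconds) over total reference speech time."""
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


if __name__ == "__main__":
    # Hypothetical sentences, not taken from the study's recordings.
    print(word_error_rate("the patient reports low mood",
                          "the patient report low mood"))
    # Hypothetical durations in seconds for a one-minute clip.
    print(diarization_error_rate(2.0, 1.0, 3.0, 60.0))
```

Because DER is normalized by speech time while WER is normalized by word count, the two values are not directly comparable; they are reported side by side only as separate views of transcription and speaker-labeling quality.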
id doaj-art-ca2bd7e43b514331ba4c55e309a8f5fa
institution Kabale University
issn 2169-3536
spelling doaj-art-ca2bd7e43b514331ba4c55e309a8f5fa
2025-08-20T03:28:59Z
eng
IEEE
IEEE Access
2169-3536
2025-01-01
Volume 13, pages 117343-117354
DOI: 10.1109/ACCESS.2025.3585454
Document number: 11063333
Relative Applicability of Diverse Automatic Speech Recognition Platforms for Transcription of Psychiatric Treatment Sessions
Rana Zeeshan (https://orcid.org/0009-0003-7384-7253), College of Science and Engineering, University of Galway, Galway, Ireland
John Bogue (https://orcid.org/0000-0002-7070-1561), School of Psychology, University of Galway, Galway, Ireland
Mamoona Naveed Asghar (https://orcid.org/0000-0001-7460-266X), College of Science and Engineering, University of Galway, Galway, Ireland
https://ieeexplore.ieee.org/document/11063333/
Automatic speech recognition tools; data privacy; mental health services; speaker diarization tools; psychiatry case notes; speech to text transcription tools
topic Automatic speech recognition tools
data privacy
mental health services
speaker diarization tools
psychiatry case notes
speech to text transcription tools