Relative Applicability of Diverse Automatic Speech Recognition Platforms for Transcription of Psychiatric Treatment Sessions
Service delivery in mental healthcare involves documenting sensitive patient-clinician conversations, which must be handled with great care. Conventionally, clinicians take handwritten notes, which are hard to read and leave no database for research. Having these conversations digitized vi...
| Main Authors: | Rana Zeeshan, John Bogue, Mamoona Naveed Asghar |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE 2025-01-01 |
| Series: | IEEE Access |
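| Subjects: | Automatic speech recognition tools; data privacy; mental health services; speaker diarization tools; psychiatry case notes; speech to text transcription tools |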
| Online Access: | https://ieeexplore.ieee.org/document/11063333/ |
| _version_ | 1849427521684111360 |
|---|---|
| author | Rana Zeeshan; John Bogue; Mamoona Naveed Asghar |
| author_facet | Rana Zeeshan; John Bogue; Mamoona Naveed Asghar |
| author_sort | Rana Zeeshan |
| collection | DOAJ |
| description | Service delivery in mental healthcare involves documenting sensitive patient-clinician conversations, which must be handled with great care. Conventionally, clinicians take handwritten notes, which are hard to read and leave no database for research. Digitizing these conversations via Automatic Speech Recognition (ASR) based Speech-to-Text (STT) transcription enables progressive analysis of mental health cases. ASR applications usually require the audio to be recorded before transcription so that speakers can be labeled (diarization). Although such models are good enough for most use cases, storing audio recordings in psychiatry complicates data handling and the adoption of ASR platforms in mental healthcare. This study followed a two-stage methodology: first, a list of 32 well-reputed STT transcription tools was evaluated for applicability in psychiatry; second, experimental testing was carried out using nine audio clips derived from three psychiatric session recordings, varying in duration (1, 3, and 10 minutes) and in speakers' gender. Metrics such as inference time, Word Error Rate (WER), and Diarization Error Rate (DER) were analyzed. The results indicated that while WER was low (0-7%), DER varied widely (2-32%), influenced by audio length and speaker characteristics. DER was notably lower for clips with speakers of differing genders or ages, and higher for speakers of similar demographics. The study also compared synchronous and asynchronous diarization approaches, highlighting challenges in accuracy, privacy, and processing efficiency in psychiatry. These findings provide actionable insights for selecting ASR tools in mental healthcare and underscore the need for targeted improvements in ASR technology to address the unique demands of this field. |
| format | Article |
| id | doaj-art-ca2bd7e43b514331ba4c55e309a8f5fa |
| institution | Kabale University |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-ca2bd7e43b514331ba4c55e309a8f5fa; 2025-08-20T03:28:59Z; eng; IEEE; IEEE Access; 2169-3536; 2025-01-01; 13; 117343; 117354; 10.1109/ACCESS.2025.3585454; 11063333; Relative Applicability of Diverse Automatic Speech Recognition Platforms for Transcription of Psychiatric Treatment Sessions; Rana Zeeshan (0) https://orcid.org/0009-0003-7384-7253; John Bogue (1) https://orcid.org/0000-0002-7070-1561; Mamoona Naveed Asghar (2) https://orcid.org/0000-0001-7460-266X; College of Science and Engineering, University of Galway, Galway, Ireland; School of Psychology, University of Galway, Galway, Ireland; College of Science and Engineering, University of Galway, Galway, Ireland; https://ieeexplore.ieee.org/document/11063333/; Automatic speech recognition tools; data privacy; mental health services; speaker diarization tools; psychiatry case notes; speech to text transcription tools |
| spellingShingle | Rana Zeeshan John Bogue Mamoona Naveed Asghar Relative Applicability of Diverse Automatic Speech Recognition Platforms for Transcription of Psychiatric Treatment Sessions IEEE Access Automatic speech recognition tools data privacy mental health services speaker diarization tools psychiatry case notes speech to text transcription tools |
| title | Relative Applicability of Diverse Automatic Speech Recognition Platforms for Transcription of Psychiatric Treatment Sessions |
| title_full | Relative Applicability of Diverse Automatic Speech Recognition Platforms for Transcription of Psychiatric Treatment Sessions |
| title_fullStr | Relative Applicability of Diverse Automatic Speech Recognition Platforms for Transcription of Psychiatric Treatment Sessions |
| title_full_unstemmed | Relative Applicability of Diverse Automatic Speech Recognition Platforms for Transcription of Psychiatric Treatment Sessions |
| title_short | Relative Applicability of Diverse Automatic Speech Recognition Platforms for Transcription of Psychiatric Treatment Sessions |
| title_sort | relative applicability of diverse automatic speech recognition platforms for transcription of psychiatric treatment sessions |
| topic | Automatic speech recognition tools; data privacy; mental health services; speaker diarization tools; psychiatry case notes; speech to text transcription tools |
| url | https://ieeexplore.ieee.org/document/11063333/ |
| work_keys_str_mv | AT ranazeeshan relativeapplicabilityofdiverseautomaticspeechrecognitionplatformsfortranscriptionofpsychiatrictreatmentsessions AT johnbogue relativeapplicabilityofdiverseautomaticspeechrecognitionplatformsfortranscriptionofpsychiatrictreatmentsessions AT mamoonanaveedasghar relativeapplicabilityofdiverseautomaticspeechrecognitionplatformsfortranscriptionofpsychiatrictreatmentsessions |
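
The description above reports Word Error Rate (WER) as one of the evaluation metrics. As a rough illustration of how that metric is conventionally defined (word-level substitutions, deletions, and insertions divided by the number of reference words), here is a minimal Python sketch. It is not the authors' evaluation code; the function name `word_error_rate`, the lowercase-and-split normalization, and the sample sentences are assumptions made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: text is normalized only by lowercasing and whitespace
    splitting; real evaluations often also strip punctuation.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up sentences, not taken from the study's recordings.
    reference = "i have been feeling anxious lately"
    hypothesis = "i have been feeling anxious late"
    print(f"WER = {word_error_rate(reference, hypothesis):.3f}")
```

Because the denominator is the reference length, a hypothesis with many inserted words can push WER above 100%. Diarization Error Rate is computed differently, over speech time rather than words, and is not covered by this sketch.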