Machine learning in psychiatric health records: A gold standard approach to trauma annotation

Bibliographic Details
Main Authors: Eben Holderness, Bruce Atwood, Marc Verhagen, Ann K. Shinn, Philip Cawkwell, Hudson Cerruti, James Pustejovsky, Mei-Hua Hall
Format: Article
Language: English
Published: Nature Publishing Group 2025-08-01
Series: Translational Psychiatry
Online Access: https://doi.org/10.1038/s41398-025-03487-0
_version_ 1849235715191209984
author Eben Holderness
Bruce Atwood
Marc Verhagen
Ann K. Shinn
Philip Cawkwell
Hudson Cerruti
James Pustejovsky
Mei-Hua Hall
author_sort Eben Holderness
collection DOAJ
description Abstract Psychiatric electronic health records present unique challenges for machine learning due to their unstructured, complex, and variable nature. This study aimed to create a publicly available gold standard dataset by identifying a cohort of patients with psychotic disorders and posttraumatic stress disorder (PTSD), developing clinically informed guidelines for annotating traumatic events in their health records, and demonstrating the dataset’s suitability for training machine learning models to detect indicators of symptoms, substance use, and trauma in new records. We compiled a representative corpus of 200 narrative-heavy health records (470,489 tokens) from a centralized database and developed a detailed annotation scheme with a team of clinical experts and computational linguists. Clinicians annotated the corpus for trauma-related events and relevant clinical information with high inter-annotator agreement (0.715 for entity/span tags and 0.874 for attributes). Machine learning models trained on the corpus demonstrated its practical viability for machine learning applications, achieving micro F1 scores of 0.76 for spans and 0.82 for attributes. This study established the first gold standard dataset for the complex task of labelling traumatic features in psychiatric health records. The high inter-annotator agreement and model performance illustrate its utility in advancing the application of machine learning in psychiatric healthcare to better understand disease heterogeneity and treatment implications.
format Article
id doaj-art-bffd2f473e4348dfad96f44aa2fb74d6
institution Kabale University
issn 2158-3188
language English
publishDate 2025-08-01
publisher Nature Publishing Group
record_format Article
series Translational Psychiatry
spelling doaj-art-bffd2f473e4348dfad96f44aa2fb74d6
2025-08-20T04:02:42Z
eng
Nature Publishing Group
Translational Psychiatry
2158-3188
2025-08-01
Volume 15, issue 1, pages 1-8
10.1038/s41398-025-03487-0
Machine learning in psychiatric health records: A gold standard approach to trauma annotation
Eben Holderness (Psychosis Neurobiology Laboratory, McLean Hospital)
Bruce Atwood (Psychosis Neurobiology Laboratory, McLean Hospital)
Marc Verhagen (Department of Computer Science, Brandeis University)
Ann K. Shinn (Schizophrenia and Bipolar Disorder Research Program, McLean Hospital)
Philip Cawkwell (Psychosis Neurobiology Laboratory, McLean Hospital)
Hudson Cerruti (University of Rochester School of Medicine and Dentistry)
James Pustejovsky (Department of Computer Science, Brandeis University)
Mei-Hua Hall (Psychosis Neurobiology Laboratory, McLean Hospital)
https://doi.org/10.1038/s41398-025-03487-0
title Machine learning in psychiatric health records: A gold standard approach to trauma annotation
title_sort machine learning in psychiatric health records a gold standard approach to trauma annotation
url https://doi.org/10.1038/s41398-025-03487-0