Machine learning in psychiatric health records: A gold standard approach to trauma annotation
Abstract: Psychiatric electronic health records present unique challenges for machine learning due to their unstructured, complex, and variable nature. This study aimed to create a gold standard dataset by identifying a cohort of patients with psychotic disorders and posttraumatic stress disorder (P...
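| Main Authors: | Eben Holderness, Bruce Atwood, Marc Verhagen, Ann K. Shinn, Philip Cawkwell, Hudson Cerruti, James Pustejovsky, Mei-Hua Hall |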
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Publishing Group, 2025-08-01 |
| Series: | Translational Psychiatry |
| Online Access: | https://doi.org/10.1038/s41398-025-03487-0 |
| _version_ | 1849235715191209984 |
|---|---|
| author | Eben Holderness, Bruce Atwood, Marc Verhagen, Ann K. Shinn, Philip Cawkwell, Hudson Cerruti, James Pustejovsky, Mei-Hua Hall |
| author_facet | Eben Holderness, Bruce Atwood, Marc Verhagen, Ann K. Shinn, Philip Cawkwell, Hudson Cerruti, James Pustejovsky, Mei-Hua Hall |
| author_sort | Eben Holderness |
| collection | DOAJ |
| description | Abstract: Psychiatric electronic health records present unique challenges for machine learning due to their unstructured, complex, and variable nature. This study aimed to create a publicly available gold-standard dataset by identifying a cohort of patients with psychotic disorders and posttraumatic stress disorder (PTSD), developing clinically informed guidelines for annotating traumatic events in their health records, and demonstrating the dataset’s suitability for training machine learning models to detect indicators of symptoms, substance use, and trauma in new records. We compiled a representative corpus of 200 narrative-heavy health records (470,489 tokens) from a centralized database and developed a detailed annotation scheme with a team of clinical experts and computational linguists. Clinicians annotated the corpus for trauma-related events and relevant clinical information with high inter-annotator agreement (0.715 for entity/span tags and 0.874 for attributes). Additionally, machine learning models were developed to demonstrate the practical viability of the gold-standard corpus for machine learning applications, achieving micro F1 scores of 0.76 and 0.82 for spans and attributes, respectively, indicative of their predictive reliability. This study established the first gold-standard dataset for the complex task of labelling traumatic features in psychiatric health records. High inter-annotator agreement and model performance illustrate its utility in advancing the application of machine learning in psychiatric healthcare to better understand disease heterogeneity and treatment implications. |
| format | Article |
| id | doaj-art-bffd2f473e4348dfad96f44aa2fb74d6 |
| institution | Kabale University |
| issn | 2158-3188 |
| language | English |
| publishDate | 2025-08-01 |
| publisher | Nature Publishing Group |
| record_format | Article |
| series | Translational Psychiatry |
| spelling | doaj-art-bffd2f473e4348dfad96f44aa2fb74d6; 2025-08-20T04:02:42Z; eng; Nature Publishing Group; Translational Psychiatry; 2158-3188; 2025-08-01; 15118; 10.1038/s41398-025-03487-0; Machine learning in psychiatric health records: A gold standard approach to trauma annotation; Eben Holderness (Psychosis Neurobiology Laboratory, McLean Hospital); Bruce Atwood (Psychosis Neurobiology Laboratory, McLean Hospital); Marc Verhagen (Department of Computer Science, Brandeis University); Ann K. Shinn (Schizophrenia and Bipolar Disorder Research Program, McLean Hospital); Philip Cawkwell (Psychosis Neurobiology Laboratory, McLean Hospital); Hudson Cerruti (University of Rochester School of Medicine and Dentistry); James Pustejovsky (Department of Computer Science, Brandeis University); Mei-Hua Hall (Psychosis Neurobiology Laboratory, McLean Hospital); https://doi.org/10.1038/s41398-025-03487-0 |
| title | Machine learning in psychiatric health records: A gold standard approach to trauma annotation |
| url | https://doi.org/10.1038/s41398-025-03487-0 |
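
The abstract reports micro F1 scores for span and attribute prediction. As a reading aid, the following is a minimal sketch of how a micro-averaged F1 score can be computed over annotated spans. It assumes exact-match scoring on `(record_id, start, end, label)` tuples; the abstract does not state the matching criterion the authors used, and the label names and offsets below are hypothetical, not taken from the dataset.

```python
from collections import Counter


def micro_f1(gold, pred):
    """Micro-averaged F1 over span annotations.

    gold, pred: sets of (record_id, start, end, label) tuples.
    A predicted span counts as correct only if it matches a gold span
    exactly (an assumption made for this sketch).
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for span in pred:
        label = span[3]
        if span in gold:
            tp[label] += 1  # predicted and present in gold
        else:
            fp[label] += 1  # predicted but not in gold
    for span in gold - pred:
        fn[span[3]] += 1    # in gold but missed by the model

    # Micro-averaging pools the counts across all labels before
    # computing precision and recall.
    total_tp = sum(tp.values())
    total_fp = sum(fp.values())
    total_fn = sum(fn.values())
    precision = total_tp / (total_tp + total_fp) if total_tp + total_fp else 0.0
    recall = total_tp / (total_tp + total_fn) if total_tp + total_fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy annotations for two records.
gold = {
    ("r1", 0, 5, "Trauma"),
    ("r1", 10, 18, "Symptom"),
    ("r2", 3, 9, "Substance"),
    ("r2", 20, 27, "Trauma"),
}
pred = {
    ("r1", 0, 5, "Trauma"),
    ("r1", 10, 18, "Symptom"),
    ("r2", 3, 9, "Trauma"),    # right span, wrong label
    ("r2", 30, 35, "Symptom"),  # spurious span
}

score = micro_f1(gold, pred)
print(f"micro F1 = {score:.2f}")
```

Because micro-averaging pools counts before computing precision and recall, frequent labels weigh more heavily in the score than rare ones, which is worth keeping in mind when label frequencies are uneven.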