Leveraging large language models for automated depression screening.

Bibliographic Details
Main Authors: Bazen Gashaw Teferra, Argyrios Perivolaris, Wei-Ni Hsiang, Christian Kevin Sidharta, Alice Rueda, Karisa Parkington, Yuqi Wu, Achint Soni, Reza Samavi, Rakesh Jetly, Yanbo Zhang, Bo Cao, Sirisha Rambhatla, Sri Krishnan, Venkat Bhat
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2025-07-01
Series: PLOS Digital Health
Online Access: https://doi.org/10.1371/journal.pdig.0000943
collection DOAJ
description Mental health diagnoses pose unique challenges that complicate the management of an individual's well-being and daily functioning. Self-report questionnaires are commonly used in clinical settings to mitigate the challenges of screening for mental health disorders. However, these questionnaires rely on an individual's subjective responses, which can be influenced by various factors. Despite advances in Large Language Models (LLMs), quantifying self-reported experiences with natural language processing has yielded imperfect accuracy. This project aims to demonstrate the effectiveness of zero-shot LLMs for depression screening and for assessing individual item scales. The DAIC-WOZ is a publicly available mental health dataset that contains textual data from clinical interviews and self-report questionnaires, with relevant mental health disorder labels. The RISEN prompt engineering framework was used to evaluate the LLMs' effectiveness in predicting depression symptoms based on individual PHQ-8 items. Several LLMs, including GPT models, Llama3_8B, Cohere, and Gemini, were assessed on performance. The GPT models, especially GPT-4o, consistently outperformed the other LLMs (Llama3_8B, Cohere, Gemini) across all eight items of the PHQ-8 scale in accuracy (M = 75.9%) and F1 score (0.74). GPT models were able to predict PHQ-8 items related to emotional and cognitive states. Llama3_8B demonstrated superior detection of anhedonia-related symptoms, and the Cohere LLM's strength was identifying and predicting psychomotor activity symptoms. This study provides a novel outlook on the potential of LLMs for predicting self-reported questionnaire scores from textual interview data. The promising preliminary performance of the various models indicates that they could effectively assist in depression screening. Further research is needed to establish a framework for determining which LLM can be used for specific mental health symptoms and other disorders. In addition, analysis of additional datasets, together with model fine-tuning, should be explored.
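The per-item accuracy and F1 figures mentioned in the description can be illustrated with a small sketch. The record does not say how F1 was averaged or how item scores were encoded, so the macro-averaged F1 over the four PHQ-8 item ratings (0 to 3) used here is an assumption, as are the function names and the toy data. This is not the authors' evaluation code.

```python
# Sketch only: scoring predicted PHQ-8 item ratings against self-reported ones.
# Assumptions (not from the record): macro-averaged F1, ratings encoded as 0-3,
# hypothetical function names, and made-up toy data.

PHQ8_RATINGS = (0, 1, 2, 3)  # each PHQ-8 item is rated from 0 to 3


def accuracy(y_true, y_pred):
    """Fraction of participants whose predicted rating matches the self-report."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=PHQ8_RATINGS):
    """Unweighted mean of per-rating F1 scores.

    Ratings that appear in neither y_true nor y_pred are skipped so that
    they do not drag the average down.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be equally long")
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        if tp + fp + fn == 0:
            continue  # rating absent from both lists
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores) if scores else 0.0


# Toy data for a single PHQ-8 item across six hypothetical participants.
self_reported = [0, 1, 2, 3, 0, 1]
llm_predicted = [0, 1, 2, 2, 0, 0]

print(f"accuracy: {accuracy(self_reported, llm_predicted):.3f}")
print(f"macro F1: {macro_f1(self_reported, llm_predicted):.3f}")
```

Running the same two functions once per PHQ-8 item and per model would give a table comparable in shape to the per-item comparison the description refers to, although the actual numbers depend on choices the record does not state.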
id doaj-art-107c99a140eb4eb093776ff2cb4482ee
institution Kabale University
issn 2767-3170
citation PLOS Digital Health 4(7): e0000943 (2025-07-01)