Evaluating the agreement between ChatGPT-4 and validated questionnaires in screening for anxiety and depression in college students: a cross-sectional study

Bibliographic Details
Main Authors: Jiali Liu, Juan Gu, Mengjie Tong, Yake Yue, Yufei Qiu, Lijuan Zeng, Yiqing Yu, Fen Yang, Shuyan Zhao
Format: Article
Language: English
Published: BMC 2025-04-01
Series: BMC Psychiatry
Online Access: https://doi.org/10.1186/s12888-025-06798-0
Description
Summary: Background: The Chat Generative Pre-trained Transformer (ChatGPT), an artificial intelligence-based web application, has demonstrated substantial potential across various knowledge domains, particularly in medicine. This cross-sectional study assessed the validity and possible usefulness of ChatGPT-4 in screening for anxiety and depression by comparing questionnaires adapted by ChatGPT-4 with their validated originals. Methods: ChatGPT-4 was tasked with generating structured interview questionnaires based on the validated Patient Health Questionnaire-9 (PHQ-9) and Generalized Anxiety Disorder Scale-7 (GAD-7). The new measures were referred to as GPT-PHQ-9 and GPT-GAD-7. Spearman correlation analysis, intraclass correlation coefficients (ICC), receiver operating characteristic (ROC) curves with Youden’s index, and Bland–Altman plots were used to evaluate the consistency between scores from the ChatGPT-4-adapted questionnaires and those from the validated questionnaires. Results: A total of 200 college students participated. Cronbach’s α indicated acceptable reliability for both GPT-PHQ-9 (α = 0.75) and GPT-GAD-7 (α = 0.76). ICC values were 0.80 for PHQ-9 and 0.70 for GAD-7. Spearman’s correlation showed moderate associations with PHQ-9 (ρ = 0.63) and GAD-7 (ρ = 0.68). ROC curve analysis revealed optimal cutoffs of 9.5 for depressive symptoms and 6.5 for anxiety symptoms, both with high sensitivity and specificity. Conclusions: The questionnaires adapted by ChatGPT-4 demonstrated good consistency with the validated questionnaires. Future studies should investigate the usefulness of the ChatGPT-designed questionnaires in different populations.
ISSN: 1471-244X
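
Note on the cutoff selection mentioned in the summary: the abstract reports ROC-derived optimal cutoffs chosen with Youden's index, J = sensitivity + specificity - 1. The record does not describe the authors' software or procedure, so the following is only a minimal sketch of how such a cutoff could be found. The function name `youden_cutoff`, the midpoint candidate thresholds, and the toy scores and labels are all assumptions for illustration and are not data from the study.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity, J) maximizing Youden's J.

    scores: questionnaire totals; labels: 1 = case, 0 = non-case on the
    reference measure. A score is classified positive when score > cutoff.
    Ties in J keep the lowest cutoff (an arbitrary choice for this sketch).
    """
    values = sorted(set(scores))
    # Candidate cutoffs are midpoints between adjacent distinct scores,
    # which is why integer scales can yield cutoffs such as x.5.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for c in candidates:
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > c)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= c)
        sens = tp / positives
        spec = tn / negatives
        j = sens + spec - 1
        if best is None or j > best[3]:
            best = (c, sens, spec, j)
    return best


# Hypothetical toy data, NOT taken from the study.
toy_scores = [3, 5, 6, 8, 9, 10, 11, 12, 13, 15]
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
print(youden_cutoff(toy_scores, toy_labels))
```

In practice the same result is usually obtained from a statistics package's ROC routine; the explicit loop here only makes the definition of J visible.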