Evaluating the agreement between ChatGPT-4 and validated questionnaires in screening for anxiety and depression in college students: a cross-sectional study
| Main Authors: | , , , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC, 2025-04-01 |
| Series: | BMC Psychiatry |
| Subjects: | |
| Online Access: | https://doi.org/10.1186/s12888-025-06798-0 |
| Summary: | Background: The Chat Generative Pre-trained Transformer (ChatGPT), an artificial intelligence-based web application, has demonstrated substantial potential across various knowledge domains, particularly in medicine. This cross-sectional study assessed the validity and potential usefulness of ChatGPT-4 for screening anxiety and depression by comparing ChatGPT-4-adapted questionnaires with their validated originals. Methods: ChatGPT-4 was tasked with generating structured interview questionnaires based on the validated Patient Health Questionnaire-9 (PHQ-9) and Generalized Anxiety Disorder Scale-7 (GAD-7). The new measures were referred to as GPT-PHQ-9 and GPT-GAD-7. Spearman correlation analysis, intraclass correlation coefficients (ICC), Youden’s index, receiver operating characteristic (ROC) curves, and Bland–Altman plots were used to evaluate the consistency between scores from the ChatGPT-4-adapted questionnaires and those from the validated questionnaires. Results: A total of 200 college students participated. Cronbach’s α indicated acceptable reliability for both GPT-PHQ-9 (α = 0.75) and GPT-GAD-7 (α = 0.76). ICC values were 0.80 for PHQ-9 and 0.70 for GAD-7. Spearman’s correlation showed moderate associations with PHQ-9 (ρ = 0.63) and GAD-7 (ρ = 0.68). ROC curve analysis revealed optimal cutoffs of 9.5 for depressive symptoms and 6.5 for anxiety symptoms, both with high sensitivity and specificity. Conclusions: The questionnaires adapted by ChatGPT-4 demonstrated good consistency with the validated questionnaires. Future studies should investigate the usefulness of ChatGPT-designed questionnaires in different populations. |
|---|---|
| ISSN: | 1471-244X |
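
The summary reports ROC-derived cutoffs selected with Youden’s index. As a rough illustration of how such a cutoff can be found, the sketch below scans candidate thresholds and keeps the one that maximizes J = sensitivity + specificity - 1. This is not the authors' code: the function name `youden_cutoff`, the toy scores, and the toy labels are all hypothetical, and taking midpoints between adjacent observed scores as candidates is one common convention (it is also why cutoffs on integer-scored scales often end in .5).

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    scores: questionnaire totals under evaluation.
    labels: 1 if the reference instrument screens positive, else 0.
    A case is called positive when its score is above the cutoff.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")

    # Candidate cutoffs: midpoints between adjacent distinct observed scores.
    distinct = sorted(set(scores))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

    best_cutoff, best_j = None, -1.0
    for c in candidates:
        true_pos = sum(1 for s, y in zip(scores, labels) if y == 1 and s > c)
        true_neg = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= c)
        j = true_pos / positives + true_neg / negatives - 1
        if j > best_j:  # ties keep the lowest cutoff
            best_cutoff, best_j = c, j
    return best_cutoff, best_j


# Hypothetical toy data, not taken from the study.
toy_scores = [3, 5, 6, 8, 9, 10, 12, 14]
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1]
cutoff, j = youden_cutoff(toy_scores, toy_labels)
print(cutoff, round(j, 2))
```

With real data, the labels would come from the validated instrument's established screening threshold, and the resulting sensitivity and specificity would be reported alongside the cutoff.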