Evaluating the Quality of Psychotherapy Conversational Agents: Framework Development and Cross-Sectional Study

Abstract

Background: Despite potential risks, artificial intelligence–based chatbots that simulate psychotherapy are becoming more widely available and frequently used by the general public. A comprehensive way of evaluating the quality of these chatbots is needed.

Objective: To address this need, we developed the CAPE (Conversational Agent for Psychotherapy Evaluation) framework to aid clinicians, researchers, and lay users in assessing psychotherapy chatbot quality. We use the framework to evaluate and compare the quality of popular artificial intelligence psychotherapy chatbots on the OpenAI GPT store.

Methods: We identified 4 popular chatbots on OpenAI's GPT store. Two reviewers independently applied the CAPE framework to these chatbots, using 2 fictional personas to simulate interactions. The modular framework has 8 sections, each yielding an independent quality subscore between 0 and 1. We used t tests.

Results: Chatbots consistently scored highly on the background information (subscores=0.83-1), conversational capabilities (subscores=0.83-1), therapeutic alliance and boundaries (subscores=0.75-1), and accessibility (subscores=0.8-0.95) sections. Scores were low for the therapeutic orientation (subscores=0) and monitoring and risk evaluation (subscores=0.67-0.75) sections. Information on the training data and knowledge base sections was not transparent (subscores=0). Except for the privacy and harm section (mean 0.017, SD 0.00; t3, P).

Conclusions: The CAPE framework offers a robust and reliable method for assessing the quality of psychotherapy chatbots, enabling users to make informed choices based on their specific needs and preferences. Our evaluation revealed that while the popular chatbots on OpenAI's GPT store were effective at developing rapport and were easily accessible, they failed to adequately address essential safety and privacy functions.

Bibliographic Details
Main Authors: Kunmi Sobowale (ORCID 0000-0002-3489-7114), Daniel Kevin Humphrey (ORCID 0009-0005-3530-4279)
Format: Article
Language: English
Published: JMIR Publications, 2025-07-01
Series: JMIR Formative Research
ISSN: 2561-326X
DOI: 10.2196/65605
Online Access: https://formative.jmir.org/2025/1/e65605