Development of a Comprehensive Evaluation Scale for LLM-Powered Counseling Chatbots (CES-LCC) Using the eDelphi Method

<b>Background/Objectives</b>: With advancements in Large Language Models (LLMs), counseling chatbots are becoming essential tools for delivering scalable and accessible mental health support. Traditional evaluation scales, however, fail to adequately capture the sophisticated capabilities of these systems, such as personalized interactions, empathetic responses, and memory retention. This study aims to design a robust and comprehensive evaluation scale, the Comprehensive Evaluation Scale for LLM-Powered Counseling Chatbots (CES-LCC), using the eDelphi method to address this gap. <b>Methods</b>: A panel of 16 experts in psychology, artificial intelligence, human-computer interaction, and digital therapeutics participated in two iterative eDelphi rounds. The process focused on refining dimensions and items based on qualitative and quantitative feedback. Initial validation, conducted after assembling the final version of the scale, involved 49 participants using the CES-LCC to evaluate an LLM-powered chatbot delivering Self-Help Plus (SH+), an Acceptance and Commitment Therapy-based intervention for stress management. <b>Results</b>: The final version of the CES-LCC features 27 items grouped into nine dimensions: Understanding Requests, Providing Helpful Information, Clarity and Relevance of Responses, Language Quality, Trust, Emotional Support, Guidance and Direction, Memory, and Overall Satisfaction. Initial real-world validation revealed high internal consistency (Cronbach’s alpha = 0.94), although minor adjustments are required for specific dimensions, such as Clarity and Relevance of Responses. <b>Conclusions</b>: The CES-LCC fills a critical gap in the evaluation of LLM-powered counseling chatbots, offering a standardized tool for assessing their multifaceted capabilities. While preliminary results are promising, further research is needed to validate the scale across diverse populations and settings.

Bibliographic Details
Main Authors: Marco Bolpagni, Silvia Gabrielli
Format: Article
Language:English
Published: MDPI AG 2025-03-01
Series:Informatics
Subjects: counseling chatbots; mental health chatbots; Large Language Models (LLMs); digital mental health; chatbot evaluation; eDelphi methodology
ISSN: 2227-9709
DOI: 10.3390/informatics12010033
Volume/Issue: Informatics, Volume 12, Issue 1, Article 33
Author Affiliations: Marco Bolpagni, Department of General Psychology, University of Padova, 35121 Padova, Italy; Silvia Gabrielli, Digital Health Research, Centre for Digital Health and Wellbeing, Fondazione Bruno Kessler, 38123 Trento, Italy
Online Access: https://www.mdpi.com/2227-9709/12/1/33