Copilot in service: Exploring the potential of the large language model-based chatbots for fostering evaluation culture in preventing and countering violent extremism [version 2; peer review: 2 approved]
Background The rapid advancement of artificial intelligence (AI) technology has introduced large language model (LLM)-based assistants, or chatbots. To fully unlock the potential of this technology for the field of preventing and countering violent extremism (P/CVE), more research is needed. This pa...
Saved in:
| Main Authors: | Leena Malkki, Irina van der Vet |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | F1000 Research Ltd, 2025-04-01 |
| Series: | Open Research Europe |
| Subjects: | Artificial intelligence (AI); large language model (LLM); recommender system; evidence-based evaluation; evaluation culture; preventing and countering violent extremism (P/CVE) |
| Online Access: | https://open-research-europe.ec.europa.eu/articles/5-65/v2 |
| _version_ | 1850040416232013824 |
|---|---|
| author | Leena Malkki Irina van der Vet |
| author_facet | Leena Malkki Irina van der Vet |
| author_sort | Leena Malkki |
| collection | DOAJ |
| description | Background The rapid advancement of artificial intelligence (AI) technology has introduced large language model (LLM)-based assistants, or chatbots. To fully unlock the potential of this technology for the field of preventing and countering violent extremism (P/CVE), more research is needed. This paper examines the feasibility of using chatbots as recommender systems to respond to practitioners’ needs in evaluation, increase their knowledge of key evaluation aspects, and provide practical guidance and professional support for the evaluation process. At the same time, the paper provides an overview of the limitations that such a solution entails. Methods To explore the performance of LLM-based chatbots, we chose a publicly available AI assistant, Copilot, as an example. We conducted a qualitative analysis of its responses to 50 pre-designed prompts of various types. The study was guided by analysis questions established to explore the accuracy and reliability, relevance and integrity, and readability and comprehensiveness of the responses. We derived the key aspects of evidence-based evaluation, along with practitioners’ needs, from the results of the H2020 INDEED project. Results Our findings indicate that Copilot demonstrated significant proficiency in addressing issues related to evidence-based evaluation in P/CVE. Most generated responses were factually accurate, relevant, and structurally sound, i.e. sufficient to kick-start and deepen internal evidence-based practice. At the same time, the biases and data security issues inherent in LLM-based chatbots should be carefully explored by practitioners. Conclusions This study underscores both the potential and the limitations of LLM-based chatbots in fostering an evaluation culture in P/CVE. While Copilot can effectively generate accessible, informative and encouraging recommendations, professional oversight is still required to manage and coordinate the evaluation process and to address more field-specific needs. Future research should focus on a more rigorous and user-centred assessment of such systems for P/CVE use, based on multidisciplinary efforts. |
| format | Article |
| id | doaj-art-64019166a919446d8fa0068adf47ceda |
| institution | DOAJ |
| issn | 2732-5121 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | F1000 Research Ltd |
| record_format | Article |
| series | Open Research Europe |
| spelling | doaj-art-64019166a919446d8fa0068adf47ceda (indexed 2025-08-20T02:56:06Z); eng; F1000 Research Ltd; Open Research Europe; ISSN 2732-5121; published 2025-04-01, vol. 5; DOI: 10.12688/openreseurope.19612.2; Leena Malkki and Irina van der Vet (ORCID: 0000-0001-8696-1176), both: Centre for European Studies, University of Helsinki Faculty of Social Sciences, Helsinki, Uusimaa, 00014, Finland; title, abstract, keywords, and access URL as given in the fields above |
| title | Copilot in service: Exploring the potential of the large language model-based chatbots for fostering evaluation culture in preventing and countering violent extremism [version 2; peer review: 2 approved] |
| topic | Artificial intelligence (AI); large language model (LLM); recommender system; evidence-based evaluation; evaluation culture; preventing and countering violent extremism (P/CVE) |
| url | https://open-research-europe.ec.europa.eu/articles/5-65/v2 |
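The Methods passage in the record above describes a response-collection and coding workflow: 50 pre-designed prompts of various types were put to Copilot, and the responses were qualitatively analysed along three pairs of dimensions. The Python below is a minimal illustrative sketch of how such a collection step could be organised into a coding sheet; the study itself appears to have used Copilot interactively, so `query_chatbot`, the example prompts, and the CSV layout are hypothetical stand-ins, with only the analysis dimensions taken from the abstract.

```python
"""Sketch of the response-collection step described in the Methods section:
send a set of pre-designed prompts to a chatbot and store the responses for
later qualitative coding. `query_chatbot` is a hypothetical placeholder,
since the study does not describe programmatic access to Copilot."""
import csv
from typing import Callable

# Dimensions targeted by the study's analysis questions (from the abstract).
ANALYSIS_DIMENSIONS = [
    "accuracy and reliability",
    "relevance and integrity",
    "readability and comprehensiveness",
]

# Illustrative stand-ins for the 50 pre-designed prompts of various types.
PROMPTS = [
    ("knowledge", "What is evidence-based evaluation in P/CVE?"),
    ("guidance", "How should a small P/CVE project plan an outcome evaluation?"),
]

def collect_responses(prompts, query_chatbot: Callable[[str], str],
                      out_path: str = "responses.csv") -> None:
    """Query the chatbot once per prompt and write a coding sheet with one
    empty column per analysis dimension for the human reviewers to fill in."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["prompt_type", "prompt", "response", *ANALYSIS_DIMENSIONS])
        for prompt_type, prompt in prompts:
            response = query_chatbot(prompt)  # hypothetical access point
            writer.writerow([prompt_type, prompt, response,
                             "", "", ""])  # blank cells for manual coding

if __name__ == "__main__":
    # Echo stub in place of a real chatbot, so the sketch runs end to end.
    collect_responses(PROMPTS, query_chatbot=lambda p: f"[chatbot response to: {p}]")
```

One row per prompt with empty coding columns keeps the generated responses and the reviewers' qualitative judgements in a single sheet, which mirrors the abstract's pairing of each response with the three analysis dimensions.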