Copilot in service: Exploring the potential of the large language model-based chatbots for fostering evaluation culture in preventing and countering violent extremism [version 2; peer review: 2 approved]

Bibliographic Details
Main Authors: Leena Malkki, Irina van der Vet
Format: Article
Language: English
Published: F1000 Research Ltd, 2025-04-01
Series: Open Research Europe
Subjects: Artificial intelligence (AI); large language model (LLM); recommender system; evidence-based evaluation; evaluation culture; preventing and countering violent extremism (P/CVE)
Online Access: https://open-research-europe.ec.europa.eu/articles/5-65/v2
Description:
Background: The rapid advancement of artificial intelligence (AI) technology has introduced large language model (LLM)-based assistants, or chatbots. To fully unlock the potential of this technology for the preventing and countering violent extremism (P/CVE) field, more research is needed. This paper examines the feasibility of using chatbots as recommender systems to respond to practitioners’ needs in evaluation, increase their knowledge of key evaluation aspects, and provide practical guidance and professional support for the evaluation process. At the same time, the paper provides an overview of the limitations that such a solution entails.
Methods: To explore the performance of LLM-based chatbots, we chose a publicly available AI assistant, Copilot, as an example. We conducted a qualitative analysis of its responses to 50 pre-designed prompts of various types. The study was driven by analysis questions established to explore the accuracy and reliability, relevance and integrity, and readability and comprehensiveness of the responses. We derived the key aspects of evidence-based evaluation, along with practitioners’ needs, from the results of the H2020 INDEED project.
Results: Our findings indicate that Copilot demonstrated significant proficiency in addressing issues related to evidence-based evaluation in P/CVE. Most generated responses were factually accurate, relevant, and structurally sound, i.e. sufficient to kick-start and deepen internal evidence-based practice. At the same time, the biases and data security issues inherent in LLM-based chatbots should be carefully explored by practitioners.
Conclusions: This study underscored both the potential and limitations of LLM-based chatbots in fostering an evaluation culture in P/CVE. While Copilot can effectively generate accessible, informative and encouraging recommendations, it still requires professional oversight to manage and coordinate the evaluation process, as well as to address more field-specific needs. Future research should focus on more rigorous, user-centred assessment of such systems for P/CVE use, based on multidisciplinary efforts.
ISSN: 2732-5121
DOI: 10.12688/openreseurope.19612.2
Author Affiliation: Centre for European Studies, University of Helsinki Faculty of Social Sciences, Helsinki, Uusimaa, 00014, Finland (both authors)
ORCID (Irina van der Vet): https://orcid.org/0000-0001-8696-1176