Copilot in service: Exploring the potential of the large language model-based chatbots for fostering evaluation culture in preventing and countering violent extremism [version 2; peer review: 2 approved]
Background The rapid advancement of artificial intelligence (AI) technology has introduced large language model (LLM)-based assistants, or chatbots. To fully unlock the potential of this technology for the field of preventing and countering violent extremism (P/CVE), more research is needed. This pa...
Saved in:
| Main Authors: | Leena Malkki, Irina van der Vet |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | F1000 Research Ltd, 2025-04-01 |
| Series: | Open Research Europe |
| Subjects: | Artificial intelligence (AI); large language model (LLM); recommender system; evidence-based evaluation; evaluation culture; preventing and countering violent extremism (P/CVE) |
| Online Access: | https://open-research-europe.ec.europa.eu/articles/5-65/v2 |
| _version_ | 1850040416232013824 |
|---|---|
| author | Leena Malkki Irina van der Vet |
| author_facet | Leena Malkki Irina van der Vet |
| author_sort | Leena Malkki |
| collection | DOAJ |
| description | Background The rapid advancement of artificial intelligence (AI) technology has introduced large language model (LLM)-based assistants, or chatbots. To fully unlock the potential of this technology for the field of preventing and countering violent extremism (P/CVE), more research is needed. This paper examines the feasibility of using chatbots as recommender systems to respond to practitioners’ needs in evaluation, increase their knowledge of key evaluation aspects, and provide practical guidance and professional support for the evaluation process. At the same time, the paper provides an overview of the limitations that such a solution entails. Methods To explore the performance of LLM-based chatbots, we chose a publicly available AI assistant, Copilot, as an example. We conducted a qualitative analysis of its responses to 50 pre-designed prompts of various types. The study was guided by analysis questions established to explore the accuracy and reliability, relevance and integrity, and readability and comprehensiveness of the responses. We derived the key aspects of evidence-based evaluation, along with practitioners’ needs, from the results of the H2020 INDEED project. Results Our findings indicate that Copilot demonstrated significant proficiency in addressing issues related to evidence-based evaluation in P/CVE. Most generated responses were factually accurate, relevant, and structurally sound, i.e. sufficient to kick-start and deepen internal evidence-based practice. At the same time, the biases and data security issues inherent in LLM-based chatbots should be carefully explored by practitioners. Conclusions This study underscores both the potential and the limitations of LLM-based chatbots in fostering an evaluation culture in P/CVE. While Copilot can effectively generate accessible, informative and encouraging recommendations, professional oversight is still required to manage and coordinate the evaluation process and to address more field-specific needs. Future research should focus on a more rigorous and user-centred assessment of such systems for P/CVE use, based on multidisciplinary efforts. |
| format | Article |
| id | doaj-art-64019166a919446d8fa0068adf47ceda |
| institution | DOAJ |
| issn | 2732-5121 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | F1000 Research Ltd |
| record_format | Article |
| series | Open Research Europe |
| spelling | doaj-art-64019166a919446d8fa0068adf47ceda (indexed 2025-08-20T02:56:06Z); eng; F1000 Research Ltd; Open Research Europe; ISSN 2732-5121; published 2025-04-01, vol. 5; DOI: 10.12688/openreseurope.19612.2; Leena Malkki and Irina van der Vet (ORCID: 0000-0001-8696-1176), both: Centre for European Studies, University of Helsinki Faculty of Social Sciences, Helsinki, Uusimaa, 00014, Finland; title, abstract, keywords, and access URL as given in the fields above |
| title | Copilot in service: Exploring the potential of the large language model-based chatbots for fostering evaluation culture in preventing and countering violent extremism [version 2; peer review: 2 approved] |
| topic | Artificial intelligence (AI); large language model (LLM); recommender system; evidence-based evaluation; evaluation culture; preventing and countering violent extremism (P/CVE) |
| url | https://open-research-europe.ec.europa.eu/articles/5-65/v2 |
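The Methods passage in the record above describes a response-collection and coding workflow: 50 pre-designed prompts of various types were put to Copilot, and the responses were qualitatively analysed along three pairs of dimensions. The Python below is a minimal illustrative sketch of how such a collection step could be organised into a coding sheet; the study itself appears to have used Copilot interactively, so `query_chatbot`, the example prompts, and the CSV layout are hypothetical stand-ins, with only the analysis dimensions taken from the abstract.

```python
"""Sketch of the response-collection step described in the Methods section:
send a set of pre-designed prompts to a chatbot and store the responses for
later qualitative coding. `query_chatbot` is a hypothetical placeholder,
since the study does not describe programmatic access to Copilot."""
import csv
from typing import Callable

# Dimensions targeted by the study's analysis questions (from the abstract).
ANALYSIS_DIMENSIONS = [
    "accuracy and reliability",
    "relevance and integrity",
    "readability and comprehensiveness",
]

# Illustrative stand-ins for the 50 pre-designed prompts of various types.
PROMPTS = [
    ("knowledge", "What is evidence-based evaluation in P/CVE?"),
    ("guidance", "How should a small P/CVE project plan an outcome evaluation?"),
]

def collect_responses(prompts, query_chatbot: Callable[[str], str],
                      out_path: str = "responses.csv") -> None:
    """Query the chatbot once per prompt and write a coding sheet with one
    empty column per analysis dimension for the human reviewers to fill in."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["prompt_type", "prompt", "response", *ANALYSIS_DIMENSIONS])
        for prompt_type, prompt in prompts:
            response = query_chatbot(prompt)  # hypothetical access point
            writer.writerow([prompt_type, prompt, response,
                             "", "", ""])  # blank cells for manual coding

if __name__ == "__main__":
    # Echo stub in place of a real chatbot, so the sketch runs end to end.
    collect_responses(PROMPTS, query_chatbot=lambda p: f"[chatbot response to: {p}]")
```

One row per prompt with empty coding columns keeps the generated responses and the reviewers' qualitative judgements in a single sheet, which mirrors the abstract's pairing of each response with the three analysis dimensions.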