Large Language Models’ Trustworthiness in the Light of the EU AI Act—A Systematic Mapping Study

The recent advancements and emergence of rapidly evolving AI models, such as large language models (LLMs), have sparked interest among researchers and professionals. These models are being fine-tuned and applied across various fields such as healthcare, customer service and support, education, automated driving, and smart factories. This often leads to increased complexity and challenges concerning the trustworthiness of these models, such as the generation of toxic content and high-confidence hallucinations that can lead to serious consequences. The European Union Artificial Intelligence Act (EU AI Act) proposes a comprehensive set of guidelines to ensure the responsible development and usage of general-purpose AI systems (such as LLMs) that may pose potential risks. Strengthened efforts are therefore needed to ensure that these high-performing LLMs adhere to the seven trustworthiness aspects recommended by the AI Act: data governance, record-keeping, transparency, human oversight, accuracy, robustness, and cybersecurity. Our study systematically maps research on the development of LLMs across different application domains with respect to these AI Act-based trustworthiness aspects. The results reveal recent trends indicating growing interest in emerging models such as LLaMa and BARD, reflecting a shift in research priorities; GPT and BERT remain the most studied models, while newer alternatives like Mistral and Claude remain underexplored. Trustworthiness aspects like accuracy and transparency dominate the research landscape, while cybersecurity and record-keeping remain significantly underexamined. Our findings highlight the urgent need for a more balanced, interdisciplinary research approach to ensure LLM trustworthiness across diverse applications. Expanding studies into underexplored, high-risk domains and fostering cross-sector collaboration can bridge existing gaps. Furthermore, this study also reveals underrepresented domains (such as telecommunication), presenting considerable research gaps and indicating a potential direction for the way forward.

Bibliographic Details
Main Authors: Md Masum Billah, Harry Setiawan Hamjaya, Hakima Shiralizade, Vandita Singh, Rafia Inam
Author Affiliation: Ericsson Research, Trustworthy AI, 16483 Stockholm, Sweden (all authors)
Format: Article
Language: English
Published: MDPI AG, 2025-07-01
Series: Applied Sciences, Vol. 15, Issue 14, Article 7640
ISSN: 2076-3417
DOI: 10.3390/app15147640
Subjects: large language models (LLMs); trustworthiness; EU AI Act; GPT; BERT; LLaMa
Collection: DOAJ
Online Access: https://www.mdpi.com/2076-3417/15/14/7640