Retrieval-Augmented Generation to Generate Knowledge Assets and Creation of Action Drivers

This article explores the application of Retrieval-Augmented Generation (RAG) to enhance the creation of knowledge assets and develop actionable insights from complex datasets. It begins by contextualising the limitations of large language models (LLMs), notably their knowledge cut-offs and hallucin...

Bibliographic Details
Main Authors: Antony James, Marcello Trovati, Simon Bolton
Format: Article
Language: English
Published: MDPI AG 2025-06-01
Series: Applied Sciences
Subjects: RAG; Retrieval-Augmented Generation; LLM; large language models; AI; artificial intelligence
Online Access: https://www.mdpi.com/2076-3417/15/11/6247
_version_ 1849722450825183232
author Antony James
Marcello Trovati
Simon Bolton
collection DOAJ
description This article explores the application of Retrieval-Augmented Generation (RAG) to enhance the creation of knowledge assets and develop actionable insights from complex datasets. It begins by contextualising the limitations of large language models (LLMs), notably their knowledge cut-offs and hallucination tendencies, and presents RAG as a promising solution that integrates external knowledge retrieval to improve factual accuracy and relevance. This study reviews current RAG architectures, including naïve and advanced models, emphasising techniques such as optimised indexing, query refinement, metadata utilisation, and the incorporation of autonomous AI agents in agentic RAG systems. Methodologies for effective data preprocessing, semantic-aware chunking, and retrieval strategies (such as multihop retrieval and reranking) are also discussed to address challenges such as irrelevant retrieval and semantic fragmentation. This work further examines embedding models, notably the use of state-of-the-art vector representations, to facilitate precise similarity searches within knowledge bases. A case study demonstrates the deployment of a RAG pipeline for analysing multisheet datasets, highlighting challenges in data structuring, prompt engineering, and ensuring output consistency.
format Article
id doaj-art-36744df6ebdf45129f3e8dd4f228eda3
institution DOAJ
issn 2076-3417
language English
publishDate 2025-06-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj-art-36744df6ebdf45129f3e8dd4f228eda3
2025-08-20T03:11:21Z
eng
MDPI AG
Applied Sciences
2076-3417
2025-06-01
15
11
6247
10.3390/app15116247
Retrieval-Augmented Generation to Generate Knowledge Assets and Creation of Action Drivers
Antony James (Department of Computer Science, Edge Hill University, St Helens Road, Ormskirk L39 4QP, Lancashire, UK)
Marcello Trovati (Department of Computer Science, Edge Hill University, St Helens Road, Ormskirk L39 4QP, Lancashire, UK)
Simon Bolton (Business School, Edge Hill University, St Helens Road, Ormskirk L39 4QP, Lancashire, UK)
https://www.mdpi.com/2076-3417/15/11/6247
RAG
Retrieval-Augmented Generation
LLM
large language models
AI
artificial intelligence
title Retrieval-Augmented Generation to Generate Knowledge Assets and Creation of Action Drivers
topic RAG
Retrieval-Augmented Generation
LLM
large language models
AI
artificial intelligence
url https://www.mdpi.com/2076-3417/15/11/6247