Retrieval-Augmented Generation to Generate Knowledge Assets and Creation of Action Drivers
This article explores the application of Retrieval-Augmented Generation (RAG) to enhance the creation of knowledge assets and develop actionable insights from complex datasets. It begins by contextualising the limitations of large language models (LLMs), notably their knowledge cut-offs and hallucin...
| Main Authors: | Antony James, Marcello Trovati, Simon Bolton |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-06-01 |
| Series: | Applied Sciences |
| Subjects: | RAG; Retrieval-Augmented Generation; LLM; large language models; AI; artificial intelligence |
| Online Access: | https://www.mdpi.com/2076-3417/15/11/6247 |
| _version_ | 1849722450825183232 |
|---|---|
| author | Antony James; Marcello Trovati; Simon Bolton |
| collection | DOAJ |
| description | This article explores the application of Retrieval-Augmented Generation (RAG) to enhance the creation of knowledge assets and develop actionable insights from complex datasets. It begins by contextualising the limitations of large language models (LLMs), notably their knowledge cut-offs and hallucination tendencies, and presents RAG as a promising solution that integrates external knowledge retrieval to improve factual accuracy and relevance. This study reviews current RAG architectures, including naïve and advanced models, emphasising techniques such as optimised indexing, query refinement, metadata utilisation, and the incorporation of autonomous AI agents in agentic RAG systems. Methodologies for effective data preprocessing, semantic-aware chunking, and retrieval strategies, such as multihop retrieval and reranking, are also discussed to address challenges such as irrelevant retrieval and semantic fragmentation. This work further examines embedding models, notably the use of state-of-the-art vector representations, to facilitate precise similarity searches within knowledge bases. A case study demonstrates the deployment of an RAG pipeline for analysing multisheet datasets, highlighting challenges in data structuring, prompt engineering, and ensuring output consistency. |
| format | Article |
| id | doaj-art-36744df6ebdf45129f3e8dd4f228eda3 |
| institution | DOAJ |
| issn | 2076-3417 |
| language | English |
| publishDate | 2025-06-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Applied Sciences |
| doi | 10.3390/app15116247 |
| volume | 15 |
| issue | 11 |
| article_number | 6247 |
| affiliations | Antony James: Department of Computer Science, Edge Hill University, St Helens Road, Ormskirk L39 4QP, Lancashire, UK; Marcello Trovati: Department of Computer Science, Edge Hill University, St Helens Road, Ormskirk L39 4QP, Lancashire, UK; Simon Bolton: Business School, Edge Hill University, St Helens Road, Ormskirk L39 4QP, Lancashire, UK |
| record_timestamp | 2025-08-20T03:11:21Z |
| title | Retrieval-Augmented Generation to Generate Knowledge Assets and Creation of Action Drivers |
| topic | RAG; Retrieval-Augmented Generation; LLM; large language models; AI; artificial intelligence |
| url | https://www.mdpi.com/2076-3417/15/11/6247 |