Innovative Guardrails for Generative AI: Designing an Intelligent Filter for Safe and Responsible LLM Deployment

Bibliographic Details
Main Authors: Olga Shvetsova, Danila Katalshov, Sang-Kon Lee
Format: Article
Language: English
Published: MDPI AG, 2025-06-01
Series: Applied Sciences
Subjects: generative AI; large language models (LLMs); AI safety; content filtering; AI ethics; responsible AI
Online Access: https://www.mdpi.com/2076-3417/15/13/7298
Collection: DOAJ
Description: This paper proposes a technological framework designed to mitigate the inherent risks associated with the deployment of artificial intelligence (AI) in decision-making and task execution within management processes. The Agreement Validation Interface (AVI) functions as a modular Application Programming Interface (API) Gateway positioned between user applications and LLMs. This gateway architecture is designed to be LLM-agnostic, meaning it can operate with various underlying LLMs without requiring model-specific modifications. This universality is achieved by standardizing the interface for requests and responses and applying a consistent set of validation and enhancement processes irrespective of the chosen LLM provider, thus offering a uniform governance layer across a diverse LLM ecosystem. AVI orchestrates multiple AI subcomponents for input–output validation, response evaluation, and contextual reasoning, thereby enabling real-time, bidirectional filtering of user interactions. A proof-of-concept (PoC) implementation of AVI was developed and rigorously evaluated using industry-standard benchmarks. The system was tested for its effectiveness in mitigating adversarial prompts, reducing toxic outputs, detecting personally identifiable information (PII), and enhancing factual consistency. The results demonstrated that AVI reduced successful prompt injection attacks by 82%, decreased toxic content generation by 75%, and achieved high PII detection performance (F1-score ≈ 0.95). Furthermore, the contextual reasoning module significantly improved the neutrality and factual validity of model outputs. Although the integration of AVI introduced a moderate increase in latency, the overall framework effectively enhanced the reliability, safety, and interpretability of LLM-driven applications. AVI provides a scalable and adaptable architectural template for the responsible deployment of generative AI in high-stakes domains such as finance, healthcare, and education, promoting safer and more ethical use of AI technologies.
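The abstract describes AVI as an LLM-agnostic gateway that applies input- and output-side validators around any backend model. The paper's actual component names and interfaces are not given in this record, so the following is a minimal illustrative sketch of that gateway pattern; every name here (`Validator`, `AVIGateway`, the heuristic checks) is a hypothetical assumption, not the authors' implementation.

```python
import re
from typing import Callable

class Validator:
    """A pluggable check applied to prompts or responses (illustrative)."""
    def __init__(self, name: str, check: Callable[[str], bool], message: str):
        self.name, self.check, self.message = name, check, message

# Input-side example: a deliberately naive prompt-injection heuristic.
injection_guard = Validator(
    "injection_guard",
    check=lambda text: "ignore previous instructions" not in text.lower(),
    message="possible prompt injection",
)

# Output-side example: a deliberately naive PII (email address) detector.
pii_guard = Validator(
    "pii_guard",
    check=lambda text: re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", text) is None,
    message="response contains an email address",
)

class AVIGateway:
    """LLM-agnostic: any callable with a str -> str contract can be wrapped,
    so the same validators govern every backend provider."""
    def __init__(self, llm: Callable[[str], str],
                 input_validators=(), output_validators=()):
        self.llm = llm
        self.input_validators = list(input_validators)
        self.output_validators = list(output_validators)

    def handle(self, prompt: str) -> dict:
        # Bidirectional filtering: validate the request before the model
        # sees it, and the response before the user does.
        for v in self.input_validators:
            if not v.check(prompt):
                return {"blocked": True, "stage": "input", "reason": v.message}
        response = self.llm(prompt)
        for v in self.output_validators:
            if not v.check(response):
                return {"blocked": True, "stage": "output", "reason": v.message}
        return {"blocked": False, "response": response}

# A stand-in backend; a real deployment would call an LLM provider here.
gateway = AVIGateway(
    llm=lambda p: f"echo: {p}",
    input_validators=[injection_guard],
    output_validators=[pii_guard],
)
```

Because the gateway only assumes a `str -> str` contract, swapping the underlying model changes nothing on the governance side, which is the universality property the abstract claims for AVI.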
Record ID: doaj-art-e253ce3b0ee24dd68f279bbac42f4733
Institution: Kabale University
ISSN: 2076-3417
DOI: 10.3390/app15137298
Author Affiliations: Olga Shvetsova, Danila Katalshov, Sang-Kon Lee — School of Industrial Management, Korea University of Technology and Education (KOREATECH), Cheonan-si 31254, Republic of Korea