LLM-Driven, Self-Improving Framework for Security Test Automation: Leveraging Karate DSL for Augmented API Resilience

Modern software architectures heavily rely on APIs, yet face significant security challenges, particularly with Broken Object Level Authorization (BOLA) vulnerabilities, which remain the most critical API security risk according to OWASP. This paper introduces Karate-BOLA-Guard, an innovative framew...

Full description

Saved in:
Bibliographic Details
Main Authors: Emil Marian Pasca, Daniela Delinschi, Rudolf Erdei, Oliviu Matei
Format: Article
Language:English
Published: IEEE 2025-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/10942340/
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Modern software architectures heavily rely on APIs, yet face significant security challenges, particularly with Broken Object Level Authorization (BOLA) vulnerabilities, which remain the most critical API security risk according to OWASP. This paper introduces Karate-BOLA-Guard, an innovative framework leveraging Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) techniques to automate security-focused test case generation for APIs. Our approach integrates vector databases for context retrieval, multiple LLM models for test generation, and observability tools for process monitoring. Initial experiments were carried out on three deliberately vulnerable APIs (VAmPI, Crapi, and OWASP Juice Shop), with subsequent validation on fifteen additional production APIs spanning diverse domains including social media, version control systems, financial services, and transportation services. Our evaluation metrics show Llama 3 8B achieving consistent performance (Accuracy: 3.1-3.4, Interoperability: 3.7-4.3) with an average processing time of 143.76 seconds on GPU. Performance analysis revealed significant GPU acceleration benefits, with 20-25x improvement over CPU processing times. Smaller models demonstrated efficient processing, with Phi-3 Mini averaging 69.58 seconds and Mistral 72.14 seconds, while maintaining acceptable accuracy scores. Token utilization patterns showed Llama 3 8B using an average of 36,591 tokens per session, compared to Mistral’s 25,225 and Phi-3 Mini’s 31,007. Our framework’s effectiveness varied across APIs, with notably strong performance in complex platforms (Instagram: A = 4.3, I = 4.4) while maintaining consistent functionality in simpler implementations (VAmPI: A = 3.6, I = 4.3). The iterative refinement process, evaluated through comprehensive metrics including Accuracy (A), Complexity (C), and Interoperability (I), represents a significant advancement in automated API security testing, offering an efficient, accurate, and adaptable approach to detecting BOLA vulnerabilities across diverse API architectures.
ISSN:2169-3536