Agentic AI Systems: Architecture and Evaluation Using a Frictionless Parking Scenario

Bibliographic Details
Main Author: Alaa Khamis
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/11083588/
Description
Summary: An AI agent is a goal-oriented autonomous computational entity that connects reasoning and action using large language models (LLMs) or large reasoning models (LRMs), memory systems, and external tools to achieve contextually intelligent outcomes. An agentic AI system comprises and coordinates multiple specialized AI agents to achieve complex goals with minimal or no human supervision. In this paper, agentic AI systems are elucidated through a frictionless-parking scenario that illustrates the core components of the system, namely the design of individual AI agents, their interaction mechanisms (handoff and cueing), and potential cooperation patterns (augmentative, integrative, and debative). This scenario provides an experimental test-bed for demonstrating how agentic AI can deliver context-aware, personalized services. A six-factor factorial experiment evaluates the performance of the implemented AI agents across five user profiles, four GPT backbones, three entropy levels, three verbosity settings, three query complexities, and three prompt-specificity levels. To guarantee that each agent's recommendation is both plausible and constraint-compliant, the system combines guardrails, which reject or rewrite answers that violate user requirements, with Chain-of-Thought prompting that exposes intermediate reasoning steps for internal self-checks. Key metrics (agent response latency and lexical consistency) show that a lightweight gpt-4o-mini backbone and concise verbosity minimize latency, while medium prompt specificity and moderate query complexity optimize consistency. Decoding entropy influences stylistic diversity without significant latency costs but reduces consistency at high settings. User intent, particularly for creative or ambiguous profiles, drives variability. A SHAP analysis ranks model size, verbosity, and prompt specificity as the top performance drivers.
ISSN:2169-3536
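
The six-factor factorial design described in the abstract can be sketched by enumerating the full crossing of factor levels. The factor names and level counts below come from the abstract; the use of integer level indices is purely illustrative, not the authors' labels.

```python
from itertools import product

# Six factors and their level counts, as stated in the abstract:
# 5 user profiles x 4 GPT backbones x 3 entropy levels
# x 3 verbosity settings x 3 query complexities x 3 prompt-specificity levels
factors = {
    "user_profile": 5,
    "gpt_backbone": 4,       # gpt-4o-mini is one of the four, per the abstract
    "entropy": 3,
    "verbosity": 3,
    "query_complexity": 3,
    "prompt_specificity": 3,
}

# Full-factorial crossing: one experimental cell per level combination
conditions = list(product(*(range(n) for n in factors.values())))
print(len(conditions))  # 5 * 4 * 3 * 3 * 3 * 3 = 1620 cells
```

Each tuple in `conditions` indexes one experimental cell; per-cell metrics such as latency and lexical consistency would then be recorded for analysis (e.g., the SHAP ranking mentioned in the abstract).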