Meticulous Thought Defender: Fine-Grained Chain-of-Thought (CoT) for Detecting Prompt Injection Attacks of Large Language Models

Large language models (LLMs) have exhibited exceptional capabilities across various natural language processing tasks, however, they remain susceptible to prompt injection attacks, which pose significant security challenges. Traditional detection methods often fail to effectively identify such attac...

Full description

Saved in:
Bibliographic Details
Main Authors: Lijuan Shi, Yajing Kang, Jie Hu, Xinchi Li, Mingchuan Yang
Format: Article
Language:English
Published: IEEE 2025-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/11053836/
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Large language models (LLMs) have exhibited exceptional capabilities across various natural language processing tasks, however, they remain susceptible to prompt injection attacks, which pose significant security challenges. Traditional detection methods often fail to effectively identify such attacks due to their reliance on static rules or surface-level analysis. In this study, we introduce a novel, fine-grained CoT based detection framework that enhances the interpretability and robustness of attack identification. By dissecting the step-by-step reasoning process of LLMs, our approach leverages multidimensional anomaly detection mechanisms-encompassing semantic analysis, consistency evaluations at both step and path levels, and confidence estimations to uncover subtle disruptions caused by malicious prompt manipulations. Experimental results demonstrate that our method achieves superior performance metrics, with significant improvements in accuracy and F1 score, outperforming traditional method by 1.16% in accuracy and 3.39% in F1 score, and surpassing LLMs’ intrinsic detection capabilities by 6.24% in accuracy and 7.65% in F1 score. This work not only fortifies the security of LLMs’ applications but also provides a foundational framework for future research on adaptive defenses against evolving attack strategies.
ISSN:2169-3536