Meticulous Thought Defender: Fine-Grained Chain-of-Thought (CoT) for Detecting Prompt Injection Attacks of Large Language Models

Bibliographic Details
Main Authors: Lijuan Shi, Yajing Kang, Jie Hu, Xinchi Li, Mingchuan Yang
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Subjects: Prompt injection detection; Chain-of-Thought (CoT); large language models (LLMs); natural language processing; model safety; cybersecurity
Online Access: https://ieeexplore.ieee.org/document/11053836/
Collection: DOAJ
Description: Large language models (LLMs) have exhibited exceptional capabilities across various natural language processing tasks; however, they remain susceptible to prompt injection attacks, which pose significant security challenges. Traditional detection methods often fail to identify such attacks effectively because they rely on static rules or surface-level analysis. In this study, we introduce a novel, fine-grained CoT-based detection framework that enhances the interpretability and robustness of attack identification. By dissecting the step-by-step reasoning process of LLMs, our approach leverages multidimensional anomaly detection mechanisms, encompassing semantic analysis, consistency evaluations at both the step and path levels, and confidence estimation, to uncover subtle disruptions caused by malicious prompt manipulations. Experimental results demonstrate that our method achieves superior performance, outperforming traditional methods by 1.16% in accuracy and 3.39% in F1 score, and surpassing LLMs' intrinsic detection capabilities by 6.24% in accuracy and 7.65% in F1 score. This work not only fortifies the security of LLM applications but also provides a foundational framework for future research on adaptive defenses against evolving attack strategies.
Record ID: doaj-art-b4c475fc36c54c1cbf2133b545aa880f
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2025.3583759
Citation: IEEE Access, vol. 13, pp. 113194-113207, 2025
Author Affiliations: Lijuan Shi (ORCID: 0009-0002-7013-4541), Yajing Kang (ORCID: 0009-0000-4071-7880), Jie Hu, Xinchi Li, and Mingchuan Yang, all with the China Telecom Research Institute, Beijing, China
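The abstract describes three families of anomaly signals: step-level consistency, path-level consistency, and confidence estimation, combined into a multidimensional detector. The paper's actual implementation is not reproduced in this record; the sketch below is a hypothetical toy illustration of how such signals could be combined. Jaccard token overlap stands in for real semantic analysis, and every function name, weighting, and threshold here is invented for illustration, not taken from the paper.

```python
# Toy sketch of multidimensional CoT anomaly scoring (illustrative only;
# not the authors' implementation). Jaccard token overlap substitutes for
# semantic-embedding similarity; thresholds are arbitrary.

def _tokens(text):
    return set(text.lower().split())

def jaccard(a, b):
    # Token-overlap stand-in for semantic similarity between two texts.
    ta, tb = _tokens(a), _tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def step_consistency(steps):
    # Mean similarity between consecutive reasoning steps: an injected
    # instruction tends to break the chain abruptly.
    if len(steps) < 2:
        return 1.0
    sims = [jaccard(a, b) for a, b in zip(steps, steps[1:])]
    return sum(sims) / len(sims)

def path_consistency(paths):
    # Agreement of final answers across independently sampled CoT paths.
    finals = [p[-1] for p in paths]
    sims = [jaccard(finals[0], f) for f in finals[1:]]
    return sum(sims) / len(sims) if sims else 1.0

def detect_injection(paths, confidences, threshold=0.5):
    # Average the three signals into one score; below threshold -> flag.
    step_score = sum(step_consistency(p) for p in paths) / len(paths)
    conf_score = sum(confidences) / len(confidences)
    score = (step_score + path_consistency(paths) + conf_score) / 3.0
    return score < threshold, score
```

A reasoning chain derailed by an injected "ignore previous instructions" step scores low on all three axes and gets flagged, while mutually consistent paths with high model confidence do not; a real system would replace the overlap metric with embedding similarity and calibrate the threshold on labeled attack data.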