Meticulous Thought Defender: Fine-Grained Chain-of-Thought (CoT) for Detecting Prompt Injection Attacks of Large Language Models
Large language models (LLMs) have exhibited exceptional capabilities across various natural language processing tasks; however, they remain susceptible to prompt injection attacks, which pose significant security challenges. Traditional detection methods often fail to identify such attacks because they rely on static rules or surface-level analysis. In this study, we introduce a novel, fine-grained CoT-based detection framework that enhances the interpretability and robustness of attack identification. By dissecting the step-by-step reasoning process of LLMs, our approach leverages multidimensional anomaly detection mechanisms, encompassing semantic analysis, consistency evaluations at both the step and path levels, and confidence estimation, to uncover subtle disruptions caused by malicious prompt manipulations. Experimental results demonstrate that our method achieves superior performance, outperforming traditional methods by 1.16% in accuracy and 3.39% in F1 score, and surpassing LLMs’ intrinsic detection capabilities by 6.24% in accuracy and 7.65% in F1 score. This work not only fortifies the security of LLM applications but also provides a foundational framework for future research on adaptive defenses against evolving attack strategies.
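The abstract's path- and step-level consistency checks can be illustrated with a toy sketch. The paper's actual method is not reproduced in this record, so the following is a hypothetical reconstruction: sample several chain-of-thought paths, flag the prompt if the paths disagree on the final answer (path-level inconsistency) or if a reasoning step drifts sharply away from its predecessor (a crude step-level signal). All function names and thresholds are illustrative, not the authors'.

```python
from collections import Counter

def path_agreement(answers):
    """Fraction of sampled CoT paths that agree with the majority answer."""
    if not answers:
        return 0.0
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / len(answers)

def step_similarity(a, b):
    """Jaccard overlap between the token sets of two reasoning steps."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def flag_injection(paths, answers, agree_thresh=0.6, drift_thresh=0.05):
    """Flag a prompt when sampled paths disagree on the final answer, or
    when any consecutive pair of steps shares almost no vocabulary
    (a stand-in for the semantic-drift signal an injected instruction causes)."""
    if path_agreement(answers) < agree_thresh:
        return True
    for steps in paths:
        for prev, cur in zip(steps, steps[1:]):
            if step_similarity(prev, cur) < drift_thresh:
                return True
    return False
```

A production variant would replace the token-overlap heuristic with embedding similarity and calibrated confidence scores, but the control flow (sample paths, score each dimension, vote) is the same shape the abstract describes.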
| Main Authors: | Lijuan Shi, Yajing Kang, Jie Hu, Xinchi Li, Mingchuan Yang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Prompt injection detection; Chain-of-Thought (CoT); large language models (LLMs); natural language processing; model safety; cybersecurity |
| Online Access: | https://ieeexplore.ieee.org/document/11053836/ |
| Field | Value |
|---|---|
| author | Lijuan Shi; Yajing Kang; Jie Hu; Xinchi Li; Mingchuan Yang |
| collection | DOAJ |
| description | Large language models (LLMs) have exhibited exceptional capabilities across various natural language processing tasks; however, they remain susceptible to prompt injection attacks, which pose significant security challenges. Traditional detection methods often fail to identify such attacks because they rely on static rules or surface-level analysis. In this study, we introduce a novel, fine-grained CoT-based detection framework that enhances the interpretability and robustness of attack identification. By dissecting the step-by-step reasoning process of LLMs, our approach leverages multidimensional anomaly detection mechanisms, encompassing semantic analysis, consistency evaluations at both the step and path levels, and confidence estimation, to uncover subtle disruptions caused by malicious prompt manipulations. Experimental results demonstrate that our method achieves superior performance, outperforming traditional methods by 1.16% in accuracy and 3.39% in F1 score, and surpassing LLMs’ intrinsic detection capabilities by 6.24% in accuracy and 7.65% in F1 score. This work not only fortifies the security of LLM applications but also provides a foundational framework for future research on adaptive defenses against evolving attack strategies. |
| format | Article |
| id | doaj-art-b4c475fc36c54c1cbf2133b545aa880f |
| institution | Kabale University |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-b4c475fc36c54c1cbf2133b545aa880f (indexed 2025-08-20T03:30:14Z) |
| doi | 10.1109/ACCESS.2025.3583759 |
| volume | 13 |
| pages | 113194-113207 |
| orcid | Lijuan Shi: https://orcid.org/0009-0002-7013-4541; Yajing Kang: https://orcid.org/0009-0000-4071-7880 |
| affiliation | China Telecom Research Institute, Beijing, China (all five authors) |
| title | Meticulous Thought Defender: Fine-Grained Chain-of-Thought (CoT) for Detecting Prompt Injection Attacks of Large Language Models |
| topic | Prompt injection detection Chain-of-Thought (CoT) large language models (LLMs) natural language processing model safety cybersecurity |
| url | https://ieeexplore.ieee.org/document/11053836/ |