Meticulous Thought Defender: Fine-Grained Chain-of-Thought (CoT) for Detecting Prompt Injection Attacks of Large Language Models
Large language models (LLMs) have exhibited exceptional capabilities across various natural language processing tasks; however, they remain susceptible to prompt injection attacks, which pose significant security challenges. Traditional detection methods often fail to identify such attacks because they rely on static rules or surface-level analysis. In this study, we introduce a novel, fine-grained CoT-based detection framework that enhances the interpretability and robustness of attack identification. By dissecting the step-by-step reasoning process of LLMs, our approach leverages multidimensional anomaly detection mechanisms, encompassing semantic analysis, consistency evaluations at both the step and path levels, and confidence estimation, to uncover subtle disruptions caused by malicious prompt manipulations. Experimental results demonstrate that our method achieves superior performance, outperforming traditional methods by 1.16% in accuracy and 3.39% in F1 score, and surpassing LLMs’ intrinsic detection capabilities by 6.24% in accuracy and 7.65% in F1 score. This work not only fortifies the security of LLM applications but also provides a foundational framework for future research on adaptive defenses against evolving attack strategies.
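The abstract's path- and step-level consistency checks can be illustrated with a toy sketch. The paper's actual method is not reproduced in this record, so the following is a hypothetical reconstruction: sample several chain-of-thought paths, flag the prompt if the paths disagree on the final answer (path-level inconsistency) or if a reasoning step drifts sharply away from its predecessor (a crude step-level signal). All function names and thresholds are illustrative, not the authors'.

```python
from collections import Counter

def path_agreement(answers):
    """Fraction of sampled CoT paths that agree with the majority answer."""
    if not answers:
        return 0.0
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / len(answers)

def step_similarity(a, b):
    """Jaccard overlap between the token sets of two reasoning steps."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def flag_injection(paths, answers, agree_thresh=0.6, drift_thresh=0.05):
    """Flag a prompt when sampled paths disagree on the final answer, or
    when any consecutive pair of steps shares almost no vocabulary
    (a stand-in for the semantic-drift signal an injected instruction causes)."""
    if path_agreement(answers) < agree_thresh:
        return True
    for steps in paths:
        for prev, cur in zip(steps, steps[1:]):
            if step_similarity(prev, cur) < drift_thresh:
                return True
    return False
```

A production variant would replace the token-overlap heuristic with embedding similarity and calibrated confidence scores, but the control flow (sample paths, score each dimension, vote) is the same shape the abstract describes.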
| Main Authors: | Lijuan Shi, Yajing Kang, Jie Hu, Xinchi Li, Mingchuan Yang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Prompt injection detection; Chain-of-Thought (CoT); large language models (LLMs); natural language processing; model safety; cybersecurity |
| Online Access: | https://ieeexplore.ieee.org/document/11053836/ |
| Field | Value |
|---|---|
| author | Lijuan Shi; Yajing Kang; Jie Hu; Xinchi Li; Mingchuan Yang |
| collection | DOAJ |
| description | Large language models (LLMs) have exhibited exceptional capabilities across various natural language processing tasks; however, they remain susceptible to prompt injection attacks, which pose significant security challenges. Traditional detection methods often fail to identify such attacks because they rely on static rules or surface-level analysis. In this study, we introduce a novel, fine-grained CoT-based detection framework that enhances the interpretability and robustness of attack identification. By dissecting the step-by-step reasoning process of LLMs, our approach leverages multidimensional anomaly detection mechanisms, encompassing semantic analysis, consistency evaluations at both the step and path levels, and confidence estimation, to uncover subtle disruptions caused by malicious prompt manipulations. Experimental results demonstrate that our method achieves superior performance, outperforming traditional methods by 1.16% in accuracy and 3.39% in F1 score, and surpassing LLMs’ intrinsic detection capabilities by 6.24% in accuracy and 7.65% in F1 score. This work not only fortifies the security of LLM applications but also provides a foundational framework for future research on adaptive defenses against evolving attack strategies. |
| format | Article |
| id | doaj-art-b4c475fc36c54c1cbf2133b545aa880f |
| institution | Kabale University |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-b4c475fc36c54c1cbf2133b545aa880f (indexed 2025-08-20T03:30:14Z) |
| doi | 10.1109/ACCESS.2025.3583759 |
| volume | 13 |
| pages | 113194-113207 |
| orcid | Lijuan Shi: https://orcid.org/0009-0002-7013-4541; Yajing Kang: https://orcid.org/0009-0000-4071-7880 |
| affiliation | China Telecom Research Institute, Beijing, China (all five authors) |
| title | Meticulous Thought Defender: Fine-Grained Chain-of-Thought (CoT) for Detecting Prompt Injection Attacks of Large Language Models |
| topic | Prompt injection detection Chain-of-Thought (CoT) large language models (LLMs) natural language processing model safety cybersecurity |
| url | https://ieeexplore.ieee.org/document/11053836/ |