Advancing Software Vulnerability Detection with Reasoning LLMs: DeepSeek-R1′s Performance and Insights
The increasing complexity of software systems has heightened the need for efficient and accurate vulnerability detection. Large Language Models have emerged as promising tools in this domain; however, their reasoning capabilities and limitations remain insufficiently explored. This study presents a...
| Main Authors: | Wenting Qin, Lijie Suo, Liangchen Li, Fan Yang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-06-01 |
| Series: | Applied Sciences |
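| Subjects: | DeepSeek-R1; Large Language Model; reasoning mechanisms; step-by-step inference; vulnerability detection |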
| Online Access: | https://www.mdpi.com/2076-3417/15/12/6651 |

| _version_ | 1849433026883223552 |
|---|---|
| author | Wenting Qin, Lijie Suo, Liangchen Li, Fan Yang |
| author_facet | Wenting Qin, Lijie Suo, Liangchen Li, Fan Yang |
| author_sort | Wenting Qin |
| collection | DOAJ |
| description | The increasing complexity of software systems has heightened the need for efficient and accurate vulnerability detection. Large Language Models have emerged as promising tools in this domain; however, their reasoning capabilities and limitations remain insufficiently explored. This study presents a systematic evaluation of different Large Language Models with and without explicit reasoning mechanisms, including Claude-3.5-Haiku, GPT-4o-Mini, DeepSeek-V3, O3-Mini, and DeepSeek-R1. Experimental results demonstrate that reasoning-enabled models, particularly DeepSeek-R1, outperform their non-reasoning counterparts by leveraging structured step-by-step inference strategies and valuable reasoning traces. With proposed data processing and prompt design in the interaction, DeepSeek-R1 achieves an accuracy of 0.9507 and an F1-score of 0.9659 on the Software Assurance Reference Dataset. These findings highlight the potential of integrating reasoning-enabled Large Language Models into vulnerability detection frameworks to simultaneously improve detection performance and interpretability. |
| format | Article |
| id | doaj-art-86e8ab12cd824e828dc8fb5f9755caa7 |
| institution | Kabale University |
| issn | 2076-3417 |
| language | English |
| publishDate | 2025-06-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Applied Sciences |
| spelling | doaj-art-86e8ab12cd824e828dc8fb5f9755caa7; 2025-08-20T03:27:11Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2025-06-01; 15; 12; 6651; 10.3390/app15126651; Advancing Software Vulnerability Detection with Reasoning LLMs: DeepSeek-R1′s Performance and Insights; Wenting Qin [0]; Lijie Suo [1]; Liangchen Li [2]; Fan Yang [3]; [0] School of Physical and Mathematical Sciences, Nanjing Tech University, Nanjing 211816, China; [1] School of Physical and Mathematical Sciences, Nanjing Tech University, Nanjing 211816, China; [2] School of Mathematical Sciences, Luoyang Normal University, Luoyang 471934, China; [3] School of Physical and Mathematical Sciences, Nanjing Tech University, Nanjing 211816, China; The increasing complexity of software systems has heightened the need for efficient and accurate vulnerability detection. Large Language Models have emerged as promising tools in this domain; however, their reasoning capabilities and limitations remain insufficiently explored. This study presents a systematic evaluation of different Large Language Models with and without explicit reasoning mechanisms, including Claude-3.5-Haiku, GPT-4o-Mini, DeepSeek-V3, O3-Mini, and DeepSeek-R1. Experimental results demonstrate that reasoning-enabled models, particularly DeepSeek-R1, outperform their non-reasoning counterparts by leveraging structured step-by-step inference strategies and valuable reasoning traces. With proposed data processing and prompt design in the interaction, DeepSeek-R1 achieves an accuracy of 0.9507 and an F1-score of 0.9659 on the Software Assurance Reference Dataset. These findings highlight the potential of integrating reasoning-enabled Large Language Models into vulnerability detection frameworks to simultaneously improve detection performance and interpretability.; https://www.mdpi.com/2076-3417/15/12/6651; DeepSeek-R1; Large Language Model; reasoning mechanisms; step-by-step inference; vulnerability detection |
| spellingShingle | Wenting Qin; Lijie Suo; Liangchen Li; Fan Yang; Advancing Software Vulnerability Detection with Reasoning LLMs: DeepSeek-R1′s Performance and Insights; Applied Sciences; DeepSeek-R1; Large Language Model; reasoning mechanisms; step-by-step inference; vulnerability detection |
| title | Advancing Software Vulnerability Detection with Reasoning LLMs: DeepSeek-R1′s Performance and Insights |
| title_sort | advancing software vulnerability detection with reasoning llms deepseek r1 s performance and insights |
| topic | DeepSeek-R1; Large Language Model; reasoning mechanisms; step-by-step inference; vulnerability detection |
| url | https://www.mdpi.com/2076-3417/15/12/6651 |
| work_keys_str_mv | AT wentingqin advancingsoftwarevulnerabilitydetectionwithreasoningllmsdeepseekr1sperformanceandinsights AT lijiesuo advancingsoftwarevulnerabilitydetectionwithreasoningllmsdeepseekr1sperformanceandinsights AT liangchenli advancingsoftwarevulnerabilitydetectionwithreasoningllmsdeepseekr1sperformanceandinsights AT fanyang advancingsoftwarevulnerabilitydetectionwithreasoningllmsdeepseekr1sperformanceandinsights |