Advancing Software Vulnerability Detection with Reasoning LLMs: DeepSeek-R1's Performance and Insights

Bibliographic Details
Main Authors: Wenting Qin, Lijie Suo, Liangchen Li, Fan Yang
Format: Article
Language: English
Published: MDPI AG 2025-06-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/15/12/6651
collection DOAJ
description The increasing complexity of software systems has heightened the need for efficient and accurate vulnerability detection. Large Language Models have emerged as promising tools in this domain; however, their reasoning capabilities and limitations remain insufficiently explored. This study presents a systematic evaluation of different Large Language Models with and without explicit reasoning mechanisms, including Claude-3.5-Haiku, GPT-4o-Mini, DeepSeek-V3, O3-Mini, and DeepSeek-R1. Experimental results demonstrate that reasoning-enabled models, particularly DeepSeek-R1, outperform their non-reasoning counterparts by leveraging structured step-by-step inference strategies and valuable reasoning traces. With proposed data processing and prompt design in the interaction, DeepSeek-R1 achieves an accuracy of 0.9507 and an F1-score of 0.9659 on the Software Assurance Reference Dataset. These findings highlight the potential of integrating reasoning-enabled Large Language Models into vulnerability detection frameworks to simultaneously improve detection performance and interpretability.
id doaj-art-86e8ab12cd824e828dc8fb5f9755caa7
institution Kabale University
issn 2076-3417
spelling doaj-art-86e8ab12cd824e828dc8fb5f9755caa7 2025-08-20T03:27:11Z
Applied Sciences, vol. 15, no. 12, article 6651 (2025-06-01)
DOI: 10.3390/app15126651
Author affiliations:
Wenting Qin: School of Physical and Mathematical Sciences, Nanjing Tech University, Nanjing 211816, China
Lijie Suo: School of Physical and Mathematical Sciences, Nanjing Tech University, Nanjing 211816, China
Liangchen Li: School of Mathematical Sciences, Luoyang Normal University, Luoyang 471934, China
Fan Yang: School of Physical and Mathematical Sciences, Nanjing Tech University, Nanjing 211816, China
topic DeepSeek-R1
Large Language Model
reasoning mechanisms
step-by-step inference
vulnerability detection