Advancing Software Vulnerability Detection with Reasoning LLMs: DeepSeek-R1's Performance and Insights


Bibliographic Details
Main Authors: Wenting Qin, Lijie Suo, Liangchen Li, Fan Yang
Format: Article
Language: English
Published: MDPI AG 2025-06-01
Series: Applied Sciences
Subjects:
Online Access: https://www.mdpi.com/2076-3417/15/12/6651
Description
Summary: The increasing complexity of software systems has heightened the need for efficient and accurate vulnerability detection. Large Language Models have emerged as promising tools in this domain; however, their reasoning capabilities and limitations remain insufficiently explored. This study presents a systematic evaluation of Large Language Models with and without explicit reasoning mechanisms, including Claude-3.5-Haiku, GPT-4o-Mini, DeepSeek-V3, O3-Mini, and DeepSeek-R1. Experimental results demonstrate that reasoning-enabled models, particularly DeepSeek-R1, outperform their non-reasoning counterparts by leveraging structured step-by-step inference strategies and informative reasoning traces. With the proposed data-processing and prompt-design strategies, DeepSeek-R1 achieves an accuracy of 0.9507 and an F1-score of 0.9659 on the Software Assurance Reference Dataset. These findings highlight the potential of integrating reasoning-enabled Large Language Models into vulnerability detection frameworks to improve both detection performance and interpretability.
ISSN: 2076-3417