Advancing Software Vulnerability Detection with Reasoning LLMs: DeepSeek-R1's Performance and Insights
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-06-01 |
| Series: | Applied Sciences |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2076-3417/15/12/6651 |
| Summary: | The increasing complexity of software systems has heightened the need for efficient and accurate vulnerability detection. Large Language Models have emerged as promising tools in this domain; however, their reasoning capabilities and limitations remain insufficiently explored. This study presents a systematic evaluation of Large Language Models with and without explicit reasoning mechanisms, including Claude-3.5-Haiku, GPT-4o-Mini, DeepSeek-V3, O3-Mini, and DeepSeek-R1. Experimental results demonstrate that reasoning-enabled models, particularly DeepSeek-R1, outperform their non-reasoning counterparts by leveraging structured step-by-step inference strategies and valuable reasoning traces. With the proposed data processing and prompt design, DeepSeek-R1 achieves an accuracy of 0.9507 and an F1-score of 0.9659 on the Software Assurance Reference Dataset. These findings highlight the potential of integrating reasoning-enabled Large Language Models into vulnerability detection frameworks to simultaneously improve detection performance and interpretability. |
| ISSN: | 2076-3417 |