Detecting unknown vulnerabilities in smart contracts using opcode sequences


Bibliographic Details
Main Authors: Peiqiang Li, Guojun Wang, Xiaofei Xing, Xiangbin Li, Jinyao Zhu
Format: Article
Language: English
Published: Taylor & Francis Group 2024-12-01
Series: Connection Science
Online Access: https://www.tandfonline.com/doi/10.1080/09540091.2024.2313853
collection DOAJ
description Unknown vulnerabilities, also known as zero-day vulnerabilities, are flaws in software, systems, or networks that have not yet been publicly disclosed or fixed. Once attackers discover them, whether intentionally or by accident, they pose a major threat to network security. The threat is particularly acute in the blockchain field: smart contracts hold large amounts of money, so an exploited vulnerability translates directly into financial losses for users. However, current research on smart contract vulnerabilities focuses mainly on known vulnerabilities, and research on unknown vulnerabilities remains limited. To address this gap, we introduce a machine learning-based method for detecting unknown vulnerabilities in smart contracts. First, the method obtains the opcode sequences that smart contract transactions execute in the EVM by instrumenting Geth and replaying Ethereum transactions. Next, it extracts features from the opcode sequences with an n-gram model and a vector weight penalty mechanism. Machine learning algorithms then detect unknown vulnerabilities based on the similarity principle. Finally, we test the effectiveness of the method with four machine learning models: K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Logistic Regression (LR), and Decision Tree (DT). The SVM model performs best at detecting unknown vulnerabilities, with an accuracy of 96%, a precision of 91%, a recall of 100%, and an F1-score of 95%. We also discuss the main benefit of the method: timely detection of attacks that exploit unknown vulnerabilities, which reduces user losses.
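The feature-extraction and similarity steps named in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the opcode traces are hypothetical, the function names are invented for illustration, plain bigram counts stand in for the paper's features, and the vector weight penalty mechanism is omitted because the abstract does not specify it. The last helper only checks that the reported F1-score is consistent with the reported precision and recall.

```python
from collections import Counter
from math import sqrt


def ngram_counts(opcodes, n=2):
    """Count the contiguous n-grams in one opcode sequence."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    # Counter returns 0 for n-grams that are missing from b.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical opcode traces (assumption: real traces would come from
# replaying transactions on an instrumented Geth node).
trace_a = ["PUSH1", "PUSH1", "MSTORE", "CALLVALUE", "DUP1", "ISZERO"]
trace_b = ["PUSH1", "PUSH1", "MSTORE", "CALLDATASIZE", "LT"]
trace_c = ["CALLER", "SLOAD", "CALL", "SSTORE"]

sim_ab = cosine_similarity(ngram_counts(trace_a), ngram_counts(trace_b))
sim_ac = cosine_similarity(ngram_counts(trace_a), ngram_counts(trace_c))
print(f"similarity(a, b) = {sim_ab:.3f}")
print(f"similarity(a, c) = {sim_ac:.3f}")

# Consistency check on the reported SVM metrics (precision 91%, recall 100%).
print(f"F1 from reported precision and recall = {f1_score(0.91, 1.0):.2f}")
```

In this sketch, traces that share bigrams score above zero and traces with no bigram in common score exactly zero. A classifier such as KNN or SVM would be trained on the count vectors themselves rather than on this pairwise score.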
id doaj-art-8cf87ea7ad0a419abc7d2fed98d43f83
institution OA Journals
issn 0954-0091
1360-0494
affiliation School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, People's Republic of China (all five authors)
volume 36
issue 1
topic Blockchain
smart contracts
unknown vulnerabilities
N-gram
opcode sequences