Fast reused code tracing method based on simhash and inverted index

Bibliographic Details
Main Authors: Yan-chen QIAO, Xiao-chun YUN, Yu-peng TUO, Yong-zheng ZHANG
Format: Article
Language: Chinese (zho)
Published: Editorial Department of Journal on Communications, 2016-11-01
Series: Tongxin xuebao
Subjects: network security; reused code; retrieval method; homology identification; malware
Online Access: http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2016225/
author Yan-chen QIAO
Xiao-chun YUN
Yu-peng TUO
Yong-zheng ZHANG
collection DOAJ
description A novel method for fast and accurate tracing of reused code was proposed. Based on simhash and an inverted index, the method can quickly trace similar functions in massive code collections. First, a code database with a three-level inverted index structure was constructed. For a function to be traced, similar code blocks could be found quickly from the simhash values of the code blocks in that function. Then potentially similar functions could be traced quickly using the inverted index. Finally, truly similar functions could be identified by comparing the jump relationships between the similar code blocks. Further, malware samples containing similar functions could be traced. The experimental results show that, using the code database, the method can quickly identify both functions inserted by compilers and reused functions while maintaining high accuracy and recall.
format Article
id doaj-art-78cefffa3c4c4b5f8fd98eef39add45d
institution OA Journals
issn 1000-436X
language zho
publishDate 2016-11-01
publisher Editorial Department of Journal on Communications
record_format Article
series Tongxin xuebao
title Fast reused code tracing method based on simhash and inverted index
topic network security
reused code
retrieval method
homology identification
malware
url http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2016225/
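The retrieval pipeline summarized in the description field (simhash lookup of similar code blocks, inverted-index lookup of candidate functions, then verification by jump relationships) can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the record does not say which features are hashed, how the three index levels are laid out, or how jump relationships are scored. Here the block features are assumed to be normalized instruction mnemonics; the three levels are assumed to be simhash band → blocks, block → function, and function → malware samples; and candidate blocks are found with a banded exact lookup (a swapped-in technique from near-duplicate detection) followed by a hypothetical Hamming threshold `MAX_HAMMING = 3`. The names `CodeDatabase`, `trace`, and `min_score` are hypothetical.

```python
import hashlib
from collections import defaultdict

# Assumed parameters (hypothetical, not taken from the paper).
BITS = 64
BANDS = 4                  # 4 bands of 16 bits each
BAND_BITS = BITS // BANDS
MAX_HAMMING = 3            # BANDS > MAX_HAMMING, so a band lookup cannot miss a match


def simhash(tokens):
    """Charikar-style 64-bit simhash of a code block's tokens (assumed: normalized mnemonics)."""
    weights = [0] * BITS
    for tok in tokens:
        h = int.from_bytes(hashlib.md5(tok.encode()).digest()[:8], "big")
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(BITS) if weights[i] > 0)


def hamming(a, b):
    return bin(a ^ b).count("1")


def bands(h):
    """Split a fingerprint into (band number, band value) keys for exact lookup."""
    mask = (1 << BAND_BITS) - 1
    return [(i, (h >> (i * BAND_BITS)) & mask) for i in range(BANDS)]


class CodeDatabase:
    """Hypothetical three-level index: band -> blocks, block -> function, function -> samples."""

    def __init__(self):
        self.band_index = defaultdict(set)    # level 1: band key -> block ids
        self.block_hash = {}                  # block id -> simhash
        self.block_func = {}                  # level 2: block id -> function id
        self.func_samples = defaultdict(set)  # level 3: function id -> sample ids
        self.func_edges = {}                  # function id -> {(src block, dst block)} jumps

    def add_function(self, func_id, sample_id, blocks, edges):
        """blocks: {globally unique block id: tokens}; edges: [(src id, dst id)]."""
        for bid, tokens in blocks.items():
            h = simhash(tokens)
            self.block_hash[bid] = h
            self.block_func[bid] = func_id
            for key in bands(h):
                self.band_index[key].add(bid)
        self.func_edges[func_id] = set(edges)
        self.func_samples[func_id].add(sample_id)

    def similar_blocks(self, tokens):
        """Step 1: blocks whose simhash is within MAX_HAMMING bits of the query block."""
        h = simhash(tokens)
        candidates = set()
        for key in bands(h):
            candidates |= self.band_index.get(key, set())
        return {b for b in candidates if hamming(h, self.block_hash[b]) <= MAX_HAMMING}

    def trace(self, blocks, edges, min_score=0.5):
        """Return [(function id, score, samples)] for functions passing the jump check."""
        # Step 2: group matched blocks by the function that contains them.
        matches = defaultdict(dict)  # function id -> {query block: matched db block}
        for qbid, tokens in blocks.items():
            for dbid in self.similar_blocks(tokens):
                matches[self.block_func[dbid]].setdefault(qbid, dbid)
        # Step 3: score the fraction of query jumps preserved between matched blocks.
        results = []
        for func_id, mapping in matches.items():
            kept = sum(
                1 for s, d in edges
                if s in mapping and d in mapping
                and (mapping[s], mapping[d]) in self.func_edges[func_id]
            )
            score = kept / len(edges) if edges else 0.0
            if score >= min_score:
                results.append((func_id, score, sorted(self.func_samples[func_id])))
        return sorted(results, key=lambda r: -r[1])
```

Splitting the 64-bit fingerprint into `BANDS = 4` bands is the design choice that keeps step 1 fast: if two fingerprints differ in at most 3 bits, at least one 16-bit band must be identical (pigeonhole principle), so the exact-match band lookup returns every block within the threshold, and the full Hamming check then discards the false candidates. The jump check in step 3 is what rejects a function whose blocks merely look alike but are wired together differently.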