Fast reused code tracing method based on simhash and inverted index
| Main Authors: | Yan-chen QIAO, Xiao-chun YUN, Yu-peng TUO, Yong-zheng ZHANG |
|---|---|
| Format: | Article |
| Language: | zho |
| Published: | Editorial Department of Journal on Communications, 2016-11-01 |
| Series: | Tongxin xuebao |
| Subjects: | network security; reused code; retrieval method; homology identification; malware |
| Online Access: | http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2016225/ |
| Field | Value |
|---|---|
| author | Yan-chen QIAO, Xiao-chun YUN, Yu-peng TUO, Yong-zheng ZHANG |
| collection | DOAJ |
| description | A novel method for quickly and accurately tracing reused code was proposed. Based on simhash and an inverted index, the method can rapidly trace similar functions in large code collections. First, a code database with a three-level inverted index structure was constructed. For a function to be traced, similar code blocks could be found quickly from the simhash value of each code block in the function. The potentially similar functions could then be traced rapidly using the inverted index. Finally, truly similar functions could be identified by comparing the jump relationships among the similar code blocks, and malware samples containing those functions could in turn be traced. The experimental results show that, with high accuracy and recall, the method can quickly identify both compiler-inserted functions and reused functions based on the code database. |
| format | Article |
| id | doaj-art-78cefffa3c4c4b5f8fd98eef39add45d |
| institution | OA Journals |
| issn | 1000-436X |
| language | zho |
| publishDate | 2016-11-01 |
| publisher | Editorial Department of Journal on Communications |
| record_format | Article |
| series | Tongxin xuebao |
| title | Fast reused code tracing method based on simhash and inverted index |
| topic | network security; reused code; retrieval method; homology identification; malware |
| url | http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2016225/ |
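
The description gives only the outline of the pipeline: fingerprint each code block with simhash, look up similar blocks through an inverted index, collect candidate functions, confirm them by comparing jump relationships, and report the samples that contain them. The following is a minimal Python sketch of that flow under our own assumptions, not the authors' implementation. We assume blocks arrive as already-normalized instruction strings, use a 64-bit simhash built from MD5 token hashes, and find near-duplicate blocks with the common band-splitting trick (4 bands of 16 bits, Hamming distance at most 3). The paper's three-level index is approximated here as band key → block fingerprint → (function, block) postings. The names `CodeDatabase`, `trace`, `MAX_DIST` and `min_edge_ratio` are hypothetical.

```python
import hashlib
from collections import defaultdict

BITS = 64
BANDS = 4                      # 4 bands x 16 bits: distance <= 3 forces one band to match exactly
BAND_BITS = BITS // BANDS
MAX_DIST = 3                   # hypothetical block-similarity threshold (Hamming distance)


def simhash(tokens):
    """64-bit simhash of a code block, given its normalized instruction tokens."""
    weights = [0] * BITS
    for tok in tokens:
        h = int.from_bytes(hashlib.md5(tok.encode()).digest()[:8], "big")
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(BITS) if weights[i] > 0)


def hamming(a, b):
    return bin(a ^ b).count("1")


def bands(fp):
    """Split a fingerprint into (band number, band value) keys."""
    mask = (1 << BAND_BITS) - 1
    return [(i, (fp >> (i * BAND_BITS)) & mask) for i in range(BANDS)]


class CodeDatabase:
    """Hypothetical index: band key -> block fingerprint -> {(function, block)},
    plus function -> its jump edges and the samples that contain it."""

    def __init__(self):
        self.band_index = defaultdict(lambda: defaultdict(set))
        self.func_edges = {}
        self.func_samples = defaultdict(set)

    def add_function(self, func_id, sample_id, blocks, edges):
        for bid, toks in blocks.items():
            fp = simhash(toks)
            for key in bands(fp):
                self.band_index[key][fp].add((func_id, bid))
        self.func_edges[func_id] = set(edges)
        self.func_samples[func_id].add(sample_id)

    def similar_blocks(self, fp):
        """(function, block) pairs whose fingerprint is within MAX_DIST of fp."""
        hits = set()
        for key in bands(fp):
            for cand_fp, postings in self.band_index.get(key, {}).items():
                if hamming(fp, cand_fp) <= MAX_DIST:
                    hits |= postings
        return hits

    def trace(self, blocks, edges, min_edge_ratio=0.8):
        """Map each confirmed similar function to the samples containing it."""
        # Simhash lookup of every query block, grouped by candidate function.
        matches = defaultdict(lambda: defaultdict(set))
        for bid, toks in blocks.items():
            for func_id, cand_bid in self.similar_blocks(simhash(toks)):
                matches[func_id][bid].add(cand_bid)
        # Keep candidates whose jump edges agree with the query's jump edges.
        result = {}
        for func_id, pairs in matches.items():
            if edges:
                cand_edges = self.func_edges[func_id]
                kept = sum(
                    1 for a, b in edges
                    if any((x, y) in cand_edges
                           for x in pairs.get(a, ()) for y in pairs.get(b, ()))
                )
                confirmed = kept / len(edges) >= min_edge_ratio
            else:
                confirmed = len(pairs) == len(blocks)   # single-block function
            if confirmed:
                result[func_id] = set(self.func_samples[func_id])
        return result


if __name__ == "__main__":
    db = CodeDatabase()
    blocks = {
        "b0": ["push ebp", "mov ebp,esp", "sub esp,IMM"],
        "b1": ["cmp REG,IMM", "jz ADDR"],
        "b2": ["call ADDR", "leave", "ret"],
    }
    edges = [("b0", "b1"), ("b1", "b2")]
    db.add_function("func_a", "sample_A", blocks, edges)
    print(db.trace(blocks, edges))   # {'func_a': {'sample_A'}}
```

Splitting each fingerprint into bands means a query is compared only against stored fingerprints that share at least one band exactly, rather than against every block in the database. The jump-edge check then discards candidates whose blocks look alike but are wired together differently, which is the role the description assigns to comparing jump relationships.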