A Deep Learning Approach for a Source Code Detection Model Using Self-Attention

Bibliographic Details
Main Authors: Yao Meng, Long Liu
Format: Article
Language:English
Published: Wiley 2020-01-01
Series:Complexity
Online Access:http://dx.doi.org/10.1155/2020/5027198
collection DOAJ
description With the development of deep learning, many neural-network-based approaches have been proposed for code clone detection. In this paper, we propose a novel source code clone detection model, At-biLSTM, based on a bidirectional LSTM network with a self-attention layer. At-biLSTM is composed of a representation model and a discriminative model. The representation model first transforms the source code into an abstract syntax tree and splits it into a sequence of statement trees; it then encodes each statement tree with a depth-first traversal algorithm. Finally, the representation model encodes the sequence of statement vectors via a bidirectional LSTM network, a classical deep learning framework, with a self-attention layer, and outputs a vector representing the given source code. The discriminative model identifies code clones based on the vectors generated by the representation model. The proposed model retains both the syntax and the semantics of the source code during encoding, and the self-attention mechanism makes the classifier concentrate on the key statements, improving classification performance. Comparative experiments on the OJClone and BigCloneBench benchmarks indicate that At-biLSTM is effective and outperforms state-of-the-art approaches in source code clone detection.
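The representation pipeline described above can be sketched in a few lines. This is an illustrative stand-in, not the authors' implementation: Python's built-in `ast` module plays the role of a general parser, a bag of node types replaces learned embeddings, and the trained biLSTM encoder is omitted; only the statement-tree splitting, the depth-first encoding, and a toy attention weighting over statement vectors are shown.

```python
# Minimal sketch (not the authors' code) of the representation model's steps:
# parse source into a syntax tree, split into statement trees, encode each by
# depth-first traversal, then weight statements with a toy attention score.
import ast
import math
from collections import Counter

def statement_trees(source: str) -> list:
    """Split a program's AST into its sequence of top-level statement trees."""
    return list(ast.parse(source).body)

def depth_first_encode(tree: ast.AST) -> Counter:
    """Encode one statement tree as node-type counts, visited in DFS order."""
    counts: Counter = Counter()
    stack = [tree]
    while stack:
        node = stack.pop()
        counts[type(node).__name__] += 1
        # Push children in reverse so they are popped in source order.
        stack.extend(reversed(list(ast.iter_child_nodes(node))))
    return counts

def attention_weights(vectors: list, query: Counter) -> list:
    """Toy attention scoring: softmax over dot products with a query vector,
    so 'key' statements receive larger weights in the pooled code vector."""
    scores = [sum(v[k] * query[k] for k in v) for v in vectors]
    exps = [math.exp(s - max(scores)) for s in scores]  # stable softmax
    total = sum(exps)
    return [e / total for e in exps]

source = "x = 1\nfor i in range(3):\n    x += i\nprint(x)"
vectors = [depth_first_encode(t) for t in statement_trees(source)]
weights = attention_weights(vectors, Counter({"Name": 1.0}))
```

In the paper's model the per-statement vectors would come from learned embeddings fed through the biLSTM, and the attention query would itself be learned; the sketch only shows why identifier-heavy "key" statements can end up dominating the pooled representation.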
id doaj-art-a8882994dfc040c8be3d2f74a2cc9ca6
institution Kabale University
issn 1076-2787
1099-0526