Combining Long-Term Recurrent Convolutional and Graph Convolutional Networks to Detect Phishing Sites Using URL and HTML

Phishing, a well-known cyber-attack practice, has gained significant research attention in the cyber-security domain over the last two decades due to its dynamic attack strategies. Although various solutions have been deployed against phishing, phishing attacks have increased dramatically in the...

Bibliographic Details
Main Authors: Subhash Ariyadasa, Shantha Fernando, Subha Fernando
Format: Article
Language: English
Published: IEEE 2022-01-01
Series: IEEE Access
Subjects: Cyberattack; deep learning; graph neural networks; internet security
Online Access: https://ieeexplore.ieee.org/document/9848472/
author Subhash Ariyadasa
Shantha Fernando
Subha Fernando
collection DOAJ
description Phishing, a well-known cyber-attack practice, has gained significant research attention in the cyber-security domain over the last two decades due to its dynamic attack strategies. Although various solutions have been deployed against phishing, phishing attacks have increased dramatically in the past few years. Recent studies have shown that machine learning has become prominent in the anti-phishing context, and techniques such as deep learning have substantially improved the detection ability of anti-phishing tools. This paper proposes PhishDet, a new way of detecting phishing websites through a Long-term Recurrent Convolutional Network and a Graph Convolutional Network using URL and HTML features. PhishDet is the first of its kind to use the analysis and processing capabilities of Graph Neural Networks in the anti-phishing domain, and it recorded 96.42% detection accuracy with a 0.036 false-negative rate. It is effective against zero-day attacks, and its average detection time of 1.8 seconds could also be considered realistic. Feature selection in PhishDet is automatic and occurs inside the system, as PhishDet gradually learns URL and HTML content features to handle constantly changing phishing attacks. It outperformed similar solutions by achieving a 99.53% F1-score on a public benchmark dataset. However, PhishDet requires periodic retraining to maintain its performance over time. If such retraining could be facilitated, PhishDet could fight against phishers for a more extended period to safeguard Internet users from this threat.
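The accuracy, false-negative rate, and F1-score quoted in the description are standard confusion-matrix metrics. A minimal Python sketch of how such figures are typically computed is given here. The function name `detection_metrics` and the counts are hypothetical illustrations, not code or values from the paper, and treating phishing as the positive class is an assumption.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute common binary-classification metrics from confusion-matrix counts.

    Assumption: "positive" means a phishing page, so a false negative is a
    phishing page that was classified as legitimate.
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "false_negative_rate": fn / (fn + tp),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts, for illustration only (not taken from the paper).
metrics = detection_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these definitions, a false-negative rate of 0.036 would correspond to roughly 3.6% of phishing pages being classified as legitimate.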
format Article
id doaj-art-5cf99e6ae9994cfcbea19a402c178c97
institution DOAJ
issn 2169-3536
language English
publishDate 2022-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-5cf99e6ae9994cfcbea19a402c178c97
2025-08-20T02:48:46Z
eng
IEEE
IEEE Access
ISSN 2169-3536
2022-01-01
Volume 10, pages 82355-82375
DOI 10.1109/ACCESS.2022.3196018
IEEE document number 9848472
Combining Long-Term Recurrent Convolutional and Graph Convolutional Networks to Detect Phishing Sites Using URL and HTML
Subhash Ariyadasa (https://orcid.org/0000-0002-7937-128X), Department of Computational Mathematics, University of Moratuwa, Moratuwa, Sri Lanka
Shantha Fernando (https://orcid.org/0000-0002-4538-4883), Department of Computer Science and Engineering, University of Moratuwa, Moratuwa, Sri Lanka
Subha Fernando (https://orcid.org/0000-0002-2621-5291), Department of Computational Mathematics, University of Moratuwa, Moratuwa, Sri Lanka
https://ieeexplore.ieee.org/document/9848472/
Cyberattack
deep learning
graph neural networks
internet security
title Combining Long-Term Recurrent Convolutional and Graph Convolutional Networks to Detect Phishing Sites Using URL and HTML
topic Cyberattack
deep learning
graph neural networks
internet security
url https://ieeexplore.ieee.org/document/9848472/