PhishingGNN: Phishing Email Detection Using Graph Attention Networks and Transformer-Based Feature Extraction

Bibliographic Details
Main Authors: Mejdl Safran, Abdulbaset Musleh
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/11091285/
Description
Summary: Phishing emails remain a critical cybersecurity challenge, demanding detection frameworks that capture both textual semantics and structural relationships in email data. This study introduces PhishingGNN, a hybrid model that integrates DistilBERT for context-aware text analysis with Graph Attention Networks (GAT) to model email metadata and content as graph structures, detecting subtle phishing patterns overlooked by traditional methods. By transforming email bodies into relational graphs, PhishingGNN leverages Graph Neural Networks (GNNs) to analyze textual interactions while retaining computational efficiency. Evaluated on an expanded CEAS_08 dataset (39,154 samples: 17,312 non-phishing and 21,842 phishing emails), PhishingGNN achieves state-of-the-art performance: 0.9939 accuracy, balanced precision, recall, and F1-scores of 0.99, and an AUC of 1.00. Cross-dataset validation on the Nazario Corpus confirms robustness (0.9910 accuracy), outperforming contemporary few-shot learning approaches. PhishingGNN’s key innovations include a transformer-GNN architecture unifying semantic and structural reasoning, a novel graph-based email representation methodology, and comprehensive validation confirming real-world scalability. PhishingGNN advances graph-based deep learning in cybersecurity, offering a modular benchmark solution with demonstrated cross-dataset efficacy.
ISSN: 2169-3536
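
Illustrative note: the summary describes passing text features through a Graph Attention Network. The record does not give the authors' graph construction or layer configuration, so the following is only a minimal sketch of a generic single-head graph attention step in plain Python. The function name `gat_layer`, the toy 2-dimensional node features (standing in for DistilBERT embeddings), the neighbor lists, and the weights `W` and `a` are all hypothetical and chosen for illustration.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU, commonly used to score edges in graph attention."""
    return x if x > 0 else slope * x


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_layer(features, neighbors, W, a):
    """Generic single-head graph attention step (not the paper's exact layer).

    features:  list of node feature vectors (hypothetical stand-ins for
               transformer embeddings of email parts)
    neighbors: dict mapping node index -> list of neighbor indices,
               with the node itself included (self-loop)
    W:         shared linear projection, shape out_dim x in_dim
    a:         attention vector of length 2 * out_dim
    Returns (new_features, attention) where attention[i][j] is the
    normalized weight node i places on neighbor j.
    """
    z = [matvec(W, h) for h in features]  # project every node
    out, attention = [], {}
    for i in range(len(features)):
        nbrs = neighbors[i]
        # Score each edge from the concatenated projections [z_i || z_j].
        scores = [
            leaky_relu(sum(p * q for p, q in zip(a, z[i] + z[j])))
            for j in nbrs
        ]
        # Numerically stable softmax over the neighborhood of node i.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        attention[i] = dict(zip(nbrs, alphas))
        # Attention-weighted sum of the projected neighbor features.
        out.append([
            sum(al * z[j][k] for al, j in zip(alphas, nbrs))
            for k in range(len(z[i]))
        ])
    return out, attention


# Toy graph: three nodes with assumed 2-d features and assumed edges.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [0, 1], 1: [1, 0, 2], 2: [2, 1]}
W = [[1.0, 0.0], [0.0, 1.0]]   # identity projection, for readability
a = [0.5, -0.5, 0.25, 0.25]    # arbitrary attention parameters

new_features, attention = gat_layer(features, neighbors, W, a)
for node, weights in attention.items():
    print(node, {j: round(w, 3) for j, w in weights.items()})
```

In a trained model `W` and `a` would be learned, and libraries such as PyTorch Geometric provide batched, multi-head versions of this operation; the loop above is only meant to make the neighborhood softmax and weighted aggregation concrete.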