Multimodal and Temporal Graph Fusion Framework for Advanced Phishing Website Detection
| Main Authors: | |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/10976643/ |
| Summary: | Phishing attacks are a persistent, continually evolving threat, and countering increasingly sophisticated techniques demands advanced detection mechanisms. Traditional detection approaches usually rely on single-modal features or static analysis, and so fail to capture the complex, multi-faceted nature of phishing websites and their dynamic behaviors. We therefore present a robust Multi-Modal and Temporal Graph Fusion Framework that integrates advanced learning paradigms to improve accuracy and adaptability in phishing detection. The work proposes four new methods: Multi-Modal Hypergraph Fusion Network (MM-HFN), Temporal Graph Neural Network with Attention (TGNN-Att), Federated Graph Contrastive Learning Network (FGCL-Net), and Multi-Modal Temporal Hypergraph Fusion Network (MMTHF-Net). MM-HFN uses hypergraphs to capture complex, high-order relationships among textual (BERT), visual (CNN), and graph-based features, reaching accuracy in the 95-97% range. TGNN-Att addresses temporal variation in phishing behavior with attention-enhanced temporal graph networks and LSTMs, providing dynamic detection with 94-96% accuracy. FGCL-Net enables privacy-preserving learning across decentralized datasets through federated contrastive learning, achieving 93-95% accuracy while safeguarding data privacy. Finally, MMTHF-Net fuses multi-modal and temporal features in a dynamic hypergraph framework, achieving state-of-the-art accuracy of 96-98% with an F1-score of 0.97. Together, these approaches enable accurate, real-time phishing detection by capturing static and temporal behaviors, high-order relationships, and cross-modal features. The proposed framework demonstrates significant improvements over the state of the art, addressing the shortcomings of single-modality and static analysis while offering scalability, privacy, and adaptability. |
|---|---|
| ISSN: | 2169-3536 |
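
The summary describes MM-HFN as using hypergraphs to capture high-order relationships across modalities. As a rough illustration of that general idea only, and not the authors' method, the Python sketch below groups websites into hyperedges whenever they share a value in some modality, then runs one round of mean aggregation from nodes to hyperedges and back. The site records, the attribute names (`text_topic`, `visual_cluster`, `host_ip`), and the helper functions `build_hyperedges` and `propagate` are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


def build_hyperedges(sites):
    """Group site ids into hyperedges, one per shared (modality, value) pair."""
    groups = defaultdict(set)
    for site_id, attrs in sites.items():
        for modality, value in attrs.items():
            groups[(modality, value)].add(site_id)
    # Keep only hyperedges that link at least two sites.
    return {key: members for key, members in groups.items() if len(members) >= 2}


def propagate(scores, hyperedges):
    """One round of node -> hyperedge -> node mean aggregation."""
    # Each hyperedge takes the mean score of its member sites.
    edge_mean = {
        key: sum(scores[s] for s in members) / len(members)
        for key, members in hyperedges.items()
    }
    # Each site collects the means of the hyperedges it belongs to.
    incident = defaultdict(list)
    for key, members in hyperedges.items():
        for s in members:
            incident[s].append(edge_mean[key])
    # Sites that belong to no hyperedge keep their original score.
    return {
        s: (sum(incident[s]) / len(incident[s]) if incident[s] else scores[s])
        for s in scores
    }


# Hypothetical toy data: one discrete attribute per modality for each site.
sites = {
    "a": {"text_topic": "bank-login", "visual_cluster": "v1", "host_ip": "10.0.0.1"},
    "b": {"text_topic": "bank-login", "visual_cluster": "v1", "host_ip": "10.0.0.2"},
    "c": {"text_topic": "news", "visual_cluster": "v2", "host_ip": "10.0.0.2"},
    "d": {"text_topic": "blog", "visual_cluster": "v3", "host_ip": "10.0.0.9"},
}
# Assumed prior suspicion scores: only site "a" is a known phishing page.
scores = {"a": 1.0, "b": 0.0, "c": 0.0, "d": 0.0}

hyperedges = build_hyperedges(sites)
smoothed = propagate(scores, hyperedges)
print(smoothed)
```

In this toy setup, site "b" shares a text topic and a visual cluster with the known phishing site "a", so part of that suspicion spreads to it, while the unconnected site "d" is unchanged. A learned model would replace the fixed means with trainable weights and use continuous embeddings rather than discrete attribute values.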