Phish Fighter: Self Updating Machine Learning Shield Against Phishing Kits Based on HTML Code Analysis

Bibliographic Details
Main Authors: Gabriela Brezeanu, Alexandru Archip, Codrut-Georgian Artene
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access:https://ieeexplore.ieee.org/document/10824790/
Description
Summary: Phishing attacks are a growing threat that has evolved along with technological advancements. Existing detection methods struggle with constantly evolving tactics and “zero-day” attacks that exploit unknown vulnerabilities. This paper proposes a novel approach for identifying phishing web pages built on phishing kits and, consequently, detecting such attack attempts. Our method, Phish Fighter, analyzes the HTML code structure, focusing on recurring blocks across different phishing pages derived from the same source. These structural features are then fed to clustering and classification components that detect common structural patterns without relying on textual or visual content. This approach overcomes the limitations of existing solutions and is robust against attacks that target various brands. Furthermore, we implemented a dedicated module that continuously updates Phish Fighter's data. This module effectively recognizes even “zero-day” phishing attempts by analyzing only three pages associated with a new phishing kit. In addition, we successfully identified phishing pages created by cloning the original source code of legitimate entities, without requiring prior knowledge to distinguish such clones. The results support the efficiency and accuracy of this approach: weighted precision, recall and F1 score are all greater than 90%, and the respective micro-averaged metrics are above 95%.
ISSN:2169-3536
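The summary above does not state how recurring HTML blocks are extracted or compared, so the following is only an illustrative sketch under stated assumptions, not the Phish Fighter implementation. It assumes that a "structural block" can be approximated by a window of k consecutive start tags (tag shingles), that pages are compared with Jaccard similarity (a swapped-in technique), and that a new kit can be registered from three sample pages, echoing the update module described in the summary. Text and attributes are ignored, so pages that share a kit's markup but target different brands still match. The names `TagSequenceParser`, `tag_shingles`, `KitRegistry`, the window `k=3`, and the `0.6` threshold are all hypothetical. The paper's actual clustering and classification components are not reproduced here.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Set, Tuple


class TagSequenceParser(HTMLParser):
    """Records start tags in document order; text and attributes are ignored."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_shingles(html: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """Hypothetical 'structural block' extractor: sets of k consecutive start tags."""
    parser = TagSequenceParser()
    parser.feed(html)
    parser.close()
    tags = parser.tags
    if len(tags) < k:
        return {tuple(tags)} if tags else set()
    return {tuple(tags[i:i + k]) for i in range(len(tags) - k + 1)}


def jaccard(a: Set, b: Set) -> float:
    """Jaccard similarity; defined as 0.0 for two empty sets so empty pages never match."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class KitRegistry:
    """Hypothetical store of known kits, each described by a few reference pages."""

    def __init__(self, threshold: float = 0.6) -> None:
        self.threshold = threshold
        self.kits: Dict[str, List[Set[Tuple[str, ...]]]] = {}

    def add_kit(self, name: str, pages: List[str]) -> None:
        # Assumption mirroring the summary: three sample pages suffice for a new kit.
        if len(pages) < 3:
            raise ValueError("need at least three sample pages per kit")
        self.kits[name] = [tag_shingles(p) for p in pages]

    def match(self, html: str) -> Optional[str]:
        """Return the kit with the highest mean similarity, if it clears the threshold."""
        shingles = tag_shingles(html)
        best_name, best_score = None, 0.0
        for name, refs in self.kits.items():
            score = sum(jaccard(shingles, r) for r in refs) / len(refs)
            if score > best_score:
                best_name, best_score = name, score
        return best_name if best_score >= self.threshold else None
```

For example, three login pages produced from the same kit but branded differently would share identical tag shingles. A fourth page from that kit would then match the registered kit, while a structurally unrelated page would fall below the threshold. Averaging over all reference pages, rather than taking the single best match, is one simple way to reduce the effect of a single outlier sample. It is a design choice for this sketch only.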