An explainable feature selection framework for web phishing detection with machine learning

In the evolving landscape of cyber threats, phishing attacks pose significant challenges, particularly through deceptive webpages designed to extract sensitive information under the guise of legitimacy. Conventional and machine learning (ML)-based detection systems struggle to detect phishing websit...

Bibliographic Details
Main Author: Sakib Shahriar Shafin
Format: Article
Language: English
Published: KeAi Communications Co. Ltd. 2025-06-01
Series: Data Science and Management
Subjects: Webpage phishing; Explainable AI; Feature selection; Machine learning
Online Access: http://www.sciencedirect.com/science/article/pii/S2666764924000419
_version_ 1849416990869946368
author Sakib Shahriar Shafin
collection DOAJ
description In the evolving landscape of cyber threats, phishing attacks pose significant challenges, particularly through deceptive webpages designed to extract sensitive information under the guise of legitimacy. Conventional and machine learning (ML)-based detection systems struggle to detect phishing websites because attackers constantly change their tactics. Furthermore, newer phishing websites exhibit subtle and expertly concealed indicators that are not readily detectable. Hence, effective detection depends on identifying the most critical features. Traditional feature selection (FS) methods often fail to improve ML model performance and can instead degrade it. To combat these issues, we propose an innovative method that uses explainable AI (XAI) to enhance FS in ML models and improve the identification of phishing websites. Specifically, we employ SHapley Additive exPlanations (SHAP) for a global perspective and aggregated local interpretable model-agnostic explanations (LIME) to determine specific localized patterns. The proposed SHAP and LIME-aggregated FS (SLA-FS) framework pinpoints the most informative features, enabling more precise, swift, and adaptable phishing detection. Applying this approach to an up-to-date web phishing dataset, we evaluate the performance of three ML models before and after FS to assess their effectiveness. Our findings reveal that random forest (RF), with an accuracy of 97.41%, and XGBoost (XGB), at 97.21%, benefit significantly from the SLA-FS framework, while k-nearest neighbors lags behind. Our framework increases the accuracy of RF and XGB by 0.65% and 0.41%, respectively, outperforming traditional filter or wrapper methods and any prior methods evaluated on this dataset, showcasing its potential.
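The description above names a global importance source (SHAP) and an aggregated local one (LIME) but does not state how the two are combined. The following is only a minimal sketch of one plausible aggregation, under stated assumptions: the per-feature scores are taken as already computed, the equal weighting `alpha=0.5`, the helper names, and the feature names are all hypothetical, and none of this is the article's actual procedure.

```python
from statistics import mean


def normalize(scores):
    """Scale non-negative scores so they sum to 1 (all-zero input stays zero)."""
    total = sum(scores.values())
    return {f: (v / total if total else 0.0) for f, v in scores.items()}


def aggregate_local(local_explanations):
    """Average the absolute per-instance weights into one score per feature.

    Each explanation is a dict of feature -> signed weight, as a local
    explainer such as LIME might report; missing features count as 0.
    """
    features = {f for exp in local_explanations for f in exp}
    return {
        f: mean(abs(exp.get(f, 0.0)) for exp in local_explanations)
        for f in features
    }


def select_features(global_scores, local_explanations, k, alpha=0.5):
    """Rank features by a weighted mix of global and aggregated local scores.

    alpha is an assumed mixing weight, not a value taken from the article.
    """
    g = normalize({f: abs(v) for f, v in global_scores.items()})
    loc = normalize(aggregate_local(local_explanations))
    combined = {
        f: alpha * g.get(f, 0.0) + (1 - alpha) * loc.get(f, 0.0)
        for f in set(g) | set(loc)
    }
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(combined, key=lambda f: (-combined[f], f))
    return ranked[:k]


# Hypothetical inputs: mean |SHAP| per feature and two LIME-style explanations.
global_scores = {"url_length": 0.40, "nb_dots": 0.10,
                 "page_rank": 0.30, "nb_hyperlinks": 0.20}
local_explanations = [
    {"url_length": 0.5, "page_rank": -0.3, "nb_dots": 0.0},
    {"url_length": 0.1, "page_rank": -0.5, "nb_hyperlinks": 0.2},
]

selected = select_features(global_scores, local_explanations, k=2)
print(selected)
```

In a real pipeline the two inputs would come from an explainer run on a trained model, and the selected subset would then be used to retrain and re-evaluate each classifier, which is the before-and-after comparison the description reports.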
format Article
id doaj-art-b5690efb9b164df99e4327e5b37c34fb
institution Kabale University
issn 2666-7649
language English
publishDate 2025-06-01
publisher KeAi Communications Co. Ltd.
record_format Article
series Data Science and Management
spelling doaj-art-b5690efb9b164df99e4327e5b37c34fb
2025-08-20T03:32:58Z
eng
KeAi Communications Co. Ltd.
Data Science and Management
2666-7649
2025-06-01
vol. 8, no. 2, pp. 127-136
10.1016/j.dsm.2024.08.004
An explainable feature selection framework for web phishing detection with machine learning
Sakib Shahriar Shafin
Centre for Smart Analytics, Federation University Australia, Ballarat, VIC 3356, Australia
http://www.sciencedirect.com/science/article/pii/S2666764924000419
Webpage phishing
Explainable AI
Feature selection
Machine learning
title An explainable feature selection framework for web phishing detection with machine learning
topic Webpage phishing
Explainable AI
Feature selection
Machine learning
url http://www.sciencedirect.com/science/article/pii/S2666764924000419