Effective Detection of Malicious Uniform Resource Locator (URLs) Using Deep-Learning Techniques

The rapid growth of internet usage in daily life has led to a significant increase in cyber threats, with malicious URLs serving as a common cybercrime. Traditional detection methods often suffer from high false alarm rates and struggle to keep pace with evolving threats due to outdated feature extr...

Full description

Saved in:
Bibliographic Details
Main Authors: Yirga Yayeh Munaye, Aneas Bekele Workneh, Yenework Belayneh Chekol, Atinkut Molla Mekonen
Format: Article
Language:English
Published: MDPI AG 2025-06-01
Series:Algorithms
Subjects:
Online Access:https://www.mdpi.com/1999-4893/18/6/355
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:The rapid growth of internet usage in daily life has led to a significant increase in cyber threats, with malicious URLs serving as a common cybercrime. Traditional detection methods often suffer from high false alarm rates and struggle to keep pace with evolving threats due to outdated feature extraction techniques and datasets. To address these limitations, we propose a deep learning-based approach aimed at developing an effective model for detecting malicious URLs. Our proposed method, the Char2B model, leverages a fusion of BERT and CharBiGRU embedding, further enhanced by a Conv1D layer with a kernel size of three and unit-sized stride and padding. After combining the embedding, we used the BERT model as a baseline for comparison. The study involved collecting a dataset of 87,216 URLs, comprising both benign and malicious samples sourced from the open project directory (DMOZ), PhishTank, and Any.Run. Models were trained using the training set and evaluated on the test set using standard metrics, including accuracy, precision, recall, and F1-score. Through iterative refinement, we optimized the model’s performance to maximize its effectiveness. As a result, our proposed model achieved 98.50% accuracy, 98.27% precision, 98.69% recall, and a 98.48% F1-score, outperforming the baseline BERT model. Additionally, the false positive rate of our model was 0.017 better than the baseline model’s 0.018. By effectively extracting and utilizing informative features, the model accurately classified URLs into benign and malicious categories, thereby improving detection capabilities. This study highlights the significance of our deep learning approach in strengthening cybersecurity by integrating advanced algorithms that enhance detection accuracy, bolster defense mechanisms, and contribute to a safer digital environment.
ISSN:1999-4893