Effective Detection of Malicious Uniform Resource Locator (URLs) Using Deep-Learning Techniques
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-06-01 |
| Series: | Algorithms |
| Subjects: | |
| Online Access: | https://www.mdpi.com/1999-4893/18/6/355 |
| Summary: | The rapid growth of internet usage in daily life has led to a significant increase in cyber threats, with malicious URLs serving as a common vector for cybercrime. Traditional detection methods often suffer from high false alarm rates and struggle to keep pace with evolving threats due to outdated feature extraction techniques and datasets. To address these limitations, we propose a deep learning-based approach aimed at developing an effective model for detecting malicious URLs. Our proposed method, the Char2B model, leverages a fusion of BERT and CharBiGRU embeddings, further enhanced by a Conv1D layer with a kernel size of three and a stride and padding of one. After combining the embeddings, we used the BERT model as a baseline for comparison. The study involved collecting a dataset of 87,216 URLs, comprising both benign and malicious samples sourced from the Open Directory Project (DMOZ), PhishTank, and Any.Run. Models were trained on the training set and evaluated on the test set using standard metrics, including accuracy, precision, recall, and F1-score. Through iterative refinement, we optimized the model's performance. As a result, our proposed model achieved 98.50% accuracy, 98.27% precision, 98.69% recall, and a 98.48% F1-score, outperforming the baseline BERT model. Additionally, the false positive rate of our model was 0.017, lower than the baseline model's 0.018. By effectively extracting and utilizing informative features, the model accurately classified URLs into benign and malicious categories, thereby improving detection capabilities. This study highlights the value of our deep learning approach in strengthening cybersecurity by improving detection accuracy and bolstering defense mechanisms. |
| ISSN: | 1999-4893 |
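
Two figures in the summary can be checked with simple arithmetic. The sketch below is illustrative only and is not taken from the article: the function names and the sequence length of 128 are assumptions, and the convolution formula is the standard one for a 1-D convolution without dilation. It shows that a kernel size of three with a stride and padding of one leaves the sequence length unchanged, and that the reported F1-score matches the harmonic mean of the reported precision and recall.

```python
def conv1d_output_length(length: int, kernel_size: int = 3,
                         stride: int = 1, padding: int = 1) -> int:
    """Output length of a 1-D convolution with no dilation (standard formula)."""
    return (length + 2 * padding - kernel_size) // stride + 1


def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical sequence length; kernel 3, stride 1, padding 1 preserves it.
    print(conv1d_output_length(128))  # 128

    # Precision and recall as reported in the summary, written as fractions.
    print(round(100 * f1_score(0.9827, 0.9869), 2))  # 98.48
```

The second result agrees with the 98.48% F1-score stated in the summary, so the three reported figures are mutually consistent.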