LLM-Based Cyberattack Detection Using Network Flow Statistics
Cybersecurity is a growing area of research due to the constantly emerging new types of cyberthreats. Tools and techniques exist to keep systems secure against certain known types of cyberattacks, but are insufficient for others that have recently appeared. Therefore, research is needed to design ne...
| Main Authors: | Leopoldo Gutiérrez-Galeano, Juan-José Domínguez-Jiménez, Jörg Schäfer, Inmaculada Medina-Bulo |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-06-01 |
| Series: | Applied Sciences |
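| Subjects: | large language model; machine learning; fine-tuning; deep learning; cybersecurity; network security |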
| Online Access: | https://www.mdpi.com/2076-3417/15/12/6529 |
| _version_ | 1850156297573367808 |
|---|---|
| author | Leopoldo Gutiérrez-Galeano; Juan-José Domínguez-Jiménez; Jörg Schäfer; Inmaculada Medina-Bulo |
| author_facet | Leopoldo Gutiérrez-Galeano; Juan-José Domínguez-Jiménez; Jörg Schäfer; Inmaculada Medina-Bulo |
| author_sort | Leopoldo Gutiérrez-Galeano |
| collection | DOAJ |
| description | Cybersecurity is a growing area of research due to the constantly emerging new types of cyberthreats. Tools and techniques exist to keep systems secure against certain known types of cyberattacks, but are insufficient for others that have recently appeared. Therefore, research is needed to design new strategies to deal with new types of cyberattacks as they arise. Existing tools that harness artificial intelligence techniques mainly use artificial neural networks designed from scratch. In this paper, we present a novel approach for cyberattack detection using an encoder–decoder pre-trained Large Language Model (T5), fine-tuned to adapt its classification scheme for the detection of cyberattacks. Our system is anomaly-based and takes statistics of already finished network flows as input. This work makes significant contributions by introducing a novel methodology for adapting its original task from natural language processing to cybersecurity, achieved by transforming numerical network flow features into a unique abstract artificial language for the model input. We validated the robustness of our detection system across three datasets using undersampling. Our model achieved consistently high performance across all evaluated datasets. Specifically, for the CIC-IDS-2017 dataset, we obtained an accuracy, precision, recall, and F-score of more than 99.94%. For CSE-CIC-IDS-2018, these metrics exceeded 99.84%, and for BCCC-CIC-IDS-2017, they were all above 99.90%. These results collectively demonstrate superior performance for cyberattack detection, while maintaining highly competitive false-positive rates and false-negative rates. This efficacy is achieved by relying exclusively on real-world network flow statistics, without the need for synthetic data generation. |
| format | Article |
| id | doaj-art-c1befc085d9c4300b2d74e02e3408576 |
| institution | OA Journals |
| issn | 2076-3417 |
| language | English |
| publishDate | 2025-06-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Applied Sciences |
| spelling | doaj-art-c1befc085d9c4300b2d74e02e3408576; 2025-08-20T02:24:35Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2025-06-01; 15; 12; 6529; 10.3390/app15126529; LLM-Based Cyberattack Detection Using Network Flow Statistics; Leopoldo Gutiérrez-Galeano (0); Juan-José Domínguez-Jiménez (1); Jörg Schäfer (2); Inmaculada Medina-Bulo (3); (0) Escuela Superior de Ingeniería, Universidad de Cádiz, Avda. Universidad de Cádiz, 10, 11519 Puerto Real, Spain; (1) Escuela Superior de Ingeniería, Universidad de Cádiz, Avda. Universidad de Cádiz, 10, 11519 Puerto Real, Spain; (2) Faculty of Computer Science and Engineering, Frankfurt University of Applied Sciences, Nibelungenplatz 1, 60318 Frankfurt am Main, Germany; (3) Escuela Superior de Ingeniería, Universidad de Cádiz, Avda. Universidad de Cádiz, 10, 11519 Puerto Real, Spain; https://www.mdpi.com/2076-3417/15/12/6529; large language model; machine learning; fine-tuning; deep learning; cybersecurity; network security |
| spellingShingle | Leopoldo Gutiérrez-Galeano; Juan-José Domínguez-Jiménez; Jörg Schäfer; Inmaculada Medina-Bulo; LLM-Based Cyberattack Detection Using Network Flow Statistics; Applied Sciences; large language model; machine learning; fine-tuning; deep learning; cybersecurity; network security |
| title | LLM-Based Cyberattack Detection Using Network Flow Statistics |
| title_full | LLM-Based Cyberattack Detection Using Network Flow Statistics |
| title_fullStr | LLM-Based Cyberattack Detection Using Network Flow Statistics |
| title_full_unstemmed | LLM-Based Cyberattack Detection Using Network Flow Statistics |
| title_short | LLM-Based Cyberattack Detection Using Network Flow Statistics |
| title_sort | llm based cyberattack detection using network flow statistics |
| topic | large language model; machine learning; fine-tuning; deep learning; cybersecurity; network security |
| url | https://www.mdpi.com/2076-3417/15/12/6529 |
| work_keys_str_mv | AT leopoldogutierrezgaleano llmbasedcyberattackdetectionusingnetworkflowstatistics AT juanjosedominguezjimenez llmbasedcyberattackdetectionusingnetworkflowstatistics AT jorgschafer llmbasedcyberattackdetectionusingnetworkflowstatistics AT inmaculadamedinabulo llmbasedcyberattackdetectionusingnetworkflowstatistics |
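
The description above mentions transforming numerical network flow features into an abstract artificial language for the model input, but this record does not give the encoding itself. The following is only a minimal sketch of one way such a transformation could look, not the authors' method: the feature names, the bin edges, and the `f<i>b<j>` token format are all hypothetical and chosen purely for illustration.

```python
from bisect import bisect_right
from typing import Mapping, Sequence

# Hypothetical flow features and bin edges (assumptions, not from the paper).
BIN_EDGES: Mapping[str, Sequence[float]] = {
    "duration_s": [0.1, 1.0, 10.0],
    "fwd_packets": [2, 10, 100],
    "bytes_per_s": [1e3, 1e5, 1e7],
}


def bin_index(value: float, edges: Sequence[float]) -> int:
    """Return how many bin edges are less than or equal to `value`."""
    return bisect_right(edges, value)


def flow_to_sentence(flow: Mapping[str, float]) -> str:
    """Encode one finished flow as a space-separated string of pseudo-words.

    Each feature i that falls into bin j becomes the token "f<i>b<j>",
    so every (feature, bin) pair maps to a distinct word.
    """
    tokens = []
    for i, (name, edges) in enumerate(BIN_EDGES.items()):
        tokens.append(f"f{i}b{bin_index(flow[name], edges)}")
    return " ".join(tokens)


if __name__ == "__main__":
    example_flow = {"duration_s": 0.5, "fwd_packets": 150, "bytes_per_s": 2000.0}
    print(flow_to_sentence(example_flow))
```

A string produced this way could then be passed to a text tokenizer like any other sentence. Discretizing into bins is only one possible design choice; it keeps the vocabulary small at the cost of losing resolution within each bin.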