LLM-Based Cyberattack Detection Using Network Flow Statistics

Bibliographic Details
Main Authors: Leopoldo Gutiérrez-Galeano, Juan-José Domínguez-Jiménez, Jörg Schäfer, Inmaculada Medina-Bulo
Affiliations: Escuela Superior de Ingeniería, Universidad de Cádiz, Avda. Universidad de Cádiz, 10, 11519 Puerto Real, Spain (Gutiérrez-Galeano, Domínguez-Jiménez, Medina-Bulo); Faculty of Computer Science and Engineering, Frankfurt University of Applied Sciences, Nibelungenplatz 1, 60318 Frankfurt am Main, Germany (Schäfer)
Format: Article
Language: English
Published: MDPI AG, 2025-06-01
Series: Applied Sciences, vol. 15, no. 12, article 6529
ISSN: 2076-3417
DOI: 10.3390/app15126529
Subjects: large language model; machine learning; fine-tuning; deep learning; cybersecurity; network security
Online Access: https://www.mdpi.com/2076-3417/15/12/6529
Collection: DOAJ

Abstract

Cybersecurity is a growing area of research because new types of cyberthreats are constantly emerging. Tools and techniques exist to keep systems secure against certain known types of cyberattacks, but they are insufficient for others that have appeared recently. Therefore, research is needed to design new strategies to deal with new types of cyberattacks as they arise. Existing tools that harness artificial intelligence techniques mainly use artificial neural networks designed from scratch. In this paper, we present a novel approach for cyberattack detection using an encoder–decoder pre-trained Large Language Model (T5), fine-tuned to adapt its classification scheme for the detection of cyberattacks. Our system is anomaly-based and takes statistics of already finished network flows as input. This work makes significant contributions by introducing a novel methodology for adapting the model's original task from natural language processing to cybersecurity, achieved by transforming numerical network flow features into a unique abstract artificial language for the model input.

We validated the robustness of our detection system across three datasets using undersampling. Our model achieved consistently high performance across all evaluated datasets. Specifically, for the CIC-IDS-2017 dataset, we obtained an accuracy, precision, recall, and F-score of more than 99.94%. For CSE-CIC-IDS-2018, these metrics exceeded 99.84%, and for BCCC-CIC-IDS-2017, they were all above 99.90%. These results collectively demonstrate superior performance for cyberattack detection, while maintaining highly competitive false-positive and false-negative rates. This efficacy is achieved by relying exclusively on real-world network flow statistics, without the need for synthetic data generation.
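The abstract says numerical flow features are transformed into an "abstract artificial language" for the T5 input, but it does not give the encoding. The sketch below illustrates one possible scheme, assumed purely for illustration: quantile-bin each feature and emit one token per feature. The feature names, the token format `f<i>b<j>`, the `classify flow:` prefix and the `benign`/`attack` target words are all hypothetical, not the authors' method.

```python
import bisect
from typing import Dict, List, Sequence, Tuple

# Hypothetical subset of flow-statistic columns; real CIC-IDS-style
# flow records contain many more features.
FEATURES: List[str] = ["duration", "fwd_packets", "bwd_packets"]


def fit_bin_edges(values: Sequence[float], n_bins: int = 8) -> List[float]:
    """Return n_bins - 1 quantile cut points computed from training values."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one training value")
    # Cut points at evenly spaced ranks, so each bin holds roughly the
    # same number of training values.
    return [ordered[(len(ordered) * k) // n_bins] for k in range(1, n_bins)]


def flow_to_sentence(row: Dict[str, float], edges: Dict[str, List[float]]) -> str:
    """Map each numeric feature to a discrete token such as 'f0b3'."""
    tokens = []
    for i, name in enumerate(FEATURES):
        # bisect_right counts cut points <= value, which is the bin index.
        bin_index = bisect.bisect_right(edges[name], row[name])
        tokens.append(f"f{i}b{bin_index}")
    return " ".join(tokens)


def to_text_pair(row: Dict[str, float], edges: Dict[str, List[float]],
                 is_attack: bool) -> Tuple[str, str]:
    """Build a (source, target) pair in text-to-text form for a seq2seq model."""
    source = "classify flow: " + flow_to_sentence(row, edges)
    target = "attack" if is_attack else "benign"
    return source, target


# Usage: fit edges on (toy) training data, then encode one flow record.
train = [{"duration": float(d), "fwd_packets": float(d % 4), "bwd_packets": float(d % 3)}
         for d in range(1, 9)]
edges = {name: fit_bin_edges([r[name] for r in train], n_bins=4) for name in FEATURES}
print(to_text_pair({"duration": 50.0, "fwd_packets": 3.0, "bwd_packets": 7.0}, edges, True))
```

Discretizing values into a small vocabulary keeps every input sequence the same length and avoids feeding raw floating-point strings to a subword tokenizer; whether this matches the published encoding is not stated in the abstract.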
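The abstract states that the system was validated on three datasets "using undersampling" without naming the variant. Random undersampling of every class down to the size of the smallest class is one common form, assumed here only as an illustration; the function name `random_undersample` and the fixed seed are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_undersample(samples: Sequence[T], labels: Sequence[Hashable],
                       seed: int = 0) -> Tuple[List[T], List[Hashable]]:
    """Randomly drop samples from larger classes so every class keeps
    as many samples as the smallest class."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    by_class: Dict[Hashable, List[T]] = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    if not by_class:
        return [], []
    n_min = min(len(group) for group in by_class.values())
    rng = random.Random(seed)  # seeded so the selection is reproducible
    out_x: List[T] = []
    out_y: List[Hashable] = []
    for y, group in by_class.items():
        # Sample without replacement; the smallest class is kept whole.
        for x in rng.sample(group, n_min):
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Usage: 8 benign flows and 2 attack flows become 2 of each.
xs, ys = random_undersample(list(range(10)), ["benign"] * 8 + ["attack"] * 2)
print(len(xs), ys.count("benign"), ys.count("attack"))
```

Whether the evaluation split is balanced as well, or only the training split, is a separate design choice that the abstract does not specify.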
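The abstract reports accuracy, precision, recall and F-score, together with false-positive and false-negative rates. Their standard definitions from a binary confusion matrix, with attacks as the positive class, can be sketched as follows; the function name `detection_metrics` and the reading of "F-score" as the balanced F1-score are assumptions.

```python
from typing import Dict, Hashable, Sequence


def detection_metrics(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                      positive: Hashable = "attack") -> Dict[str, float]:
    """Binary detection metrics, treating `positive` (an attack) as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # attack correctly flagged
            else:
                fp += 1  # benign flow flagged as an attack
        elif t == positive:
            fn += 1      # attack missed
        else:
            tn += 1      # benign flow correctly passed

    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0  # treat 0/0 as 0 instead of raising

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "fpr": ratio(fp, fp + tn),  # false-positive rate
        "fnr": ratio(fn, fn + tp),  # false-negative rate; 1 - recall when any attack exists
    }


# Usage on a toy set of five labelled flows.
m = detection_metrics(["attack", "attack", "benign", "benign", "attack"],
                      ["attack", "benign", "benign", "attack", "attack"])
print({k: round(v, 3) for k, v in m.items()})
```

Reporting the false-positive and false-negative rates alongside accuracy matters for intrusion detection, since a detector can score high accuracy on an imbalanced set while still missing many attacks or raising many false alarms.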