Improving internet of vehicles research: A systematic preprocessing framework for the VeReMi dataset

Bibliographic Details
Main Authors: Aparup Roy, Debotosh Bhattacharjee, Ondrej Krejcar
Format: Article
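Language: English
Published: Elsevier 2025-06-01
Series: Data in Brief
Subjects: Vehicular reference misbehavior dataset (VeReMi); Intelligent transportation systems (ITS); Internet of vehicles (IoV); Intrusion detection systems (IDS); Data preprocessing; Dataset optimization
Online Access: http://www.sciencedirect.com/science/article/pii/S2352340925003312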
author Aparup Roy
Debotosh Bhattacharjee
Ondrej Krejcar
collection DOAJ
description The Vehicular Reference Misbehavior Dataset (VeReMi) is a vital resource for advancing Intelligent Transportation Systems (ITS) and the Internet of Vehicles (IoV). However, its large size (∼7 GB) and inherent class imbalance pose significant challenges for machine learning model development. This paper presents a preprocessing framework to enhance VeReMi’s usability and relevance. Through 10% down-sampling, the dataset was reduced to ∼724 MB, making it computationally manageable. Biases were addressed by balancing benign and malicious samples through synthesis and identifying benign instances using predefined criteria. A refined feature set, including key attributes like rcvTime, pos_0, pos_1, and attack_type (renamed attacker_type), was selected to improve machine learning compatibility. This preprocessing pipeline effectively maintains data integrity and preserves the representativeness of malicious patterns. The optimized dataset is well-suited for ITS and IoV applications, such as anomaly detection and network security, underscoring the crucial role of preprocessing in overcoming real-world constraints and enhancing model performance.
format Article
id doaj-art-b8a849b19154492aa88e8f5b2fdfd145
institution Kabale University
issn 2352-3409
language English
publishDate 2025-06-01
publisher Elsevier
record_format Article
series Data in Brief
spelling doaj-art-b8a849b19154492aa88e8f5b2fdfd145
2025-08-20T03:25:05Z
eng
Elsevier
Data in Brief
2352-3409
2025-06-01
Volume 60, article 111599
doi:10.1016/j.dib.2025.111599
Improving internet of vehicles research: A systematic preprocessing framework for the VeReMi dataset
Aparup Roy (0), Debotosh Bhattacharjee (1), Ondrej Krejcar (2)
0: Bachelor of Science (B.S.) in Data Science and Applications (Pursuing), Indian Institute of Technology Madras, BS Degree Office, 3rd Floor, ICSR Building, IIT Madras, Chennai 600036, India; Corresponding author.
1: Department of Computer Science & Engineering, Jadavpur University, 188, Raja S.C. Mallick Rd, Kolkata 700032, West Bengal, India; Center for Basic and Applied Science, Faculty of Informatics and Management, University of Hradec Kralove, Rokitanskeho 62, Hradec Kralove 50003, Czech Republic
2: Research Center, Skoda Auto University, Na Karmeli 1457, 293 01 Mlada Boleslav, Czech Republic
title Improving internet of vehicles research: A systematic preprocessing framework for the VeReMi dataset
topic Vehicular reference misbehavior dataset (VeReMi)
Intelligent transportation systems (ITS)
Internet of vehicles (IoV)
Intrusion detection systems (IDS)
Data preprocessing
Dataset optimization
url http://www.sciencedirect.com/science/article/pii/S2352340925003312
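
The description above names three preprocessing steps: 10% down-sampling, selecting a reduced feature set (with attack_type renamed to attacker_type), and balancing benign against malicious samples. The following is a minimal, standard-library sketch of those steps on toy records, not the authors' code. The column names rcvTime, pos_0, pos_1 and attack_type come from the description; the helper names, the toy values, the extra "sender" column, and the rule that attacker_type == 0 means benign are assumptions made for illustration. The balancing step uses plain random duplication of minority-class rows as a stand-in, since the record does not say how the paper synthesizes samples.

```python
import random
from collections import Counter

# Column names taken from the record's description; everything else is hypothetical.
KEEP = ["rcvTime", "pos_0", "pos_1", "attack_type"]


def downsample(rows, fraction, seed=0):
    """Keep a random `fraction` of rows (at least one), without replacement."""
    rng = random.Random(seed)
    k = max(1, round(len(rows) * fraction))
    return rng.sample(rows, k)


def select_and_rename(rows):
    """Keep only the KEEP columns and rename attack_type to attacker_type."""
    out = []
    for row in rows:
        slim = {name: row[name] for name in KEEP}
        slim["attacker_type"] = slim.pop("attack_type")
        out.append(slim)
    return out


def balance(rows, seed=0):
    """Duplicate random minority-class rows until both classes have equal size.

    Assumption: attacker_type == 0 marks a benign message.
    """
    rng = random.Random(seed)
    benign = [r for r in rows if r["attacker_type"] == 0]
    malicious = [r for r in rows if r["attacker_type"] != 0]
    small, large = sorted([benign, malicious], key=len)
    if not small:
        return rows  # one class is empty, so there is nothing to resample from
    extra = [dict(rng.choice(small)) for _ in range(len(large) - len(small))]
    return rows + extra


# Toy input: 1000 messages, one in five flagged as malicious.
raw = [
    {"rcvTime": float(i), "pos_0": i * 1.5, "pos_1": i * 0.5,
     "sender": i % 7, "attack_type": 0 if i % 5 else 1}
    for i in range(1000)
]

sample = downsample(raw, 0.10)
clean = balance(select_and_rename(sample))
counts = Counter(r["attacker_type"] == 0 for r in clean)
```

Here counts[True] and counts[False] hold the benign and malicious totals after balancing. Fixing the seed keeps the down-sampled subset reproducible across runs.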