Improving internet of vehicles research: A systematic preprocessing framework for the VeReMi dataset
The Vehicular Reference Misbehavior Dataset (VeReMi) is a vital resource for advancing Intelligent Transportation Systems (ITS) and the Internet of Vehicles (IoV). However, its large size (∼7 GB) and inherent class imbalance pose significant challenges for machine learning model development. This pa...
| Main Authors: | Aparup Roy, Debotosh Bhattacharjee, Ondrej Krejcar |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-06-01 |
| Series: | Data in Brief |
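| Subjects: | Vehicular reference misbehavior dataset (VeReMi); Intelligent transportation systems (ITS); Internet of vehicles (IoV); Intrusion detection systems (IDS); Data preprocessing; Dataset optimization |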
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2352340925003312 |
| _version_ | 1849470661889622016 |
|---|---|
| author | Aparup Roy, Debotosh Bhattacharjee, Ondrej Krejcar |
| collection | DOAJ |
| description | The Vehicular Reference Misbehavior Dataset (VeReMi) is a vital resource for advancing Intelligent Transportation Systems (ITS) and the Internet of Vehicles (IoV). However, its large size (∼7 GB) and inherent class imbalance pose significant challenges for machine learning model development. This paper presents a preprocessing framework to enhance VeReMi’s usability and relevance. Through 10% down-sampling, the dataset was reduced to ∼724 MB, making it computationally manageable. Biases were addressed by balancing benign and malicious samples through synthesis and identifying benign instances using predefined criteria. A refined feature set, including key attributes like rcvTime, pos_0, pos_1, and attack_type (renamed attacker_type), was selected to improve machine learning compatibility. This preprocessing pipeline effectively maintains data integrity and preserves the representativeness of malicious patterns. The optimized dataset is well-suited for ITS and IoV applications, such as anomaly detection and network security, underscoring the crucial role of preprocessing in overcoming real-world constraints and enhancing model performance. |
| id | doaj-art-b8a849b19154492aa88e8f5b2fdfd145 |
| institution | Kabale University |
| issn | 2352-3409 |
| spelling | doaj-art-b8a849b19154492aa88e8f5b2fdfd145; 2025-08-20T03:25:05Z; eng; Elsevier; Data in Brief; ISSN 2352-3409; 2025-06-01; vol. 60, article 111599; DOI 10.1016/j.dib.2025.111599; Improving internet of vehicles research: A systematic preprocessing framework for the VeReMi dataset; Aparup Roy (Bachelor of Science (B.S.) in Data Science and Applications (Pursuing), Indian Institute of Technology Madras, BS Degree Office, 3rd Floor, ICSR Building, IIT Madras, Chennai 600036, India; corresponding author); Debotosh Bhattacharjee (Department of Computer Science & Engineering, Jadavpur University, 188, Raja S.C. Mallick Rd, Kolkata 700032, West Bengal, India; Center for Basic and Applied Science, Faculty of Informatics and Management, University of Hradec Kralove, Rokitanskeho 62, Hradec Kralove 50003, Czech Republic); Ondrej Krejcar (Research Center, Skoda Auto University, Na Karmeli 1457, 293 01 Mlada Boleslav, Czech Republic); http://www.sciencedirect.com/science/article/pii/S2352340925003312; Vehicular reference misbehavior dataset (VeReMi); Intelligent transportation systems (ITS); Internet of vehicles (IoV); Intrusion detection systems (IDS); Data preprocessing; Dataset optimization |
| topic | Vehicular reference misbehavior dataset (VeReMi); Intelligent transportation systems (ITS); Internet of vehicles (IoV); Intrusion detection systems (IDS); Data preprocessing; Dataset optimization |
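
The description names three preprocessing steps: 10% down-sampling, selection of a reduced feature set with attack_type renamed to attacker_type, and balancing of benign and malicious samples. A minimal sketch of how such steps could fit together is given here, using only the Python standard library. It is not the authors' code. The column names rcvTime, pos_0, pos_1, attack_type and attacker_type come from the description; everything else is an assumption made for illustration: the toy data generator, the extra sender column, the rule that attack_type 0 means benign, and the use of random oversampling in place of the synthesis step, which the description does not detail.

```python
import random
from collections import Counter

# Columns named in the record's description; all other choices are assumptions.
KEPT_COLUMNS = ["rcvTime", "pos_0", "pos_1", "attack_type"]


def make_toy_rows(n=1000, seed=0):
    """Build hypothetical rows; every fifth row is marked malicious."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        rows.append({
            "rcvTime": float(i),
            "pos_0": rng.uniform(0, 5000),
            "pos_1": rng.uniform(0, 5000),
            "sender": rng.randrange(100),  # extra column, dropped later
            "attack_type": rng.choice([1, 2, 4, 8, 16]) if i % 5 == 0 else 0,
        })
    return rows


def downsample(rows, fraction=0.10, seed=0):
    """Keep a random fraction of rows (simple random sampling, an assumption)."""
    rng = random.Random(seed)
    k = max(1, round(len(rows) * fraction))
    return rng.sample(rows, k)


def select_and_rename(rows):
    """Keep the selected columns and rename attack_type to attacker_type."""
    out = []
    for row in rows:
        kept = {name: row[name] for name in KEPT_COLUMNS}
        kept["attacker_type"] = kept.pop("attack_type")
        out.append(kept)
    return out


def balance(rows, seed=0):
    """Oversample the smaller class with replacement until both classes match.

    Assumption: attacker_type == 0 marks a benign message. Random oversampling
    stands in for the synthesis step mentioned in the description.
    """
    rng = random.Random(seed)
    benign = [r for r in rows if r["attacker_type"] == 0]
    malicious = [r for r in rows if r["attacker_type"] != 0]
    if not benign or not malicious:
        return list(rows)  # nothing to balance against
    small, large = sorted((benign, malicious), key=len)
    extra = [dict(rng.choice(small)) for _ in range(len(large) - len(small))]
    return large + small + extra


if __name__ == "__main__":
    sample = downsample(make_toy_rows())
    balanced = balance(select_and_rename(sample))
    counts = Counter("benign" if r["attacker_type"] == 0 else "malicious"
                     for r in balanced)
    print(len(sample), dict(counts))
```

Down-sampling runs first so that the later steps work on the smaller set. Balancing runs last so that the class ratio is set on the rows that are actually kept.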