Smart Organization of Imbalanced Traffic Datasets for Long-Term Traffic Forecasting

Predicting traffic speed is an important issue, especially in urban regions. Precise long-term forecasts would enable individuals to conserve time and financial resources while diminishing air pollution. Despite extensive research on this subject, to our knowledge, no publications investigate or tac...

Bibliographic Details
Main Authors: Mustafa M. Kara, H. Irem Turkmen, M. Amac Guvensan
Format: Article
Language: English
Published: MDPI AG 2025-02-01
Series: Sensors
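Subjects: long-term traffic speed prediction; intelligent transportation systems; deep learning; data preprocessing; imbalanced datasets; data grouping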
Online Access: https://www.mdpi.com/1424-8220/25/4/1225
_version_ 1850080098878750720
author Mustafa M. Kara
H. Irem Turkmen
M. Amac Guvensan
author_facet Mustafa M. Kara
H. Irem Turkmen
M. Amac Guvensan
author_sort Mustafa M. Kara
collection DOAJ
description Predicting traffic speed is an important problem, especially in urban regions. Precise long-term forecasts would help individuals save time and money while reducing air pollution. Despite extensive research on this subject, to our knowledge, no publications investigate or tackle the issue of imbalanced datasets in traffic speed prediction. Traffic speed data are often biased toward high values because low traffic speeds are infrequent. Two temporal factors are important for low-speed values: daily population movement, captured by the time of day, and weather, captured by month. Both are considered in this study. Hour-wise Pattern Organization and Month-wise Pattern Organization techniques were devised, which organize the speed data using these two factors as a metric in order to better represent minority data characteristics. In addition to these two methods, a Speed-wise Pattern Organization strategy is proposed, which arranges train and test samples by setting boundaries on speed while taking the volatile nature of traffic into consideration. We evaluated these strategies using four popular model types: long short-term memory (LSTM), gated recurrent unit networks (GRUs), bi-directional LSTM, and convolutional neural networks (CNNs). GRU had the best performance, achieving a mean absolute percentage error (MAPE) of 13.51%, whereas LSTM had the worst, with a MAPE of 13.74%. We validated the robustness of the strategies through our studies and observed improvements in model accuracy across all categories. While the average improvement was approximately 4%, our methodologies performed best in low-traffic-speed scenarios, improving model prediction accuracy by 11.2%. Because the presented methodologies are applied in the pre-processing steps, they can be combined with various models and additional pre-processing procedures to attain comparable performance improvements.
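The description names MAPE as the evaluation metric and time of day as one grouping factor. The following is a minimal sketch of both ideas, not the authors' implementation: the function names `mape` and `group_by_hour` and the sample values are hypothetical, and the record does not say how the paper builds or uses its hour-wise groups.

```python
from collections import defaultdict
from datetime import datetime


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Assumes no actual value is zero (true for the speeds used below).
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def group_by_hour(samples):
    """Bucket (timestamp, speed) samples by hour of day (0-23)."""
    groups = defaultdict(list)
    for timestamp, speed in samples:
        groups[timestamp.hour].append(speed)
    return dict(groups)


# Hypothetical speeds in km/h; these are NOT data from the paper.
samples = [
    (datetime(2024, 1, 8, 8, 0), 20.0),
    (datetime(2024, 1, 8, 8, 30), 25.0),
    (datetime(2024, 1, 8, 14, 0), 60.0),
    (datetime(2024, 1, 9, 8, 0), 22.0),
]
by_hour = group_by_hour(samples)

# The same 5 km/h miss costs 10% at 50 km/h but 25% at 20 km/h.
error = mape([50.0, 20.0], [45.0, 25.0])
```

Because MAPE divides each error by the actual speed, an identical absolute miss weighs more heavily at low speeds, which is one reason the rare low-speed samples matter for this metric.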
format Article
id doaj-art-2348bade76534f00a1172ee344897ca8
institution DOAJ
issn 1424-8220
language English
publishDate 2025-02-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj-art-2348bade76534f00a1172ee344897ca8
2025-08-20T02:45:01Z
eng
MDPI AG
Sensors
1424-8220
2025-02-01
25
4
1225
10.3390/s25041225
Smart Organization of Imbalanced Traffic Datasets for Long-Term Traffic Forecasting
Mustafa M. Kara (0)
H. Irem Turkmen (1)
M. Amac Guvensan (2)
(0) Computer Engineering Department, Yildiz Technical University, Istanbul 34220, Türkiye
(1) Computer Engineering Department, Yildiz Technical University, Istanbul 34220, Türkiye
(2) Computer Engineering Department, Yildiz Technical University, Istanbul 34220, Türkiye
https://www.mdpi.com/1424-8220/25/4/1225
long-term traffic speed prediction
intelligent transportation systems
deep learning
data preprocessing
imbalanced datasets
data grouping
title Smart Organization of Imbalanced Traffic Datasets for Long-Term Traffic Forecasting
title_sort smart organization of imbalanced traffic datasets for long term traffic forecasting
topic long-term traffic speed prediction
intelligent transportation systems
deep learning
data preprocessing
imbalanced datasets
data grouping
url https://www.mdpi.com/1424-8220/25/4/1225
work_keys_str_mv AT mustafamkara smartorganizationofimbalancedtrafficdatasetsforlongtermtrafficforecasting
AT hiremturkmen smartorganizationofimbalancedtrafficdatasetsforlongtermtrafficforecasting
AT mamacguvensan smartorganizationofimbalancedtrafficdatasetsforlongtermtrafficforecasting