Identifying Suitability for Data Reduction in Imbalanced Time-Series Datasets


Bibliographic Details
Main Authors: Dominic Sanderson, Tatiana Kalganova
Format: Article
Language: English
Published: MDPI AG 2025-05-01
Series: AI
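Subjects: occupancy detection; data reduction; dynamic data application; time-series data; useful data; class balance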
Online Access: https://www.mdpi.com/2673-2688/6/5/98
author Dominic Sanderson
Tatiana Kalganova
collection DOAJ
description Occupancy detection for large buildings enables optimised control of indoor systems based on occupant presence, reducing the energy costs of heating and cooling. Machine learning models achieve occupancy detection with an accuracy of over 95%. However, to achieve this, large amounts of data are collected with little consideration of which of the collected data are most useful to the task. This paper demonstrates methods to identify whether data may be removed from imbalanced time-series training datasets to optimise the training process and model performance. It also describes how calculating the class density of a dataset may be used to identify whether the dataset is suitable for data reduction, and how dataset fusion may be used to combine occupancy datasets. The results show that over 50% of a training dataset may be removed from imbalanced datasets while maintaining performance, reducing training time and energy cost by over 40%. This indicates that a data-centric approach to developing artificial intelligence applications is as important as selecting the best model.
format Article
id doaj-art-98461ef8d0f14b698c87367040a2d08a
institution DOAJ
issn 2673-2688
language English
publishDate 2025-05-01
publisher MDPI AG
record_format Article
series AI
spelling doaj-art-98461ef8d0f14b698c87367040a2d08a; 2025-08-20T03:14:38Z; eng; MDPI AG; AI; 2673-2688; 2025-05-01; 6 (5), 98; 10.3390/ai6050098; Dominic Sanderson (Department of Electronic and Electrical Engineering, Brunel University of London, Uxbridge UB8 3PH, UK); Tatiana Kalganova (Department of Electronic and Electrical Engineering, Brunel University of London, Uxbridge UB8 3PH, UK)
title Identifying Suitability for Data Reduction in Imbalanced Time-Series Datasets
topic occupancy detection
data reduction
dynamic data application
time-series data
useful data
class balance
url https://www.mdpi.com/2673-2688/6/5/98