Identifying Suitability for Data Reduction in Imbalanced Time-Series Datasets

Bibliographic Details
Main Authors: Dominic Sanderson, Tatiana Kalganova
Format: Article
Language: English
Published: MDPI AG 2025-05-01
Series: AI
Online Access: https://www.mdpi.com/2673-2688/6/5/98
Description
Summary: Occupancy detection for large buildings enables optimised control of indoor systems based on occupant presence, reducing the energy costs of heating and cooling. Through machine learning models, occupancy detection is achieved with an accuracy of over 95%. However, to achieve this, large amounts of data are collected with little consideration of which of the collected data are most useful to the task. This paper demonstrates methods to identify whether data may be removed from imbalanced time-series training datasets to optimise the training process and model performance. It also describes how calculating the class density of a dataset may be used to identify whether that dataset is suitable for data reduction, and how dataset fusion may be used to combine occupancy datasets. The results show that over 50% of a training dataset may be removed from imbalanced datasets while maintaining performance, reducing training time and energy cost by over 40%. This indicates that a data-centric approach to developing artificial intelligence applications is as important as selecting the best model.
ISSN: 2673-2688