Identifying Suitability for Data Reduction in Imbalanced Time-Series Datasets
Occupancy detection for large buildings enables optimised control of indoor systems based on occupant presence, reducing the energy costs of heating and cooling. Through machine learning models, occupancy detection is achieved with an accuracy of over 95%. However, to achieve this, large amounts of...
| Main Authors: | Dominic Sanderson, Tatiana Kalganova |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-05-01 |
| Series: | AI |
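| Subjects: | occupancy detection; data reduction; dynamic data application; time-series data; useful data; class balance |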
| Online Access: | https://www.mdpi.com/2673-2688/6/5/98 |
| _version_ | 1849711433277767680 |
|---|---|
| author | Dominic Sanderson; Tatiana Kalganova |
| author_facet | Dominic Sanderson; Tatiana Kalganova |
| author_sort | Dominic Sanderson |
| collection | DOAJ |
| description | Occupancy detection for large buildings enables optimised control of indoor systems based on occupant presence, reducing the energy costs of heating and cooling. Through machine learning models, occupancy detection is achieved with an accuracy of over 95%. However, to achieve this, large amounts of data are collected with little consideration of which of the collected data are most useful to the task. This paper demonstrates methods to identify if data may be removed from the imbalanced time-series training datasets to optimise the training process and model performance. It also describes how the calculation of the class density of a dataset may be used to identify if a dataset is applicable for data reduction, and how dataset fusion may be used to combine occupancy datasets. The results show that over 50% of a training dataset may be removed from imbalanced datasets while maintaining performance, reducing training time and energy cost by over 40%. This indicates that a data-centric approach to developing artificial intelligence applications is as important as selecting the best model. |
| format | Article |
| id | doaj-art-98461ef8d0f14b698c87367040a2d08a |
| institution | DOAJ |
| issn | 2673-2688 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | AI |
| spelling | doaj-art-98461ef8d0f14b698c87367040a2d08a; 2025-08-20T03:14:38Z; eng; MDPI AG; AI; 2673-2688; 2025-05-01; 6; 5; 98; 10.3390/ai6050098; Identifying Suitability for Data Reduction in Imbalanced Time-Series Datasets; Dominic Sanderson (0); Tatiana Kalganova (1); Department of Electronic and Electrical Engineering, Brunel University of London, Uxbridge UB8 3PH, UK; Department of Electronic and Electrical Engineering, Brunel University of London, Uxbridge UB8 3PH, UK; Occupancy detection for large buildings enables optimised control of indoor systems based on occupant presence, reducing the energy costs of heating and cooling. Through machine learning models, occupancy detection is achieved with an accuracy of over 95%. However, to achieve this, large amounts of data are collected with little consideration of which of the collected data are most useful to the task. This paper demonstrates methods to identify if data may be removed from the imbalanced time-series training datasets to optimise the training process and model performance. It also describes how the calculation of the class density of a dataset may be used to identify if a dataset is applicable for data reduction, and how dataset fusion may be used to combine occupancy datasets. The results show that over 50% of a training dataset may be removed from imbalanced datasets while maintaining performance, reducing training time and energy cost by over 40%. This indicates that a data-centric approach to developing artificial intelligence applications is as important as selecting the best model.; https://www.mdpi.com/2673-2688/6/5/98; occupancy detection; data reduction; dynamic data application; time-series data; useful data; class balance |
| spellingShingle | Dominic Sanderson; Tatiana Kalganova; Identifying Suitability for Data Reduction in Imbalanced Time-Series Datasets; AI; occupancy detection; data reduction; dynamic data application; time-series data; useful data; class balance |
| title | Identifying Suitability for Data Reduction in Imbalanced Time-Series Datasets |
| title_full | Identifying Suitability for Data Reduction in Imbalanced Time-Series Datasets |
| title_fullStr | Identifying Suitability for Data Reduction in Imbalanced Time-Series Datasets |
| title_full_unstemmed | Identifying Suitability for Data Reduction in Imbalanced Time-Series Datasets |
| title_short | Identifying Suitability for Data Reduction in Imbalanced Time-Series Datasets |
| title_sort | identifying suitability for data reduction in imbalanced time series datasets |
| topic | occupancy detection; data reduction; dynamic data application; time-series data; useful data; class balance |
| url | https://www.mdpi.com/2673-2688/6/5/98 |
| work_keys_str_mv | AT dominicsanderson identifyingsuitabilityfordatareductioninimbalancedtimeseriesdatasets AT tatianakalganova identifyingsuitabilityfordatareductioninimbalancedtimeseriesdatasets |