Comparative Analysis of Supervised Classification Algorithms for Residential Water End Uses

Bibliographic Details
Main Authors:	Zahra Heydari, Ashlynn S. Stillwell
Format:	Article
Language:	English
Published:	Wiley 2024-06-01
Series:	Water Resources Research
Subjects:	classification; logistic regression; machine learning; neural network; random forest; residential water
Online Access:	https://doi.org/10.1029/2023WR036690

collection	DOAJ
description	Abstract: Water sustainability in the built environment requires accurate estimation of residential water end uses (e.g., showers, toilets, and faucets). In this study, we evaluate the performance of four models (Random Forest, RF; Support Vector Machines, SVM; Logistic Regression, Log‐reg; and Neural Networks, NN) for classifying residential water end uses using actual (measured) and synthetic labeled data sets. We generated synthetic labeled data using Conditional Tabular Generative Adversarial Networks. We then used grid search to optimize each model's hyperparameters and trained each model with its optimized settings. The RF model performed best overall, while the Log‐reg model had the shortest execution times across balanced and imbalanced (in terms of number of events per class) synthetic data scenarios, making it a computationally efficient alternative to RF for specific end uses. The NN model also performed well, at the cost of longer execution times than the other classification models. In the balanced data set scenario, all models achieved closely aligned F1‐scores, ranging from 0.83 to 0.90. However, on imbalanced data reflecting actual conditions, the SVM and Log‐reg models performed worse than the RF and NN models. Overall, we conclude that decision tree‐based models are the optimal choice for classifying water end‐use data. Our study advances residential smart water metering systems by creating synthetic labeled end‐use data and by providing insight into the strengths and weaknesses of various supervised machine learning classifiers for end‐use identification.
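The abstract reports F1‐scores and contrasts balanced with imbalanced data. As a minimal sketch (not the authors' code), the per-class F1‐score, F1 = 2TP / (2TP + FP + FN), and its unweighted (macro) average can be computed as below. The end-use labels and predictions are hypothetical toy values, and the helper names `per_class_f1` and `macro_f1` are illustrative; the abstract does not state which averaging scheme the study used.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} using F1 = 2TP / (2TP + FP + FN) for each class."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    tp = Counter(t for t, p in pairs if t == p)  # correctly labeled events
    fp = Counter(p for t, p in pairs if t != p)  # events wrongly assigned to class p
    fn = Counter(t for t, p in pairs if t != p)  # events of class t that were missed
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: a rare end use weighs as much as a common one."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical, imbalanced toy labels (not data from the study).
y_true = ["faucet"] * 6 + ["toilet"] * 3 + ["shower"]
always_faucet = ["faucet"] * 10  # ignores the minority end uses
mostly_right = ["faucet"] * 6 + ["toilet", "toilet", "faucet"] + ["shower"]

print(round(macro_f1(y_true, always_faucet), 3))  # 0.25, despite 60% accuracy
print(round(macro_f1(y_true, mostly_right), 3))   # 0.908
```

Macro averaging is used here because it mirrors the balanced-versus-imbalanced contrast in the abstract: a classifier that favors the majority end use can keep moderate accuracy while its macro F1 collapses.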
id	doaj-art-e4fbe6a91e5f47a1aa433370e4657c1b
institution	OA Journals
issn	0043-1397 (print); 1944-7973 (online)
affiliation	Civil and Environmental Engineering, University of Illinois Urbana‐Champaign, Urbana, IL, USA (Zahra Heydari; Ashlynn S. Stillwell)
volume	60
issue	6

Similar Items