A Process Monitoring Framework for Imbalanced Big Data: A Wastewater Treatment Plant Case Study

In recent years, process monitoring structures have utilized big data analytics to offer a more realistic interpretation of systems. Nevertheless, managing large datasets and providing affirmative responses are common obstacles to using such monitoring frameworks. Practically, faulty conditions are less p...

Bibliographic Details
Main Authors: Morteza Zadkarami, Ali Akbar Safavi, Krist V. Gernaey, Pedram Ramin
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
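Subjects: Process monitoring; big data analytics; imbalanced classification; wavelet analysis; wastewater treatment plants (WWTPs)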
Online Access: https://ieeexplore.ieee.org/document/10664546/
author Morteza Zadkarami
Ali Akbar Safavi
Krist V. Gernaey
Pedram Ramin
collection DOAJ
description In recent years, process monitoring structures have utilized big data analytics to offer a more realistic interpretation of systems. Nevertheless, managing large datasets and providing affirmative responses are common obstacles to using such monitoring frameworks. In practice, faulty conditions are less prevalent than normal situations, so coping with an imbalanced data distribution is another challenge studied here. This paper presents an innovative fault detection framework that addresses the challenges of imbalanced data distribution and big data complexities for wastewater treatment plants (WWTPs). The fault scenarios implemented for the WWTP in this research include distortions in both process and equipment, individually as well as together. For this purpose, an advanced preprocessing stage is designed, including a measurement selection method and an under-sampling algorithm. First, the measurements that convey a fair amount of information about the different fault scenarios are selected. Subsequently, a novel under-sampling approach is implemented to remove a number of data points from the majority class (normal conditions). The down-sampling strategy is designed to trade off the amount of data eliminated against the information lost. The extracted features are then fed into a typical neural network classifier for decision making. The Area Under the Curve (AUC) and Geometric Mean (Gmean) serve as effective indicators of the fault detection capability when handling imbalanced big datasets. With the proposed fault detection framework, the average AUC and Gmean for individual faults and faults simulation scenarios are over 98%, whereas without the advanced preprocessing stage the obtained indicator values are below 79%.
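The two indicators named in the description, and the general idea of under-sampling the majority class, can be illustrated with a minimal sketch. It assumes binary labels (0 = normal, 1 = faulty) and uses plain random under-sampling as a stand-in; it is not the paper's novel down-sampling strategy, its measurement selection method, or its neural network classifier. The function names `random_undersample`, `g_mean`, and `auc` are hypothetical and chosen only for this illustration.

```python
import math
import random


def random_undersample(X, y, majority_label=0, ratio=1.0, seed=0):
    """Randomly drop majority-class rows until about ratio * (minority count) remain.

    Assumption: plain random under-sampling, used only as a stand-in for
    the record's unspecified information-preserving strategy.
    """
    rng = random.Random(seed)
    majority = [i for i, label in enumerate(y) if label == majority_label]
    minority = [i for i, label in enumerate(y) if label != majority_label]
    keep_n = min(len(majority), int(round(ratio * len(minority))))
    kept = sorted(rng.sample(majority, keep_n) + minority)
    return [X[i] for i in kept], [y[i] for i in kept]


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (fault recall) and specificity (normal recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def auc(y_true, scores, positive=1):
    """AUC as the probability that a faulty sample outscores a normal one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: 8 normal samples and 2 faulty samples.
    X = [[float(i)] for i in range(10)]
    y = [0] * 8 + [1] * 2
    X_bal, y_bal = random_undersample(X, y)
    print("class counts after under-sampling:", y_bal.count(0), y_bal.count(1))
    # A classifier that always predicts "normal" is 80% accurate here,
    # yet its Gmean is zero because it never detects a fault.
    print("Gmean of an all-normal predictor:", g_mean(y, [0] * 10))
```

The last line shows why accuracy alone is a weak indicator on imbalanced data: Gmean drops to zero as soon as one class is never detected, which is the behaviour that makes it useful alongside AUC.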
format Article
id doaj-art-e58690e3fbc04b00a113c01ba965aec9
institution OA Journals
issn 2169-3536
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-e58690e3fbc04b00a113c01ba965aec9
2025-08-20T01:54:57Z
eng
IEEE
IEEE Access
2169-3536
2024-01-01
Volume 12, pages 132139-132158
DOI: 10.1109/ACCESS.2024.3454516
IEEE Xplore document number: 10664546
A Process Monitoring Framework for Imbalanced Big Data: A Wastewater Treatment Plant Case Study
Morteza Zadkarami (https://orcid.org/0000-0003-1070-5947), School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran
Ali Akbar Safavi (https://orcid.org/0000-0002-2265-8300), School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran
Krist V. Gernaey (https://orcid.org/0000-0002-0364-1773), Department of Chemical and Biochemical Engineering, Process and Systems Engineering Center (PROSYS), Technical University of Denmark (DTU), Lyngby, Denmark
Pedram Ramin (https://orcid.org/0000-0001-7245-072X), Department of Chemical and Biochemical Engineering, Process and Systems Engineering Center (PROSYS), Technical University of Denmark (DTU), Lyngby, Denmark
https://ieeexplore.ieee.org/document/10664546/
Process monitoring; big data analytics; imbalanced classification; wavelet analysis; wastewater treatment plants (WWTPs)
title A Process Monitoring Framework for Imbalanced Big Data: A Wastewater Treatment Plant Case Study
topic Process monitoring
big data analytics
imbalanced classification
wavelet analysis
wastewater treatment plants (WWTPs)
url https://ieeexplore.ieee.org/document/10664546/