A Process Monitoring Framework for Imbalanced Big Data: A Wastewater Treatment Plant Case Study
In recent years, process monitoring structures utilize big data analytics to offer a more realistic interpretation of systems. Nevertheless, managing large datasets and providing affirmative responses are common obstacles of using such monitoring frameworks. Practically, faulty conditions are less p...
| Main Authors: | Morteza Zadkarami, Ali Akbar Safavi, Krist V. Gernaey, Pedram Ramin |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2024-01-01 |
| Series: | IEEE Access |
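| Subjects: | Process monitoring; big data analytics; imbalanced classification; wavelet analysis; wastewater treatment plants (WWTPs) |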
| Online Access: | https://ieeexplore.ieee.org/document/10664546/ |
| _version_ | 1850263471683272704 |
|---|---|
| author | Morteza Zadkarami; Ali Akbar Safavi; Krist V. Gernaey; Pedram Ramin |
| author_sort | Morteza Zadkarami |
| collection | DOAJ |
| description | In recent years, process monitoring structures utilize big data analytics to offer a more realistic interpretation of systems. Nevertheless, managing large datasets and providing affirmative responses are common obstacles of using such monitoring frameworks. Practically, faulty conditions are less prevalent than normal situations. Thereby, coping with an imbalanced data distribution is another challenge studied here. This paper presents an innovative fault detection framework that addresses the challenges of imbalanced data distribution and big data complexities for wastewater treatment plants (WWTPs). The fault scenarios implemented for the WWTP in this research include distortions in both process and equipment, individually as well as together. For this purpose, an advanced preprocessing stage is designed, including a measurement selection method and an under-sampling algorithm. First, the measurements that convey a fair amount of information in terms of different fault scenarios are selected. Subsequently, a novel under-sampling approach is implemented to remove a number of data points from the majority class (normal conditions). The down-sampling strategy is designed in a way that trades off the amount of data elimination and information loss. The extracted features are then inserted into a typical neural network classifier for decision making. The Area Under Curve and Geometric Mean serve as effective indicators in investigating the fault detection capability of handling imbalanced big datasets. When applying the proposed fault detection framework, the average AUC and Gmean for individual faults and faults simulation scenarios are over 98% while without implementing the advanced preprocessing stage the obtained indicator values are below 79%. |
| format | Article |
| id | doaj-art-e58690e3fbc04b00a113c01ba965aec9 |
| institution | OA Journals |
| issn | 2169-3536 |
| language | English |
| publishDate | 2024-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-e58690e3fbc04b00a113c01ba965aec9; 2025-08-20T01:54:57Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2024-01-01; vol. 12, pp. 132139-132158; DOI 10.1109/ACCESS.2024.3454516; document 10664546; A Process Monitoring Framework for Imbalanced Big Data: A Wastewater Treatment Plant Case Study; Morteza Zadkarami (https://orcid.org/0000-0003-1070-5947), School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran; Ali Akbar Safavi (https://orcid.org/0000-0002-2265-8300), School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran; Krist V. Gernaey (https://orcid.org/0000-0002-0364-1773), Department of Chemical and Biochemical Engineering, Process and Systems Engineering Center (PROSYS), Technical University of Denmark (DTU), Lyngby, Denmark; Pedram Ramin (https://orcid.org/0000-0001-7245-072X), Department of Chemical and Biochemical Engineering, Process and Systems Engineering Center (PROSYS), Technical University of Denmark (DTU), Lyngby, Denmark; https://ieeexplore.ieee.org/document/10664546/; Process monitoring; big data analytics; imbalanced classification; wavelet analysis; wastewater treatment plants (WWTPs) |
| title | A Process Monitoring Framework for Imbalanced Big Data: A Wastewater Treatment Plant Case Study |
| topic | Process monitoring; big data analytics; imbalanced classification; wavelet analysis; wastewater treatment plants (WWTPs) |
| url | https://ieeexplore.ieee.org/document/10664546/ |
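
The description above names two evaluation indicators for imbalanced fault detection, the Area Under Curve (AUC) and the Geometric Mean (G-mean), and an under-sampling step that removes points from the majority (normal) class. The following is a minimal Python sketch of those ideas, not the authors' implementation. The record does not specify the paper's novel under-sampling algorithm, so plain random under-sampling is swapped in as a stand-in. The function names, the `ratio` and `seed` parameters, and the label convention (1 = fault, 0 = normal) are all assumptions made for illustration.

```python
import math
import random


def random_undersample(features, labels, ratio=1.0, seed=0):
    """Keep every fault row (label 1) and a random subset of normal rows (label 0).

    `ratio` is the target normal:fault ratio after sampling (hypothetical
    parameter; the paper's own under-sampling rule is not given in the record).
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    keep = min(len(majority), int(round(ratio * len(minority))))
    chosen = sorted(minority + rng.sample(majority, keep))
    return [features[i] for i in chosen], [labels[i] for i in chosen]


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (fault recall) and specificity (normal recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def auc(y_true, scores):
    """AUC as the probability that a random fault sample outscores a random
    normal sample (Mann-Whitney formulation); ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample from each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Both indicators weight the fault and normal classes equally, which is why they are more informative than plain accuracy when faults are rare: a classifier that always predicts "normal" has zero sensitivity and therefore a G-mean of 0, however high its accuracy. The pairwise AUC above is quadratic in the number of samples, so a rank-based computation would be preferable for large datasets.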