Machine learning classifiers to detect data pattern change of continuous emission monitoring system: A typical chemical industrial park as an example

Continuous Emission Monitoring Systems (CEMS) are critical for real-time pollutant measurement, widely deployed to supervise industrial emissions and ensure regulatory compliance. Despite their utility, CEMS data face challenges of data fabrications, complicating efforts to detect environmental viol...

Bibliographic Details
Main Authors: Zhefeng Xu, Xiahong Shi, Wei Shu, Yilu Xin, Xuan Zan, Zhaonian Si, Jinping Cheng
Format: Article
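Language: English
Published: Elsevier 2025-07-01
Series: Environment International
Subjects: CEMS; Machine-learning classifier; Chemical industrial park; Emission pattern; Vector comparison
Online Access: http://www.sciencedirect.com/science/article/pii/S0160412025003459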
_version_ 1849688532675723264
author Zhefeng Xu
Xiahong Shi
Wei Shu
Yilu Xin
Xuan Zan
Zhaonian Si
Jinping Cheng
collection DOAJ
description Continuous Emission Monitoring Systems (CEMS) are critical for real-time pollutant measurement and are widely deployed to supervise industrial emissions and ensure regulatory compliance. Despite their utility, CEMS data are vulnerable to fabrication, which complicates the detection of environmental violations; such violations may instead be detected from changes in emission patterns. This study explores the application of machine learning classifiers to analyse CEMS data from 107 waste discharge outlets across 31 corporations in a Chinese chemical industrial park. The outlets were categorized into 12 datasets based on monitoring parameters, and 17 machine learning models were evaluated to identify emission patterns and detect potential data anomalies. Random Forest classifiers (RFC) consistently demonstrated high accuracy (up to 100% in specific datasets), outperforming other models, while gradient boosting-based methods also performed well. Temporal emission pattern analysis revealed significant changes in 334 instances (90% confidence) across collection weeks, though only 24 aligned with regulatory offsite supervision records, highlighting discrepancies between algorithmic detection and traditional compliance checks. Vector distances and cosine similarities of mean/median emission values correlated with misprediction probabilities, yet fewer than 60% of pattern changes coincided with extremum values in these metrics. The study underscores the efficacy of RFCs in distinguishing outlet-specific emission profiles and proposes a supplemental approach to uncover subtle data manipulation or operational shifts. However, challenges persist in linking algorithmic findings to documented violations, emphasizing the need for integrated data frameworks to enhance environmental oversight. This work advances the role of machine learning classifiers in emission monitoring, offering a pathway for CEMS management and regulatory strategy refinement.
format Article
id doaj-art-737f063ee4474882baeef3a7b41015f4
institution DOAJ
issn 0160-4120
language English
publishDate 2025-07-01
publisher Elsevier
record_format Article
series Environment International
spelling doaj-art-737f063ee4474882baeef3a7b41015f4 2025-08-20T03:21:59Z eng Elsevier Environment International 0160-4120 2025-07-01 201 109594 10.1016/j.envint.2025.109594
Machine learning classifiers to detect data pattern change of continuous emission monitoring system: A typical chemical industrial park as an example
0 Zhefeng Xu: School of Environmental Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
1 Xiahong Shi: School of Environmental Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
2 Wei Shu: Law Enforcement Team of Shanghai Ecological Environment Bureau, Shanghai 200235, China
3 Yilu Xin: Huangpu District Environmental Monitoring Station of Shanghai, Shanghai 200000, China
4 Xuan Zan: Shanghai SECCO Petrochemical Company Limited, Shanghai 200051, China
5 Zhaonian Si: Shanghai SECCO Petrochemical Company Limited, Shanghai 200051, China
6 Jinping Cheng: School of Environmental Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; Corresponding author.
http://www.sciencedirect.com/science/article/pii/S0160412025003459
CEMS; Machine-learning classifier; Chemical industrial park; Emission pattern; Vector comparison
title Machine learning classifiers to detect data pattern change of continuous emission monitoring system: A typical chemical industrial park as an example
topic CEMS
Machine-learning classifier
Chemical industrial park
Emission pattern
Vector comparison
url http://www.sciencedirect.com/science/article/pii/S0160412025003459
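
Note on the vector comparison named in the description: the record states that vector distances and cosine similarities of mean/median emission values were compared across collection weeks, but it does not give the formulas or the data layout. The following is only a minimal sketch of what such a comparison could look like, assuming each week is summarized by the column-wise mean of its readings. The function names, the three-pollutant layout, and all numeric values are hypothetical and are not taken from the article.

```python
import math
from statistics import mean


def mean_vector(rows):
    """Column-wise mean of a list of equal-length measurement rows."""
    return [mean(col) for col in zip(*rows)]


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical readings (not from the article): each row is one sample of
# three monitored parameters for the same outlet in two different weeks.
week_1 = [[30.0, 80.0, 5.0], [32.0, 78.0, 6.0], [31.0, 82.0, 4.0]]
week_2 = [[15.0, 40.0, 2.5], [16.0, 39.0, 3.0], [15.5, 41.0, 2.0]]

v1 = mean_vector(week_1)
v2 = mean_vector(week_2)
distance = euclidean_distance(v1, v2)
similarity = cosine_similarity(v1, v2)
print(f"distance={distance:.2f} cosine_similarity={similarity:.4f}")
```

In this made-up case the second week's means are exactly half of the first week's, so the cosine similarity stays at essentially 1 while the distance is large. That illustrates why the two metrics can disagree: cosine similarity tracks the mix of parameters, and distance tracks their magnitude.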