Prediction of copper contamination in soil across EU using spectroscopy and machine learning: Handling class imbalance problem

Soil copper (Cu) pollution is a significant global environmental challenge, necessitating accurate assessment methods for effective control. However, existing classification approaches for Cu content in soil spectral datasets often face imbalances in data distribution, resulting in unreliable identi...

Bibliographic Details
Main Authors: Chongchong Qi, Nana Zhou, Tao Hu, Mengting Wu, Qiusong Chen, Han Wang, Kejing Zhang, Zhang Lin
Format: Article
Language: English
Published: Elsevier 2025-03-01
Series: Smart Agricultural Technology
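Subjects: Soil contamination; Hyperspectral; Continental scale; Copper; Spectral preprocessing; Imbalanced classification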
Online Access: http://www.sciencedirect.com/science/article/pii/S2772375524003320
author Chongchong Qi
Nana Zhou
Tao Hu
Mengting Wu
Qiusong Chen
Han Wang
Kejing Zhang
Zhang Lin
collection DOAJ
description Soil copper (Cu) pollution is a significant global environmental challenge, necessitating accurate assessment methods for effective control. However, existing classification approaches for Cu content in soil spectral datasets often face imbalances in data distribution, resulting in unreliable identification of Cu-contaminated samples. To address this limitation, we conducted a comprehensive evaluation of three basic machine learning (ML) algorithms and four imbalanced ML algorithms. These methods were used to develop seven continental-scale models for imbalanced classification of soil Cu contamination using visible and near-infrared reflectance spectroscopy. A dataset comprising 18,675 topsoil samples was utilized for training and validation. Hyperparameter optimization was applied to enhance model performance, and multiple statistical metrics were employed for evaluation. Furthermore, feature importance analysis identified key spectral bands influencing Cu classification. Among the tested models, the BalancedRandomForest algorithm demonstrated superior classification performance and generalization ability, achieving an area under the curve of 0.870, recall of 0.816, and balanced accuracy of 0.793. Spectral analysis highlighted the 2310–2320 nm range as the most critical spectral region for Cu classification. This study underscores the utility of the optimized model for managing soil Cu pollution and provides a valuable reference for addressing imbalanced learning challenges in soil pollution research.
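The description reports area under the curve, recall, and balanced accuracy, which are common choices when one class is rare. The following is a minimal sketch of how these three metrics can be computed for a binary problem; it is not the authors' code. The function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration.

```python
from typing import Sequence


def recall(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Fraction of samples of class `positive` that were predicted as `positive`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn)


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of the per-class recalls for labels 0 and 1."""
    return 0.5 * (recall(y_true, y_pred, 1) + recall(y_true, y_pred, 0))


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 8 uncontaminated samples (0) and 2 contaminated samples (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.15, 0.3, 0.6, 0.05, 0.25, 0.4, 0.7, 0.35]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy, recall(y_true, y_pred), balanced_accuracy(y_true, y_pred), roc_auc(y_true, scores))
```

On this toy data the plain accuracy is higher than the balanced accuracy because the majority class dominates the count, which is one reason imbalanced studies tend to report recall and balanced accuracy alongside or instead of accuracy.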
format Article
id doaj-art-d4ed409a1cb64745a8643492faaec0e1
institution DOAJ
issn 2772-3755
language English
publishDate 2025-03-01
publisher Elsevier
record_format Article
series Smart Agricultural Technology
spelling doaj-art-d4ed409a1cb64745a8643492faaec0e1; 2025-08-20T02:55:45Z; eng; Elsevier; Smart Agricultural Technology; ISSN 2772-3755; 2025-03-01; volume 10, article 100728; DOI 10.1016/j.atech.2024.100728
Chongchong Qi: School of Resources and Safety Engineering, Central South University, Changsha 410083, China; School of Metallurgy and Environment, Central South University, Changsha 410083, China
Nana Zhou: School of Resources and Safety Engineering, Central South University, Changsha 410083, China
Tao Hu: School of Resources and Safety Engineering, Central South University, Changsha 410083, China
Mengting Wu (corresponding author): School of Resources and Safety Engineering, Central South University, Changsha 410083, China
Qiusong Chen: School of Resources and Safety Engineering, Central South University, Changsha 410083, China
Han Wang (corresponding author): School of Metallurgy and Environment, Central South University, Changsha 410083, China
Kejing Zhang (corresponding author): School of Metallurgy and Environment, Central South University, Changsha 410083, China
Zhang Lin: School of Metallurgy and Environment, Central South University, Changsha 410083, China
title Prediction of copper contamination in soil across EU using spectroscopy and machine learning: Handling class imbalance problem
topic Soil contamination
Hyperspectral
Continental scale
Copper
Spectral preprocessing
Imbalanced classification
url http://www.sciencedirect.com/science/article/pii/S2772375524003320