Prediction of copper contamination in soil across EU using spectroscopy and machine learning: Handling class imbalance problem
Soil copper (Cu) pollution is a significant global environmental challenge, necessitating accurate assessment methods for effective control. However, existing classification approaches for Cu content in soil spectral datasets often face imbalances in data distribution, resulting in unreliable identi...
| Main Authors: | Chongchong Qi, Nana Zhou, Tao Hu, Mengting Wu, Qiusong Chen, Han Wang, Kejing Zhang, Zhang Lin |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-03-01 |
| Series: | Smart Agricultural Technology |
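| Subjects: | Soil contamination; Hyperspectral; Continental scale; Copper; Spectral preprocessing; Imbalanced classification |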
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2772375524003320 |
| _version_ | 1850041541325750272 |
|---|---|
| author | Chongchong Qi Nana Zhou Tao Hu Mengting Wu Qiusong Chen Han Wang Kejing Zhang Zhang Lin |
| author_sort | Chongchong Qi |
| collection | DOAJ |
| description | Soil copper (Cu) pollution is a significant global environmental challenge, necessitating accurate assessment methods for effective control. However, existing classification approaches for Cu content in soil spectral datasets often face imbalances in data distribution, resulting in unreliable identification of Cu-contaminated samples. To address this limitation, we conducted a comprehensive evaluation of three basic machine learning (ML) algorithms and four imbalanced ML algorithms. These methods were used to develop seven continental-scale models for imbalanced classification of soil Cu contamination using visible and near-infrared reflectance spectroscopy. A dataset comprising 18,675 topsoil samples was used for training and validation. Hyperparameter optimization was applied to enhance model performance, and multiple statistical metrics were employed for evaluation. Furthermore, feature importance analysis identified key spectral bands influencing Cu classification. Among the tested models, the BalancedRandomForest algorithm demonstrated superior classification performance and generalization ability, achieving an area under the curve of 0.870, recall of 0.816, and balanced accuracy of 0.793. Spectral analysis highlighted 2310–2320 nm as the most critical spectral region for Cu classification. This study underscores the utility of the optimized model for managing soil Cu pollution and provides a valuable reference for addressing imbalanced learning challenges in soil pollution research. |
| format | Article |
| id | doaj-art-d4ed409a1cb64745a8643492faaec0e1 |
| institution | DOAJ |
| issn | 2772-3755 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | Elsevier |
| record_format | Article |
| series | Smart Agricultural Technology |
| spelling | doaj-art-d4ed409a1cb64745a8643492faaec0e1; 2025-08-20T02:55:45Z; eng; Elsevier; Smart Agricultural Technology; ISSN 2772-3755; 2025-03-01; vol. 10; article 100728; DOI 10.1016/j.atech.2024.100728. Authors and affiliations: Chongchong Qi (School of Resources and Safety Engineering, Central South University, Changsha 410083, China; School of Metallurgy and Environment, Central South University, Changsha 410083, China); Nana Zhou (School of Resources and Safety Engineering, Central South University, Changsha 410083, China); Tao Hu (School of Resources and Safety Engineering, Central South University, Changsha 410083, China); Mengting Wu (School of Resources and Safety Engineering, Central South University, Changsha 410083, China; corresponding author); Qiusong Chen (School of Resources and Safety Engineering, Central South University, Changsha 410083, China); Han Wang (School of Metallurgy and Environment, Central South University, Changsha 410083, China; corresponding author); Kejing Zhang (School of Metallurgy and Environment, Central South University, Changsha 410083, China; corresponding author); Zhang Lin (School of Metallurgy and Environment, Central South University, Changsha 410083, China) |
| title | Prediction of copper contamination in soil across EU using spectroscopy and machine learning: Handling class imbalance problem |
| topic | Soil contamination Hyperspectral Continental scale Copper Spectral preprocessing Imbalanced classification |
| url | http://www.sciencedirect.com/science/article/pii/S2772375524003320 |
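
The abstract reports recall and balanced accuracy rather than plain accuracy. The sketch below illustrates why those metrics matter under class imbalance. It is not the authors' code: the labels are synthetic toy data (95 clean and 5 contaminated samples, an assumed ratio), and the helper names `confusion_counts` and `imbalance_metrics` are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return accuracy, recall, specificity, and balanced accuracy."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "recall": recall,
        "specificity": specificity,
        # Balanced accuracy is the mean of the per-class recalls.
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Synthetic toy labels (assumption): 0 = clean, 1 = Cu-contaminated.
y_true = [0] * 95 + [1] * 5

# A classifier that always predicts the majority class.
always_clean = [0] * 100
metrics_majority = imbalance_metrics(y_true, always_clean)

# A classifier that finds 4 of 5 contaminated samples with 19 false alarms.
sensitive_pred = [1] * 19 + [0] * 76 + [1] * 4 + [0] * 1
metrics_sensitive = imbalance_metrics(y_true, sensitive_pred)

for name, metrics in [("majority", metrics_majority),
                      ("sensitive", metrics_sensitive)]:
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

On this toy data the majority-class predictor has the higher plain accuracy yet detects no contaminated samples, while the second predictor trades some accuracy for a much higher recall and balanced accuracy.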