Application of machine learning-based post-processing to improve crowd-sourced urban rainfall categorizations
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-06-01 |
| Series: | Applied Computing and Geosciences |
| Subjects: | |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2590197425000370 |
| Summary: | In recent years, citizen science has gained significant attention in the hydrometeorological sciences as an alternative to traditional monitoring systems while also raising awareness of natural processes. Crowd participation in reporting rainfall, known as crowdsourcing rainfall, has the potential to provide insights into the spatio-temporal variability of urban rainfall. However, crowdsourcing often suffers from inaccuracies in rainfall classification due to inadequately trained participants. This study investigates whether machine learning models can reduce misclassification in crowd-sourced rainfall reports under a synthetic framework. A state-of-the-art stochastic rainfall generator is deployed to simulate high-resolution rainfall over Bangalore, India, traditionally monitored by only two rain gauge stations. The study assumes that the 'synthetic' crowd reports qualitative descriptions of two rainfall characteristics—intensity and duration—based on which a categorization of a rainfall event (normal/moderate/severe) is issued. Ten scenarios are introduced to represent varying degrees of misclassification in the crowd reports. Two machine learning models, random forest and logistic regression, are employed to address these misclassifications and improve the resulting rainfall categorization. The findings indicate that while the random forest model outperforms logistic regression, its performance declines as misclassification rates increase. Moreover, the study highlights that increasing the number of participants significantly enhances the post-processing performance, emphasizing the importance of properly training the crowd for accurate reporting. |
| ISSN: | 2590-1974 |
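The synthetic crowd framework in the summary can be illustrated with a small toy simulation. This is a minimal sketch under loudly stated assumptions and is not the study's method. It uses no stochastic rainfall generator. It assumes three hypothetical ordinal levels for intensity and duration and a hypothetical rule (`categorize`) that maps them to normal/moderate/severe. Its error model assumes each participant misreports a level uniformly at random with probability `error_rate`. In place of the random forest and logistic regression post-processing, it uses plain majority voting across participants. All function names and parameters are illustrative.

```python
import random
from collections import Counter

# Hypothetical ordinal levels for each qualitative characteristic: 0 = low, 1 = medium, 2 = high.
LEVELS = (0, 1, 2)


def categorize(intensity: int, duration: int) -> str:
    """Hypothetical rule (assumption): sum the two levels and bin the score into a category."""
    score = intensity + duration  # ranges from 0 to 4
    if score <= 1:
        return "normal"
    if score <= 3:
        return "moderate"
    return "severe"


def noisy_report(true_level: int, error_rate: float, rng: random.Random) -> int:
    """Assumed error model: with probability error_rate, report a different level chosen uniformly."""
    if rng.random() < error_rate:
        return rng.choice([lvl for lvl in LEVELS if lvl != true_level])
    return true_level


def majority(values: list[int]) -> int:
    """Most frequently reported level; ties go to the value seen first."""
    return Counter(values).most_common(1)[0][0]


def crowd_accuracy(n_participants: int, error_rate: float,
                   n_events: int = 2000, seed: int = 0) -> float:
    """Fraction of synthetic events whose majority-vote category matches the true category."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_events):
        # Draw a "true" event, then collect independent noisy reports from each participant.
        true_i, true_d = rng.choice(LEVELS), rng.choice(LEVELS)
        reports_i = [noisy_report(true_i, error_rate, rng) for _ in range(n_participants)]
        reports_d = [noisy_report(true_d, error_rate, rng) for _ in range(n_participants)]
        # Aggregate each characteristic separately, then issue the event category.
        predicted = categorize(majority(reports_i), majority(reports_d))
        hits += predicted == categorize(true_i, true_d)
    return hits / n_events


if __name__ == "__main__":
    for n in (1, 5, 15):
        print(n, round(crowd_accuracy(n, error_rate=0.3), 3))
```

Under these assumptions, the printed accuracy tends to rise with the number of participants because independent errors are outvoted. This mirrors, only qualitatively, the participant-count effect described in the summary. The sketch says nothing about how the random forest or logistic regression models behave. Raising `error_rate` lets you mimic the summary's scenarios with increasing misclassification.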