Application of machine learning-based post-processing to improve crowd-sourced urban rainfall categorizations

Bibliographic Details
Main Authors: Mohammad Ashar Hussain, Venkatesh Budamala, Rajarshi Das Bhowmik
Format: Article
Language: English
Published: Elsevier 2025-06-01
Series: Applied Computing and Geosciences
Online Access: http://www.sciencedirect.com/science/article/pii/S2590197425000370
Description
Summary: In recent years, citizen science has gained significant attention in the hydrometeorological sciences as an alternative to traditional monitoring systems that also raises awareness of natural processes. Crowd participation in reporting rainfall, known as rainfall crowdsourcing, has the potential to provide insights into the spatio-temporal variability of urban rainfall. However, crowdsourcing often suffers from inaccuracies in rainfall classification because participants are inadequately trained. This study investigates, within a synthetic framework, whether machine learning models can reduce misclassification in crowd-sourced rainfall reports. A state-of-the-art stochastic rainfall generator is deployed to simulate high-resolution rainfall over Bangalore, India, which is traditionally monitored by only two rain gauge stations. The study assumes that the 'synthetic' crowd reports qualitative descriptions of two rainfall characteristics (intensity and duration), from which a categorization of each rainfall event (normal/moderate/severe) is issued. Ten scenarios are introduced to represent varying degrees of misclassification in the crowd reports. Two machine learning models, random forest and logistic regression, are employed to address these misclassifications and improve the resulting rainfall categorization. The findings indicate that while the random forest model outperforms logistic regression, its performance declines as misclassification rates increase. Moreover, the study highlights that increasing the number of participants significantly enhances the post-processing performance, emphasizing the importance of properly training the crowd for accurate reporting.
ISSN: 2590-1974
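
The post-processing idea in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation, and nearly everything in it is an assumption made for illustration: the three ordinal levels for intensity and duration, the `true_category` rule that combines them, the uniform misreporting model, the use of per-level report fractions as features, and the misclassification rates and crowd sizes in the demo loop. It uses uniformly random levels in place of the stochastic rainfall generator, and scikit-learn's `RandomForestClassifier` and `LogisticRegression` for the two models named in the summary.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

N_LEVELS = 3  # assumed ordinal levels (0, 1, 2) for both intensity and duration


def true_category(intensity, duration):
    """Assumed rule: 0 = normal, 1 = moderate, 2 = severe, from the level sum."""
    return np.digitize(intensity + duration, [2, 4])


def crowd_features(levels, p_wrong, n_participants, rng):
    """Fraction of participants reporting each level, with random misreports."""
    reports = np.repeat(levels[:, None], n_participants, axis=1)
    wrong = rng.random(reports.shape) < p_wrong
    # A wrong report moves to one of the two other levels, chosen uniformly.
    shift = rng.integers(1, N_LEVELS, size=reports.shape)
    reports = np.where(wrong, (reports + shift) % N_LEVELS, reports)
    return np.stack([(reports == k).mean(axis=1) for k in range(N_LEVELS)], axis=1)


def evaluate(p_wrong, n_participants, n_events=2000, seed=0):
    """Train both models on noisy crowd features; return held-out accuracy."""
    rng = np.random.default_rng(seed)
    # Stand-in for the rainfall generator: uniformly random true levels.
    intensity = rng.integers(0, N_LEVELS, size=n_events)
    duration = rng.integers(0, N_LEVELS, size=n_events)
    y = true_category(intensity, duration)
    X = np.hstack([
        crowd_features(intensity, p_wrong, n_participants, rng),
        crowd_features(duration, p_wrong, n_participants, rng),
    ])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    models = {
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "logistic_regression": LogisticRegression(max_iter=1000),
    }
    return {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}


if __name__ == "__main__":
    # Illustrative rates and crowd sizes only; not the study's ten scenarios.
    for p_wrong in (0.0, 0.2, 0.4):
        for n_participants in (1, 5, 15):
            print(p_wrong, n_participants, evaluate(p_wrong, n_participants))
```

In this toy setting, held-out accuracy can be compared across misreporting rates and crowd sizes to explore the two effects the summary describes; the numbers it produces say nothing about the study's actual results.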