Early crowd forecasting away from stations by geographically complemented regression using transit search and mobility logs

Bibliographic Details
Main Authors: Soto Anno, Kota Tsubouchi, Masamichi Shimosaka
Format: Article
Language: English
Published: SpringerOpen, 2025-07-01
Series: Journal of Big Data
Online Access: https://doi.org/10.1186/s40537-025-01214-6
collection DOAJ
description Abstract: Forecasting crowd gatherings in advance, such as 1 week before they happen, plays a vital role in ensuring smooth mobility and public safety. Although early crowd forecasting has become possible by leveraging visitors' mobility schedules extracted from transit search logs, the forecasting area is limited to regions near railroad stations, because the logs reflect only implicitly, not explicitly, where people go after arriving at a station. To address this issue, this paper presents an early crowd forecasting method that predicts crowding a week in advance both near stations and in areas away from them, by introducing a new crowd forecasting model called geographically complemented multi-task Poisson regression (GCPR). Our method infers where people flow after arriving at railroad stations from GPS-based mobility logs and transit search logs, leveraging the heterogeneous characteristics of nearby stations. Specifically, the model forecasts the number of visitors to an event 1 week in advance using transit search logs recorded more than 1 week before the event, together with contextual features (such as the day of the week) and time information. Furthermore, the model performs multi-task learning over station arrival schedules and mobility patterns, which addresses the challenge of accurately predicting people flows to congestion points from the geographical and mobility proximity between stations and crowded areas. We conduct an empirical evaluation on a real-world dataset covering 12 large-scale events held in Japan from 2019 to 2020, including the Jingu Gaien Fireworks Festival, Comic Market 96, and the Rugby World Cup 2019. The results demonstrate that GCPR can forecast crowd gatherings 1 week in advance in areas that were previously difficult to predict, achieving up to a 42% performance improvement over CityOutlook+, a state-of-the-art approach to early crowd forecasting.
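The abstract describes GCPR only at a high level. As a point of reference, here is a minimal sketch of the model family it belongs to: a plain, single-task Poisson regression that maps hypothetical per-event features (an intercept, the log of the number of transit searches made at least 1 week ahead whose arrival falls in the target area and time slot, and a holiday flag) to an expected visitor count. This is not GCPR: it omits the geographic complementation and the multi-task learning over station arrivals and mobility patterns. The feature definitions, the toy counts, the IRLS fitting choice, and the helper names `features`, `fit_poisson_irls`, `solve_linear`, and `predict` are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented copy of [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_poisson_irls(X: Sequence[Sequence[float]], y: Sequence[float], iters: int = 50) -> List[float]:
    """Maximum-likelihood Poisson regression (log link) fitted by IRLS."""
    p = len(X[0])
    mu = [yi + 0.1 for yi in y]  # common GLM start: mu close to the observed counts
    eta = [math.log(v) for v in mu]
    w = [0.0] * p
    for _ in range(iters):
        # Working response z and weights; for Poisson with log link the weight is mu.
        z = [e + (yi - v) / v for e, yi, v in zip(eta, y, mu)]
        xtwx = [[sum(v * xi[j] * xi[k] for xi, v in zip(X, mu)) for k in range(p)] for j in range(p)]
        xtwz = [sum(v * xi[j] * zi for xi, v, zi in zip(X, mu, z)) for j in range(p)]
        w = solve_linear(xtwx, xtwz)
        eta = [sum(wj * xj for wj, xj in zip(w, xi)) for xi in X]
        mu = [math.exp(e) for e in eta]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Expected count exp(w . x) under the fitted model."""
    return math.exp(sum(wj * xj for wj, xj in zip(w, x)))


def features(searches: int, is_holiday: bool) -> List[float]:
    # Hypothetical features: intercept, log1p of transit searches made >= 7 days
    # ahead that arrive at the target area and time slot, and a day-type flag.
    return [1.0, math.log1p(searches), 1.0 if is_holiday else 0.0]


# Hypothetical toy data (not from the paper).
searches = [120, 300, 800, 1500, 2500, 4000, 200, 900]
holiday = [False, False, False, False, True, True, True, True]
visitors = [3, 6, 14, 22, 41, 60, 7, 25]

X = [features(s, h) for s, h in zip(searches, holiday)]
w = fit_poisson_irls(X, visitors)
print("weights:", [round(v, 3) for v in w])
print("forecast for 2000 searches, non-holiday:", round(predict(w, features(2000, False)), 1))
```

IRLS is used here because, for the canonical log link, it coincides with Newton's method, needs no learning rate, and typically converges in a handful of iterations. At the maximum-likelihood weights the score equations X^T (y - mu) = 0 hold, so with an intercept the fitted counts sum to the observed counts.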
issn 2196-1115
affiliation Soto Anno: Department of Computer Science, Institute of Science Tokyo
Kota Tsubouchi: LY Corporation
Masamichi Shimosaka: Department of Computer Science, Institute of Science Tokyo
topic Crowd forecasting
Crowd dynamics
Active population
Urban computing
Mobility logs
Transit search logs