Early crowd forecasting away from stations by geographically complemented regression using transit search and mobility logs
Abstract Forecasting crowd gatherings in advance, such as 1 week before they happen, plays a vital role in ensuring smooth mobility and public safety. Although early crowd forecasting has become possible by leveraging visitors’ mobility schedules extracted from transit search logs, the forecasting a...
| Main Authors: | Soto Anno, Kota Tsubouchi, Masamichi Shimosaka |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | SpringerOpen, 2025-07-01 |
| Series: | Journal of Big Data |
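| Subjects: | Crowd forecasting; Crowd dynamics; Active population; Urban computing; Mobility logs; Transit search logs |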
| Online Access: | https://doi.org/10.1186/s40537-025-01214-6 |
| _version_ | 1849764207654862848 |
|---|---|
| author | Soto Anno; Kota Tsubouchi; Masamichi Shimosaka |
| author_facet | Soto Anno; Kota Tsubouchi; Masamichi Shimosaka |
| author_sort | Soto Anno |
| collection | DOAJ |
| description | Abstract: Forecasting crowd gatherings in advance, such as 1 week before they happen, plays a vital role in ensuring smooth mobility and public safety. Although early crowd forecasting has become possible by leveraging visitors’ mobility schedules extracted from transit search logs, the forecasting area has been limited to regions near railroad stations, because the logs only implicitly reflect the locations away from stations where people go after arriving. To address this issue, this paper presents an early crowd forecasting method that predicts crowding a week in advance both in station vicinities and in areas away from stations, using a crowd forecasting model called geographically complemented multi-task Poisson regression (GCPR). The method infers the flows of people after they arrive at railroad stations from GPS-based mobility logs and transit search logs by leveraging the heterogeneous characteristics of nearby stations. Specifically, the model forecasts the number of visitors to an event 1 week in advance by using transit search logs recorded more than 1 week prior to the event, along with contextual features (such as day of the week) and time information. Furthermore, the model performs multi-task learning over station arrival schedules and mobility patterns, which addresses the challenge of accurately predicting people flow to congestion points based on the geographical and mobility proximity between stations and crowded areas. We conduct an empirical evaluation using a real-world dataset that includes 12 large-scale events held in Japan from 2019 to 2020, such as the Jingu Gaien Fireworks Festival, Comic Market 96, and the Rugby World Cup 2019. The results demonstrate that GCPR can forecast crowd gatherings 1 week before their occurrence in areas that were previously challenging to predict, achieving up to a 42% performance improvement over CityOutlook+, a state-of-the-art approach for early crowd forecasting. |
| format | Article |
| id | doaj-art-7fb70e97a8224b889aa074221d0b5b2e |
| institution | DOAJ |
| issn | 2196-1115 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | SpringerOpen |
| record_format | Article |
| series | Journal of Big Data |
| spelling | doaj-art-7fb70e97a8224b889aa074221d0b5b2e; 2025-08-20T03:05:13Z; eng; SpringerOpen; Journal of Big Data; 2196-1115; 2025-07-01; 121152; 10.1186/s40537-025-01214-6; Early crowd forecasting away from stations by geographically complemented regression using transit search and mobility logs; Soto Anno (0); Kota Tsubouchi (1); Masamichi Shimosaka (2); Department of Computer Science, Institute of Science Tokyo; LY Corporation; Department of Computer Science, Institute of Science Tokyo; https://doi.org/10.1186/s40537-025-01214-6; Crowd forecasting; Crowd dynamics; Active population; Urban computing; Mobility logs; Transit search logs |
| spellingShingle | Soto Anno; Kota Tsubouchi; Masamichi Shimosaka; Early crowd forecasting away from stations by geographically complemented regression using transit search and mobility logs; Journal of Big Data; Crowd forecasting; Crowd dynamics; Active population; Urban computing; Mobility logs; Transit search logs |
| title | Early crowd forecasting away from stations by geographically complemented regression using transit search and mobility logs |
| title_sort | early crowd forecasting away from stations by geographically complemented regression using transit search and mobility logs |
| topic | Crowd forecasting; Crowd dynamics; Active population; Urban computing; Mobility logs; Transit search logs |
| url | https://doi.org/10.1186/s40537-025-01214-6 |
| work_keys_str_mv | AT sotoanno earlycrowdforecastingawayfromstationsbygeographicallycomplementedregressionusingtransitsearchandmobilitylogs AT kotatsubouchi earlycrowdforecastingawayfromstationsbygeographicallycomplementedregressionusingtransitsearchandmobilitylogs AT masamichishimosaka earlycrowdforecastingawayfromstationsbygeographicallycomplementedregressionusingtransitsearchandmobilitylogs |