Forecasting Citywide Crowd Flows using Big Data
Minh Hoang, Yu Zheng, Ambuj Singh
mhoang@cs.ucsb.edu
SIGSPATIAL 2016
Shanghai Stampede, New Year Celebration 2015
Occupy Wall Street, Sep. 2014 (8 am, Sep. 17, 2014)
Macroscopic city traffic prediction
Microscopic view: traffic prediction for roads/freeways
Macroscopic view: flow of crowds prediction for regions
Microscopic view is not useful for city planning:
+ ignores where/when traffic flow starts and ends
+ low-level information overload
+ high prediction cost
Macroscopic prediction for urban planning:
+ understand regional functions
+ distribute resources/services
+ detect city-scale anomalies
+ lower prediction cost
Forecasting Citywide Crowd Flows
[Figure: a region exchanging crowd flows with other regions; labels: New-flow, End-flow, Start, End]
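The two quantities in the figure can be illustrated with a minimal sketch. It assumes that new-flow counts trips that start in a region during a time slot and end-flow counts trips that end there; the trip tuple layout, the region names, and the function `count_flows` are hypothetical and not taken from the slides.

```python
from collections import Counter


def count_flows(trips):
    """Count new-flow and end-flow per (region, hour).

    Each trip is a tuple (start_region, start_hour, end_region, end_hour).
    This layout is an assumption made for illustration.
    """
    new_flow = Counter()  # trips starting in a region at a given hour
    end_flow = Counter()  # trips ending in a region at a given hour
    for start_region, start_hour, end_region, end_hour in trips:
        new_flow[(start_region, start_hour)] += 1
        end_flow[(end_region, end_hour)] += 1
    return new_flow, end_flow


# Hypothetical toy data: three trips between three regions.
trips = [
    ("R1", 8, "R2", 8),
    ("R1", 8, "R3", 9),
    ("R2", 9, "R1", 9),
]
new_flow, end_flow = count_flows(trips)
print(new_flow[("R1", 8)], end_flow[("R1", 9)])
```

Counting per (region, hour) keeps the output size proportional to the number of regions rather than the number of roads, which is the point of the macroscopic view.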
Challenges
+ Scalable solution
1. How to find regions?
+ Meaningful regions
+ Different crowd flow patterns
2. How to make predictions?
+ Spatio-temporal dependencies
+ Robust to missing/noisy data
Finding regions: Map segmentation
Regions are city blocks bounded by roads
Road network -> map segmentation -> low-level regions
Drawbacks:
1. Too many regions
2. Regions have varying sizes & crowd volumes
Consequences: not scalable, information overload, hard to distribute resources
Finding regions: Clustering regions
Low-level regions -> clustering -> high-level regions
High-level regions = groups of city blocks that
1. Are adjacent on the geographical map
2. Have similar crowd flow patterns
3. Have considerable total crowd flow volumes
Finding regions: Clustering low-level regions into high-level regions
Low-level region graph:
+ Node == low-level region
+ Edge == adjacency on map
+ Node weight == Sum(flows) (flow volume)
+ Edge weight == Spearman(flows) (flow similarity)
Graph clustering objectives:
1. Edge cut minimization: group low-level regions with similar patterns
2. Cluster balancing: clusters with similar sum of node weights, so high-level regions have comparable volumes
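The node and edge weights described above can be sketched as follows. This is a standard-library illustration only: the helper names, the toy flow series, and the adjacency list are hypothetical, and the graph clustering step itself (edge cut minimization with balanced clusters) is not shown.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; assumes neither series is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


def build_region_graph(flows, adjacency):
    """Node weight = Sum(flows); edge weight = Spearman(flows) of neighbors."""
    node_weight = {region: sum(series) for region, series in flows.items()}
    edge_weight = {(u, v): spearman(flows[u], flows[v]) for u, v in adjacency}
    return node_weight, edge_weight


# Hypothetical low-level regions with hourly flow counts.
flows = {
    "A": [10, 20, 30, 40],
    "B": [1, 2, 3, 5],
    "C": [40, 30, 20, 10],
}
adjacency = [("A", "B"), ("B", "C")]
node_weight, edge_weight = build_region_graph(flows, adjacency)
print(node_weight, edge_weight)
```

Because Spearman correlation compares ranks rather than raw counts, regions A and B get a high edge weight even though their volumes differ by an order of magnitude; the volume difference is carried separately by the node weights.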
Insights from Regional Crowd Flows
26 regions in Beijing (#regions is chosen by the elbow method)
[Figure: new-flow and end-flow over one day for selected regions]
+ Region 7: residential area (leave in the morning, come back at night)
+ Region 19: tourist attractions (Forbidden City)
+ City center: new-flow ~ end-flow
Predicting crowd flows: Intra-region patterns
[Figure: new-flow of a region by day of week, May 04-17, 2015; new-flow at 6 am and at 3 pm across days]
Seasonal patterns: daily & weekly
Trend: different hours of the day have different trends
Predicting crowd flows: Inter-region patterns
[Figure: increases and decreases of new-flow and end-flow across regions, June 3rd, 2015]
Neighboring regions affect each other
Predicting crowd flows: Effect of weather & holidays
Predicting crowd flows: Flow decomposition
Crowd flow = Seasonal + Trend + Residual
+ Temporal model: intra-region patterns (normal/holiday)
+ Spatio-temporal model: transit graph, weather
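To make the additive form concrete, here is a minimal sketch of a classical moving-average decomposition. It is not the GMRF-based temporal model used in the talk; the function `decompose`, its `window` parameter, and the toy series are assumptions chosen only to show that the three components sum back to the observed flow.

```python
def decompose(flow, period, window):
    """Split a series into seasonal + trend + residual (additive form).

    Assumes len(flow) >= period. The trend is a centered moving average
    whose window shrinks at the edges of the series.
    """
    n = len(flow)
    half = window // 2
    trend = []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)
        trend.append(sum(flow[lo:hi]) / (hi - lo))
    # Seasonal component: mean detrended value at each phase of the period.
    detrended = [x - m for x, m in zip(flow, trend)]
    phase_mean = []
    for phase in range(period):
        values = detrended[phase::period]
        phase_mean.append(sum(values) / len(values))
    seasonal = [phase_mean[t % period] for t in range(n)]
    # Residual: whatever the seasonal and trend parts do not explain.
    residual = [x - s - m for x, s, m in zip(flow, seasonal, trend)]
    return seasonal, trend, residual


# Hypothetical flow counts with a period of 3 time slots.
flow = [5.0, 9.0, 4.0, 6.0, 10.0, 5.0, 7.0, 11.0, 6.0]
seasonal, trend, residual = decompose(flow, period=3, window=3)
print(seasonal[:3], trend[:3], residual[:3])
```

In the talk's framing, the residual is the part handed to the spatio-temporal model, since it is what the intra-region seasonal and trend patterns leave unexplained.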
Missing & noisy data
[Figure: flow of a region during Feb-May, 2014; red arrows == missing data]
Use a probabilistic model: Gaussian Markov Random Field
Gaussian Markov Random Field (GMRF)
Vector x (a time series) follows a multivariate Gaussian distribution, described by its mean, covariance matrix, and precision matrix Q
Markov properties: graph G captures conditional independence among the x_i
Sparse G => sparse Q
Fast learning: MCMC sampling to find the maximum a posteriori (MAP) estimate
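The formulas on this slide did not survive extraction. For reference, the standard textbook definition of a GMRF is given below; the symbols \mu, \Sigma, n, and E (the edge set of G) are notation assumed here, not recovered from the slide.

```latex
x \sim \mathcal{N}(\mu, \Sigma), \qquad Q = \Sigma^{-1}

p(x) = (2\pi)^{-n/2} \, |Q|^{1/2}
       \exp\!\left( -\tfrac{1}{2} (x - \mu)^{\top} Q \, (x - \mu) \right)

Q_{ij} = 0 \;\iff\; x_i \perp x_j \mid x_{-ij} \;\iff\; \{i, j\} \notin E
\qquad (i \neq j)
```

The last line is why a sparse graph G gives a sparse precision matrix Q: every missing edge is a zero entry.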
Crowd flow = Seasonal + Trend + Residual
Seasonal model as a cyclic GMRF (Gaussian)
Smooth changes between:
1. Consecutive timestamps
2. First & last timestamps
[Figure: cycle graph over s1, s2, ..., s7 for a seasonal time series s with period F = 7]
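One common way to encode these two smoothness conditions is a cyclic first-order random-walk precision matrix, sketched below. The slide does not show the exact precision matrix, so this structure, the function names, and the smoothness parameter `kappa` are assumptions.

```python
def cyclic_rw1_precision(period, kappa=1.0):
    """Precision matrix Q of a cyclic first-order random-walk GMRF.

    Each timestamp is linked to the next one, and the last is linked back
    to the first. Assumes period >= 3 so that every link is distinct.
    """
    q = [[0.0] * period for _ in range(period)]
    for i in range(period):
        j = (i + 1) % period  # wraps around: the last links to the first
        # The penalty kappa * (s_i - s_j)^2 contributes these four entries.
        q[i][i] += kappa
        q[j][j] += kappa
        q[i][j] -= kappa
        q[j][i] -= kappa
    return q


def roughness(s, q):
    """Quadratic form s^T Q s = kappa * sum of squared neighbor differences."""
    n = len(s)
    return sum(s[i] * q[i][j] * s[j] for i in range(n) for j in range(n))


Q = cyclic_rw1_precision(7)  # period F = 7, as on the slide
for row in Q:
    print(row)
```

Each row has only three nonzero entries (the timestamp itself and its two neighbors on the cycle), so Q stays sparse for any period, and a constant series has zero roughness.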
Crowd flow = Seasonal + Trend + Residual
Trend model as a GMRF (Gaussian)
Smooth changes between consecutive timestamps
[Figure: chain graph over y1, y2, ..., y7, e.g. the new-flow at 6 am of every Monday]
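The Gaussian formula on this slide was lost in extraction. A common prior that expresses smooth changes between consecutive timestamps on a chain is the first-order random walk below; treating it as the trend prior is an assumption, and the smoothness parameter \kappa_y and the length T are notation introduced here.

```latex
p(y \mid \kappa_y) \;\propto\;
\exp\!\left( -\frac{\kappa_y}{2} \sum_{t=1}^{T-1} (y_{t+1} - y_t)^2 \right)
```

Unlike the seasonal model, the chain has no term linking y_T back to y_1, so the trend is free to drift over time.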
Crowd flow = Seasonal + Trend + Residual
Spatio-temporal residual model
Residual flow r: regression over Σ residual transit flow and the history of the same region
Residual transit flow: current region R, next region R', hour in day (1..24), trip duration d
Other inputs:
+ Weather
+ Day type: weekday? weekend? holiday?
Transit tensor factorization (PARAFAC): solved by counting, fast
[Figure: transit tensors between R and R' for day types 1, 2, 3]
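The "solved by counting" step can be pictured with a small sketch that estimates, for each (region, hour, day type), the empirical distribution over (next region, trip duration). This shows only the counting idea and is not a PARAFAC factorization; the tuple layout, the day-type labels, and the function `transit_distribution` are hypothetical.

```python
from collections import Counter, defaultdict


def transit_distribution(trips):
    """Estimate P(next_region, duration | region, hour, day_type) by counting.

    Each trip is (region, hour, day_type, next_region, duration).
    This layout is an assumption made for illustration.
    """
    counts = defaultdict(Counter)
    for region, hour, day_type, next_region, duration in trips:
        counts[(region, hour, day_type)][(next_region, duration)] += 1
    probs = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        # Normalize the counts for this (region, hour, day_type) context.
        probs[key] = {outcome: c / total for outcome, c in counter.items()}
    return probs


# Hypothetical trips leaving R1 at 8 am on weekdays.
trips = [
    ("R1", 8, "weekday", "R2", 1),
    ("R1", 8, "weekday", "R2", 1),
    ("R1", 8, "weekday", "R2", 1),
    ("R1", 8, "weekday", "R3", 2),
]
probs = transit_distribution(trips)
print(probs[("R1", 8, "weekday")])
```

A single pass over the trip records is enough, which is consistent with the slide's note that this step is fast.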
Experiment settings
Please see full experimental results in the paper
More people bike when the weather is nice
[Figure: end-flow (Seasonal + Trend) and temperature (°F, 40-80) with dates Apr. 21, Jun. 30, Sep. 22; map of regions R1-R9; change of seasonal pattern in R9 (Monday, sunny), 5 am to 8 pm]
People don't want to bike when it rains in NYC
[Figure: end-flow, FCCF, and weather, 5 am to 9 pm]
Occupy Wall Street (Sep. 17, 2014)
[Figure: new-flow and end-flow in regions R1-R9 between 6 and 10 am, with 7 am and 11 am marked; legend: True crowd flow, Seasonal + Trend, Our predictions]
Minh Hoang, mhoang@cs.ucsb.edu
Code & data are available here: