Traffic Prediction in a Bike Sharing System
Yexin Li, Yu Zheng, Huichu Zhang, Lei Chen
The Hong Kong University of Science and Technology; Microsoft Research, Beijing, China

Bike-sharing systems are widely available
- Bike-sharing system
  - Check out a bike at the origin station
  - Ride to the destination
  - Check in the bike at the destination station
- Current problem
  - Skewed distributions of bike usage: some stations have no bikes, others have no docks
[Figure: spatial distribution and temporal distribution (by hour of day) of bike usage]

An Ideal Solution
- Predict bike usage at each station
- Reallocate bikes by truck
- Problem: bike usage at an individual station is chaotic!
[Figure: usage at stations S1 and S2 across the days of a month, for the hours 8 am to 11 am]

A Practical Solution
- Our solution
  - Cluster stations into groups
  - Predict the bike usage of each station cluster
  - Reallocate bikes between station clusters
- Prediction for each station is unnecessary
  - Users check bikes out and in at a random station
  - Events affect an area rather than a single station
- Observations
  - Bike usage of a cluster is more predictable.
  - Inter-cluster transition is more stable.
[Figure: check-out of cluster C1 at 7-8 am by day; transition variance by hour (8 am, 9 am, 10 am)]

Challenges
- Cluster definition
  - Which features to consider when clustering
- Bike usage is impacted by multiple factors
  - Meteorology
  - Correlation between clusters
  - Events
- Data imbalance
  - Number of sunny hours >> number of rainy hours
  - The (temperature, wind speed) sample (11.7, 4.6 mph) never occurred in NYC from April through September 2014
[Figure: correlation between clusters, with larger check-out at A and larger check-in at B]
[Figure: weather distribution; temperature and wind speed samples]

Framework of Our Solution
- Check-out
  - Bipartite station clustering
  - Hierarchical prediction: predict the bike usage of the entire city, then predict the check-out proportion of each cluster (e.g., 0.1, ..., 0.2)
- Check-in
  - Learning: transition matrix and trip duration
  - Inference: check-out, probability and expectation, check-in

Motivation of Bipartite Station Clustering
- Stations in one cluster should be close to each other
- Stations in one cluster should behave similarly
  - Stable inter-cluster transition
  - Stable check-out proportion
[Figure: clusters C1 to C5, comparing a less stable and a more stable clustering]

Bipartite Station Clustering
- Procedure
  - Geo-clustering: K1 clusters
  - T-matrix generation
  - T-clustering: K2 clusters
[Figure: geo-clustering, T-matrix generation (entries t1, t2, ..., t7), and T-clustering]
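
The T-matrix generation step can be illustrated with a minimal sketch. It assumes that a station's T-matrix row is the share of its check-outs that end in each of the K1 geo-clusters; the function name `build_t_matrix`, the toy stations, and the pooling of all time periods into one row are assumptions made for illustration, not details taken from the slides.

```python
from collections import defaultdict

def build_t_matrix(trips, geo_cluster_of, k1):
    """Return {station: [p_0, ..., p_{k1-1}]}, where p_c is the share of the
    station's check-outs that were checked in at geo-cluster c."""
    counts = defaultdict(lambda: [0] * k1)
    for origin, destination in trips:
        counts[origin][geo_cluster_of[destination]] += 1
    t_matrix = {}
    for station, row in counts.items():
        total = sum(row)
        t_matrix[station] = [c / total for c in row]
    return t_matrix

# Hypothetical toy data: four stations already assigned to two geo-clusters.
geo_cluster_of = {"s1": 0, "s2": 0, "s3": 1, "s4": 1}
trips = [("s1", "s3"), ("s1", "s4"), ("s1", "s2"),
         ("s2", "s3"), ("s3", "s1"), ("s4", "s1")]

t_matrix = build_t_matrix(trips, geo_cluster_of, k1=2)
for station in sorted(t_matrix):
    print(station, t_matrix[station])
```

Stations with similar rows (here s3 and s4) would then be grouped together in the T-clustering step.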

Motivation of Hierarchical Prediction
- Bike usage in the entire city is more regular
  - It can be predicted more accurately.
- The total prediction error in the lower level is bounded
- Two levels: predict the bike usage of the entire city, then predict the check-out proportion of each cluster (e.g., 0.1, ..., 0.2)
[Figure: entire traffic by day compared with the check-out of a single cluster by day]
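
A small sketch of why the hierarchy bounds the lower-level error: if each cluster's check-out is the city-level prediction multiplied by that cluster's proportion, the cluster predictions always sum to the city-level prediction. The function name and the numbers below are hypothetical.

```python
def hierarchical_checkout(city_total, proportions):
    """Split a city-level check-out prediction across clusters.

    Proportions are renormalised so that they sum to 1, which guarantees
    that the cluster predictions add up to the city-level prediction.
    """
    norm = sum(proportions.values())
    return {c: city_total * p / norm for c, p in proportions.items()}

# Hypothetical values: 1200 predicted check-outs in the whole city for one hour.
city_total = 1200.0
proportions = {"C1": 0.1, "C2": 0.2, "C3": 0.3, "C4": 0.4}

cluster_checkout = hierarchical_checkout(city_total, proportions)
print(cluster_checkout)
```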

Bike Usage of the Entire City
- Solution: Gradient Boosting Regression Tree (GBRT)
- Feature extraction
  - Day
  - Hour
  - Weather
  - Temperature
  - Wind speed
[Figure: example days: 13 Aug. (rainy), a period in which the temperature keeps increasing, and 25 Sep. (windy)]
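
Check-out Proportion Prediction
- Similarity between two time slots t_1 and t_2, with features f_t = (weather w_t, temperature p_t, wind speed v_t):
  $W(f_{t_1}, f_{t_2}) = \lambda_1(w_{t_1}, w_{t_2}) \times \lambda_2(t_1, t_2) \times K\big((p_{t_1}, v_{t_1}), (p_{t_2}, v_{t_2})\big)$
- Three similarities are combined
  - Weather
  - Time
  - Temperature & wind speed
[Figure: similarity examples, including foggy weather]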
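
The similarity-weighted prediction can be sketched as a weighted nearest-neighbour average over historical hours. Everything concrete below is an assumption for illustration: the slides name only the three similarity factors, so the functional forms (a fixed mismatch penalty for weather, geometric decay for time, a Gaussian kernel for temperature and wind speed), all parameter values, and the record layout are hypothetical.

```python
import math

def weather_similarity(w1, w2):
    # Assumed form: 1 for identical weather, a fixed smaller value otherwise.
    return 1.0 if w1 == w2 else 0.3

def time_similarity(hour_gap, day_gap, rho_hour=0.9, rho_day=0.95):
    # Assumed form: similarity decays geometrically with the time gap.
    return (rho_hour ** hour_gap) * (rho_day ** day_gap)

def temp_wind_kernel(a, b, sigma_temp=5.0, sigma_wind=3.0):
    # Assumed form: Gaussian kernel over (temperature, wind speed) pairs.
    dt = (a[0] - b[0]) / sigma_temp
    dv = (a[1] - b[1]) / sigma_wind
    return math.exp(-0.5 * (dt * dt + dv * dv))

def similarity(query, past):
    """Product of the weather, time, and temperature/wind similarities."""
    return (weather_similarity(query["weather"], past["weather"])
            * time_similarity(abs(query["hour"] - past["hour"]),
                              abs(query["day"] - past["day"]))
            * temp_wind_kernel((query["temp"], query["wind"]),
                               (past["temp"], past["wind"])))

def predict_proportions(query, history, k=2):
    """Similarity-weighted average of the k most similar historical hours."""
    top = sorted(history, key=lambda h: similarity(query, h), reverse=True)[:k]
    weights = [similarity(query, h) for h in top]
    total = sum(weights)
    n = len(top[0]["proportions"])
    return [sum(w * h["proportions"][i] for w, h in zip(weights, top)) / total
            for i in range(n)]

# Hypothetical history: check-out proportions of two clusters in three past hours.
history = [
    {"weather": "rainy", "hour": 8, "day": 1, "temp": 20.0, "wind": 5.0, "proportions": [0.6, 0.4]},
    {"weather": "sunny", "hour": 8, "day": 2, "temp": 25.0, "wind": 3.0, "proportions": [0.3, 0.7]},
    {"weather": "rainy", "hour": 9, "day": 3, "temp": 21.0, "wind": 6.0, "proportions": [0.5, 0.5]},
]
query = {"weather": "rainy", "hour": 8, "day": 4, "temp": 20.0, "wind": 5.0}

predicted = predict_proportions(query, history)
print(predicted)
```

Because the weights multiply, a rare condition (for example a rainy hour) can still borrow from hours that match on weather but differ somewhat in time or temperature, which is one way to cope with imbalanced data.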

Transition Matrix & Trip Duration
- Transition probability: the probability that a bike will be checked in to cluster C_j, given that it is checked out from cluster C_i in time t.
- Trip duration: fitted with a log-normal distribution.
[Figure: transition graph between clusters (C2, C3, C4, ...) with transition probabilities on the edges]
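
Both quantities can be estimated from trip records; the following is a minimal sketch under simplifying assumptions. It ignores the time dependence of the transition probability, uses the maximum-likelihood log-normal fit (mean and population standard deviation of the log durations), and runs on made-up trips and durations.

```python
import math
import statistics
from collections import Counter

def transition_probabilities(trips):
    """trips: list of (origin_cluster, destination_cluster) pairs.
    Returns {(origin, destination): empirical probability}."""
    pair_counts = Counter(trips)
    origin_counts = Counter(origin for origin, _ in trips)
    return {(o, d): n / origin_counts[o] for (o, d), n in pair_counts.items()}

def fit_lognormal(durations):
    """Maximum-likelihood log-normal fit: mean and std of the log durations."""
    logs = [math.log(d) for d in durations]
    return statistics.fmean(logs), statistics.pstdev(logs)

def lognormal_cdf(x, mu, sigma):
    """P(duration <= x) under the fitted log-normal distribution."""
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

# Hypothetical trips between clusters and trip durations in seconds.
trips = [("C1", "C1"), ("C1", "C2"), ("C1", "C2"), ("C2", "C1")]
durations = [300.0, 600.0, 900.0, 1200.0]

probs = transition_probabilities(trips)
mu, sigma = fit_lognormal(durations)
print(probs[("C1", "C2")], lognormal_cdf(900.0, mu, sigma))
```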

Check-in Inference
- Check-out: expectation of on-road bikes to each cluster
- Check-in
[Figure: example with clusters C1 to C4, 2 bikes on the road, 2 bikes that will be borrowed, and transition probabilities on the edges]
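
One way to read the inference step is as an expectation over bikes currently on the road. The sketch below is an illustration under stated assumptions, not the method from the slides: each on-road bike contributes, to every destination cluster, its transition probability times the probability that its trip ends within the prediction window, given how long it has already lasted. Bikes that will be borrowed during the window are left out for brevity, and all names and numbers are hypothetical.

```python
import math

def lognormal_cdf(x, mu, sigma):
    """P(duration <= x) for a log-normal trip duration."""
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

def arrival_probability(elapsed, window, mu, sigma):
    """P(trip ends within `window` more seconds | it has lasted `elapsed`)."""
    survived = 1.0 - lognormal_cdf(elapsed, mu, sigma)
    if survived <= 0.0:
        return 1.0
    arrived = lognormal_cdf(elapsed + window, mu, sigma) - lognormal_cdf(elapsed, mu, sigma)
    return arrived / survived

def expected_checkins(on_road, transition, durations, window):
    """on_road: list of (origin_cluster, elapsed_seconds) per riding bike.
    transition: {(origin, destination): probability}.
    durations: {(origin, destination): (mu, sigma)} log-normal parameters."""
    expected = {}
    for origin, elapsed in on_road:
        for (o, dest), p in transition.items():
            if o != origin:
                continue
            mu, sigma = durations[(o, dest)]
            contribution = p * arrival_probability(elapsed, window, mu, sigma)
            expected[dest] = expected.get(dest, 0.0) + contribution
    return expected

# Hypothetical inputs: two bikes checked out from C1, 5 and 15 minutes ago.
transition = {("C1", "C1"): 0.4, ("C1", "C2"): 0.6}
durations = {pair: (math.log(600.0), 0.5) for pair in transition}
on_road = [("C1", 300.0), ("C1", 900.0)]

expected = expected_checkins(on_road, transition, durations, window=3600.0)
print(expected)
```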

Experiments
- Datasets
  - Citi Bike data in New York City
  - Meteorology data in New York City
  - Capital Bikeshare data in Washington D.C.
  - Meteorology data in Washington D.C.
- Metric
  - Error rate
- Data released: http://research.microsoft.com/apps/pubs/?id=255961
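
The slides name the metric but do not define it. As a hedged illustration only, the sketch below assumes a common definition: total absolute error divided by total actual usage, summed over clusters and hours. The function name and the numbers are hypothetical.

```python
def error_rate(predicted, actual):
    """Assumed definition: sum of absolute errors over sum of actual counts."""
    total = sum(actual)
    if total == 0:
        raise ValueError("error rate is undefined when total actual usage is zero")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / total

# Hypothetical predictions and ground truth for two clusters in one hour.
example = error_rate([10, 20], [12, 18])
print(example)
```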

Experiments: Results
- Accuracy improvement
  - >0.03 for all hours
  - >0.18 for anomalous hours
- GC: geo-clustering; BC: bipartite clustering. A "?" marks a value missing from the source; in those rows the single surviving All Hours value is shown under GC.

Check-out (error rate)
Methods    All Hours        Anomalous Hours
           GC      BC       GC      BC
HA         0.353   0.355    1.964   1.968
ARMA       0.346   ?        2.276   2.273
GBRT       0.311   0.314    0.696   0.683
HP-KNN     0.298   0.299    0.692   0.685
HP-MSI     0.288   0.282    0.637   0.503

Check-in (error rate)
Methods    All Hours        Anomalous Hours
           GC      BC       GC      BC
HA         0.347   0.352    1.837   1.835
ARMA       0.340   0.344    2.152   2.143
GBRT       0.309   ?        0.681   0.671
HP-KNN     0.302   0.295    0.694   0.684
HP-MSI     0.297   0.290    0.642   0.506
P-TD       0.335   0.302    0.498   0.445

[Figure: clustering results]

Conclusions
- Bipartite station clustering
  - Clusters stations based on locations and transitions
- Hierarchical prediction improves the accuracy
  - Bounds the total error in the lower level
  - >0.03 improvement for all hours
- Multi-similarity-based model
  - Deals with data imbalance
  - >0.18 improvement for anomalous hours

Thanks!
Contact: Dr. Yu Zheng, yuzheng@microsoft.com
Released data: http://research.microsoft.com/apps/pubs/?id=255961