Adaptive Dynamic Bipartite Graph Matching: A Reinforcement Learning Approach
Yansheng Wang 1, Yongxin Tong 1, Cheng Long 2, Pan Xu 3, Ke Xu 1, Weifeng Lv 1
1 Beihang University  2 Nanyang Technological University  3 University of Maryland, College Park
Outline
• Background and Motivation
• Problem Statement
• Our Solutions
• Experiments
• Conclusion
Background
• Bipartite graph matching: solved by the Hungarian method (Harold W. Kuhn) in polynomial time (see the sketch below)
  [Figure: a weighted bipartite graph and its matching]
• Traditional applications: the assignment problem, the vehicle scheduling problem, etc.
• These methods perform well in offline scenarios
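As a concrete illustration (not from the slides), maximum-weight bipartite matching can be computed with SciPy's implementation of a Hungarian-style algorithm; the weight matrix here is illustrative:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Weighted bipartite graph: weights[i][j] is the edge weight between
# left node i and right node j (0 = no edge); values are illustrative.
weights = np.array([
    [4.0, 1.0, 0.0],
    [0.0, 2.0, 6.0],
    [3.0, 0.0, 2.0],
])

# Find the assignment that maximizes the total weight
rows, cols = linear_sum_assignment(weights, maximize=True)
print(list(zip(rows, cols)), weights[rows, cols].sum())  # matching and its total weight
```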
Background
• Emergence of online scenarios
  - Transportation: taxi-hailing, ride sharing, …
  - Medical: mutual blood donation, kidney exchange, …
  - Economic: two-sided markets, crowdsourcing, …
• Online matching is increasingly important
Existing Research
• Problem model: online bipartite matching
  - Nodes arrive and leave dynamically
  - Objective: maximize the sum of the matched edge weights
  - Each node must be matched as soon as it arrives (the instantaneous constraint)
• Solution: online algorithms under the instantaneous constraint
Y. Tong et al. Online mobile microtask allocation in spatial crowdsourcing. In ICDE 2016.
Motivation
• The instantaneous constraint is sometimes too strong
  - Taxi-hailing: matching instantly gives 70% accuracy; waiting briefly while nearby drivers respond raises it to 95%. Passengers can wait a short time before being served.
  - Crowdsourcing: for a task of labeling 500 pictures posted at 17:00, requesters are willing to wait (e.g., until 17:30) for more reliable workers.
• If nodes can wait (i.e., matching is done in a batch manner):
  - More information can be gathered
  - Nodes are likely to meet better candidates in the future
L. Kazemi et al. GeoCrowd: enabling query answering with spatial crowdsourcing. In GIS 2012.
Limitations of Existing Work
• Strong assumption: the instantaneous constraint
• Batch manner: fixed batch sizes and no global theoretical guarantee
Y. Tong et al. Online mobile microtask allocation in spatial crowdsourcing. In ICDE 2016.
Y. Tong et al. Flexible dynamic task assignment in real-time spatial data. In VLDB 2017.
P. Cheng et al. An experimental evaluation of task assignment in spatial crowdsourcing. In VLDB 2018.
L. Kazemi et al. GeoCrowd: enabling query answering with spatial crowdsourcing. In GIS 2012.
L. Kazemi et al. GeoTruCrowd: trustworthy query answering with spatial crowdsourcing. In GIS 2013.
Contributions
• Devise a novel adaptive batch-based framework
• Analyze its global theoretical guarantee
• Propose an effective and efficient reinforcement-learning-based solution
Outline
• Background and Motivation
• Problem Statement
• Our Solutions
• Experiments
• Conclusion
Problem Statement
• Dynamic bipartite graph: nodes on both sides arrive dynamically over time
  [Figure: a weighted bipartite graph whose nodes are annotated with arrival times]
• Each node also has a duration; its lifetime starts at its arrival and it vanishes when the duration ends
  - e.g., node 3's lifetime ends at time step 6, when it vanishes
Problem Statement
• Matching decisions can be made freely, i.e., without the assumption of the instantaneous constraint
Problem Statement
• Competitive ratio: the theoretical guarantee of an online algorithm in the worst case, measured against the offline optimum; see the definition below
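The slide does not show the formula itself; the standard definition of the competitive ratio for a maximization problem, which matches the slide's wording, is

$$\mathrm{CR} \;=\; \min_{I \in \mathcal{I}} \frac{\mathbb{E}\left[\mathrm{ALG}(I)\right]}{\mathrm{OPT}(I)},$$

where $\mathrm{ALG}(I)$ is the (expected) total utility the online algorithm obtains on input instance $I$ and $\mathrm{OPT}(I)$ is the offline optimum computed with full knowledge of $I$.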
Outline
• Background and Motivation
• Problem Statement
• Our Solutions
• Experiments
• Conclusion
Our Framework
• We propose an adaptive batch-based framework (a sketch of the loop follows):
  1. Accumulate the dynamically arriving nodes into a batch
  2. Adaptively adjust the size of the batch
  3. Match all the nodes in the batch
  [Figure: the arrival timeline split into adaptively sized batches]
• Challenge 1: How optimal is an adaptive batch-based framework in theory?
• Challenge 2: How to implement an optimal strategy to split batches adaptively?
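A minimal sketch of the framework loop, assuming a policy object that decides when to split and an in-batch matcher; `event_stream`, `policy.should_split`, `match_batch`, and the `deadline` attribute are hypothetical names, not the authors' code:

```python
def adaptive_batch_matching(event_stream, policy, match_batch):
    """Accumulate arriving nodes into a batch and let a learned policy
    decide when to match the batch (a sketch, not a definitive implementation)."""
    batch = []                       # nodes currently waiting
    total_utility = 0.0
    for t, arrivals in event_stream:                     # arrivals at time step t
        batch.extend(arrivals)
        batch = [v for v in batch if v.deadline > t]     # expired nodes vanish
        if policy.should_split(batch, t):                # adaptive batch-size decision
            matched, unmatched = match_batch(batch)      # e.g., the Hungarian method
            total_utility += sum(w for _, _, w in matched)
            batch = unmatched        # unmatched nodes stay for the next batch
    return total_utility
```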
Solution to the 1st Challenge
• We use the Hungarian algorithm as the in-batch algorithm, i.e., we run it on each batch
  [Figure: the batches along the arrival timeline, each matched by the Hungarian algorithm]
Solution to the 1st Challenge
• We answer the open question: can a batch-based solution achieve a global theoretical guarantee?
Solution to the 2nd Challenge
• Observation: deciding when to split each batch is a sequential decision-making problem
• It can be modeled as a Markov decision process (MDP); one plausible formulation is sketched below
  [Figure: the arrival timeline viewed as a sequence of split-or-wait decisions]
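One plausible MDP formulation, consistent with the Q-tables on the following slides (the slides do not spell out the exact state features, so this is a reading, not the authors' exact definition):
• State $s_t$: a compact summary of the nodes currently waiting, e.g., the numbers of unmatched nodes on the two sides
• Action $a_t$: whether to match the current batch now or keep accumulating
• Reward $r_t$: the utility obtained by the in-batch matching when a split is made, and 0 otherwise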
Solution to the 2nd Challenge
• Baseline: Q-learning
• An example of inference (with learned parameters): at each time step, look up the current state in the learned Q-table and take the action with the larger Q-value (the two actions plausibly correspond to continuing to accumulate, a = 0, vs. splitting the batch, a = 1)
  [Figure: the arriving nodes processed step by step against the Q-table]

  Learned Q-table
  State     a=0    a=1
  (0, 0)    0.97   0.02
  (0, 1)    0.85   0.59
  (1, 0)    0.82   0.45
  (1, 1)    0.66   0.71
  (0, 2)    0.88   0.53
  (2, 0)    0.79   0.44
  (1, 2)    0.71   0.61
  (2, 1)    0.65   0.59
  (2, 2)    0.42   0.53
  …         …      …
Solution to the 2nd Challenge
• Training via Bellman backups: the Q-learning algorithm converges to the optimum given sufficient training data; a sketch of the update follows
Watkins C. J., Dayan P. Q-learning. Machine Learning, 1992.
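A minimal sketch of the tabular Q-learning update (Bellman backup) used by such a baseline; the episode/environment interface, reward, and hyperparameters are assumptions for illustration, not the authors' exact setup:

```python
import random
from collections import defaultdict

def q_learning(episodes, actions=(0, 1), alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Q-learning over (state, action) pairs. Each episode is a
    simulator with reset() -> state and step(state, action) -> (reward, next_state, done)."""
    Q = defaultdict(float)          # Q[(state, action)], initialized to 0
    for env in episodes:
        state = env.reset()
        done = False
        while not done:
            # epsilon-greedy exploration
            if random.random() < eps:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            reward, next_state, done = env.step(state, action)
            # Bellman backup toward the one-step bootstrapped target
            target = reward + (0 if done else gamma * max(Q[(next_state, a)] for a in actions))
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    return Q
```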
Solution to the 2nd Challenge
• Observation: the search space of general Q-learning is too large, leading to inefficiency
  - With a too-small batch size, some split checks are unnecessary
  - With a too-large batch size, many nodes will vanish before being matched
  [Figure: the arrival timeline annotated with candidate split points]
Solution to the 2nd Challenge
• Improved solution with a restricted action space (called RQL in the experiments): actions are candidate batch-split sizes, e.g., a ∈ {3, 4, 5} in the table below
• Example of inference: at each time step, look up the current state in the learned Q-table and split the batch once the Q-values indicate that a split now is best ("Better split now!"); a sketch of this lookup follows the table
  [Figure: the arriving nodes processed step by step against the Q-table, with splits at the indicated points]

  Learned Q-table
  State        a=3    a=4    a=5
  (1, 2, 3)    0.59   0.62   0.55
  (1, 2, 4)    0.0    0.51   0.47
  (1, 2, 5)    0.0    0.56   0.44
  (2, 1, 3)    0.37   0.29   0.12
  (2, 2, 4)    0.0    0.58   0.55
  (2, 2, 5)    0.0    0.41   …
  …            …      …      …
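A minimal sketch of inference with such a learned Q-table; the state encoding, the fallback rule, and the table entries here are placeholders copied from the slide (the slides do not spell out the exact state features):

```python
def choose_batch_size(q_table, state, candidate_actions=(3, 4, 5)):
    """Pick the batch-split size with the highest learned Q-value for the
    current state; fall back to the smallest candidate for unseen states."""
    if state not in q_table:
        return candidate_actions[0]
    return max(candidate_actions, key=lambda a: q_table[state].get(a, 0.0))

# Example with values copied from the slide's Q-table
q_table = {
    (1, 2, 3): {3: 0.59, 4: 0.62, 5: 0.55},
    (2, 1, 3): {3: 0.37, 4: 0.29, 5: 0.12},
}
print(choose_batch_size(q_table, (1, 2, 3)))   # -> 4 (the highest Q-value, 0.62)
```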
Outline
• Background and Motivation
• Problem Statement
• Our Solutions
• Experiments
• Conclusion
Experiments
• Datasets
  - Synthetic data, varying:
    o Distribution of edge weights
    o Graph sparsity
    o Duration of nodes
    o Arriving density of nodes
    o Cardinality
    o Scalability
  - Real data from Didi Chuxing
    o About 10K nodes per hour
    o 400K records for training and 10K for testing
• Compared methods
  - Greedy algorithm (GR)
  - TGOA (ICDE 2016)
  - Fixed-batch algorithm (FB)
Experiments
• Impact of edge-weight distribution
  [Figure: utility, running time, and memory under different edge-weight distributions]
  - RQL is the most effective
  - RQL is efficient in running time
  - The memory cost is not high (about 23 MB)
Experiments
• Impact of sparsity and arriving density
  [Figures: varying the graph sparsity; varying the arriving density of the nodes]
Experiments
• Results on real data from Didi Chuxing
  [Figure: varying the maximal duration of tasks/workers]
Outline
• Background and Motivation
• Problem Statement
• Our Solutions
• Experiments
• Conclusion
Conclusion
• We propose a novel adaptive batch-based framework that guarantees a constant competitive ratio
• We devise effective and efficient RL-based solutions that learn how to split the batches adaptively
• Extensive experiments on both real and synthetic datasets show that our solutions outperform the state of the art
Q&A
Thank You
Solution to the 1st Challenge
• After each batch is matched, unmatched nodes remain in the next batch
• Unmatched nodes expire once their durations run out
Solution to the 2nd Challenge
• Problem: the Q-table is memory-consuming
• Optimization: quantization techniques; a sketch follows the table

  Learned Q-table
  State          a=2    a=3    a=4    a=5
  (10, 2)        0.45   0.46   0.48   0.49
  (10, 11, 2)    0.45   0.49   0.50   …
  (11, 10, 2)    0.44   0.46   0.49   …
  (11, 2)        0.46   0.45   0.48   …
  …              …      …      …      …
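The slide does not spell out the quantization scheme, so this is only a plausible sketch under one assumption: storing Q-values at low precision (here, 8-bit fixed point) instead of floats to cut the table's memory footprint:

```python
import numpy as np

def quantize_q(q_row, q_min=0.0, q_max=1.0):
    """Map float Q-values in [q_min, q_max] to uint8 codes (4x smaller than float32)."""
    scale = 255.0 / (q_max - q_min)
    return np.round((np.asarray(q_row) - q_min) * scale).astype(np.uint8)

def dequantize_q(codes, q_min=0.0, q_max=1.0):
    """Recover approximate Q-values; the argmax action is usually preserved."""
    return codes.astype(np.float32) * (q_max - q_min) / 255.0 + q_min

row = [0.45, 0.46, 0.48, 0.49]           # Q-values from the slide's table
codes = quantize_q(row)
print(codes, dequantize_q(codes))        # approximate round-trip
```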
Experiments
• Impact of quantization
  - The utility score is not hurt, and is even better in some cases
  - The memory cost is largely decreased and remains stable
Spontaneous vs. Batch-based
[Figure: the same input matched in two ways]
• Spontaneous (match on arrival): the total utility is 3 + 2 = 5
• Batch-based: the total utility is 2 + 4 + 2 = 8
Problem Statement
• Dynamic bipartite graph in an online scenario: nodes arrive one by one over time steps t = 1, …, 6
  [Figure: the graph unfolding over time steps t = 1 to t = 6, annotated with arrival times]