Graph Indexing for Shortest-Path Finding over Dynamic Sub-Graphs
Mohamed S. Hassan, Walid G. Aref, Ahmed M. Aly. Purdue University, West Lafayette, IN, USA. SIGMOD ’16
Graphs are Everywhere 2
- Road networks
- Social networks
- Biological networks
- Datacenter networks
Edge-Labeled Graph Model 3
- Directed graph
- Each edge carries a weight and a label, written "weight, label" (e.g., "4, B")
- Edge label = color
- [Figure: example directed graph with edges labeled B, R, and G]
Querying Sub-Graphs 4
- Graph query over a selected sub-graph
- Select sub-graph: Select … From … Where …
Motivation 5
- What is the shortest path between two persons considering only family relationships?
- Does Protein X interact with Protein Y through stable or covalent interactions?
Problem Definition 6
- Edge-Constrained Shortest Path (ECSP) query Q(s, d, A) over a labeled graph G
- Given: source vertex s, destination vertex d, and a set of labels A ⊆ G.L
- Find: a shortest path from Vertex s to Vertex d using only edges labeled by labels of A
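The ECSP definition can be illustrated with a plain baseline that is not EDP itself: run Dijkstra's algorithm while ignoring every edge whose label is outside A. The sketch below is a minimal illustration under that assumption. The function name `ecsp` and the toy edge list `EDGES` are hypothetical and do not reproduce the graph in the slides.

```python
import heapq

def ecsp(edges, s, d, allowed):
    """Return (cost, path) of a shortest s->d path that uses only edges
    whose label is in `allowed`; (inf, []) if no such path exists."""
    adj = {}
    for u, v, w, label in edges:
        if label in allowed:              # drop disallowed edges up front
            adj.setdefault(u, []).append((v, w))
    dist, prev = {s: 0}, {}
    pq = [(0, s)]
    while pq:
        cost, u = heapq.heappop(pq)
        if u == d:                        # destination settled: rebuild path
            path = [d]
            while path[-1] != s:
                path.append(prev[path[-1]])
            return cost, path[::-1]
        if cost > dist.get(u, float("inf")):
            continue                      # stale queue entry
        for v, w in adj.get(u, []):
            if cost + w < dist.get(v, float("inf")):
                dist[v] = cost + w
                prev[v] = u
                heapq.heappush(pq, (cost + w, v))
    return float("inf"), []

# Hypothetical toy graph: (source, target, weight, label)
EDGES = [(1, 2, 4, "B"), (2, 3, 1, "G"), (3, 6, 3, "R"),
         (2, 4, 2, "B"), (4, 6, 3, "R"), (1, 6, 20, "R")]

print(ecsp(EDGES, 1, 6, {"B", "R"}))       # label-constrained answer
print(ecsp(EDGES, 1, 6, {"B", "R", "G"}))  # unconstrained answer
```

On this toy graph the unconstrained shortest path goes through a G edge, so restricting the query to {B, R} yields a different, more expensive path.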
ECSP Query Example 7
- ECSP query Q(1, 6, {B, R})
- [Figure: the example graph]
ECSP Query Example 8
- ECSP query Q(1, 6, {B, R})
- The dashed shortest path with cost 8 is invalid (it uses an edge whose label is not in {B, R})
ECSP Query Example 9
- ECSP query Q(1, 6, {B, R})
- The dashed shortest path with cost 9 is a valid path
Intuition 10
- |L| << |E| (e.g., 5, 32)
- Regular query-answer pattern: consecutive monochrome edges
Query-Answer Regular Pattern 11
- |L| << |E| (e.g., 5, 32)
- Regular query-answer pattern: consecutive monochrome edges
- Dashed shortest path with cost 30 for Q(1, 9, {B, R})
Query-Answer Regular Pattern 12
- |L| << |E| (e.g., 5, 32)
- Regular query-answer pattern: consecutive monochrome edges
- Blue monochrome shortest path from Vertex 1 to Vertex 7
- Precomputations (index)
Challenges 13
- A query can select one of 2^|L| possible sub-graphs to operate on
- Graph updates (e.g., label, weight, new edge): how to update the precomputations affected by updating the underlying graph?
Edge-Disjoint Partitioning (EDP) 14
- Proposed solution: Edge-Disjoint Partitioning (EDP) = partitioned index + traversal algorithm
EDP Partitioning 15
- Indexing Graph G to obtain Index I(G)
- Partition based on edge labels
EDP Partitioning 16
- Indexing Graph G to obtain Index I(G)
- Partition based on edge labels (linear time)
- [Figure: the example graph split into one partition per label: PrB, PrR, and PrG]
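The label-based partitioning step can be sketched as a single pass over the edge list, which is where the linear-time bound comes from. The names `partition_by_label` and `EDGES` below are hypothetical, and the in-memory layout (one adjacency map per label) is an assumption rather than the layout used by EDP.

```python
from collections import defaultdict

def partition_by_label(edges):
    """Group edges into one adjacency map per label in a single pass.
    Returns: label -> {source vertex -> [(target vertex, weight), ...]}."""
    partitions = defaultdict(lambda: defaultdict(list))
    for u, v, w, label in edges:
        partitions[label][u].append((v, w))   # each edge lands in exactly one partition
    return partitions

# Hypothetical toy graph: (source, target, weight, label)
EDGES = [(1, 2, 4, "B"), (2, 3, 1, "G"), (3, 6, 3, "R"),
         (2, 4, 2, "B"), (4, 6, 3, "R"), (1, 6, 20, "R")]

parts = partition_by_label(EDGES)
for label in sorted(parts):
    print(label, dict(parts[label]))
```

Because every edge has exactly one label, the partitions are edge-disjoint, while a vertex may appear in several of them.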
EDP Partitioning (Cont’d) 17
- Key ideas:
  - Efficient pruning
  - Indexing monochrome shortest paths
- Index I(G) can cover all the queries
EDP Partitioning (Cont’d) 18
- I(G) has the same connectivity as G
  - Bridge vertexes
  - OtherHosts lists: Vertex x is annotated with {Label 1, Label 2, …}
- [Figure: partitions PrB, PrR, and PrG with bridge vertexes annotated by their OtherHosts lists, e.g., {R}, {G}, {B}]
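One way to picture bridge vertexes and OtherHosts lists is sketched below. It assumes, as a simplification not confirmed by the slides, that a bridge vertex is any vertex touched by edges of more than one label, and that its OtherHosts list in a given partition names the other partitions that also contain it. The function name `other_hosts` and the toy edge list are hypothetical.

```python
from collections import defaultdict

def other_hosts(edges):
    """Return {(label, vertex): set of other labels hosting that vertex}
    for every vertex that appears in more than one partition."""
    hosts = defaultdict(set)              # vertex -> labels of partitions containing it
    for u, v, _w, label in edges:
        hosts[u].add(label)
        hosts[v].add(label)
    result = {}
    for vertex, labels in hosts.items():
        if len(labels) > 1:               # assumed definition of a bridge vertex
            for label in labels:
                result[(label, vertex)] = labels - {label}
    return result

# Hypothetical toy graph: (source, target, weight, label)
EDGES = [(1, 2, 4, "B"), (2, 3, 1, "G"), (3, 6, 3, "R"),
         (2, 4, 2, "B"), (4, 6, 3, "R"), (1, 6, 20, "R")]

for key, labels in sorted(other_hosts(EDGES).items()):
    print(key, sorted(labels))
```

Keeping these lists next to each partition is what lets the partitioned index preserve the connectivity of the original graph: a traversal can leave a partition only at a vertex that some other partition also hosts.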
EDP Partitioning (Cont’d) 19
- Index is incremental and query-workload aware
- EDP allows the user to restrict the index growth
  - User-defined maximum size (e.g., 8 GB)
  - Index/cache replacement policy (e.g., LRU)
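A size-bounded, incrementally filled index with LRU replacement can be sketched as follows. This is a minimal illustration, not EDP's implementation: the class name `BoundedIndex` is hypothetical, and the cap is counted in entries rather than bytes to keep the example short.

```python
from collections import OrderedDict

class BoundedIndex:
    """Caches precomputed results up to `max_entries`, evicting the
    least recently used entry when the cap is exceeded."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._entries = OrderedDict()     # insertion order doubles as recency order

    def get(self, key, compute):
        if key in self._entries:
            self._entries.move_to_end(key)        # mark as most recently used
            return self._entries[key]
        value = compute()                         # computed only on first demand
        self._entries[key] = value
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)     # evict least recently used
        return value

index = BoundedIndex(max_entries=2)
print(index.get(("B", 1, 7), lambda: 12))   # computed and cached
print(index.get(("B", 1, 7), lambda: -1))   # served from the cache
```

Entries are created only when a query first needs them, which is one reading of "incremental and query-workload aware": the index holds what the workload has actually asked for.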
Query Processing in EDP 20
- Greedy traversal algorithm that uses a priority queue
- Each vertex is identified by (Partition Id, Vertex Id)
- A monochrome shortest path is computed only once (incremental indexing)
Query Processing in EDP 21
- Consider Query Q1(1, 6, {R, B})
- Start from a partition hosting the source node
- Check for the destination in the current partition
- Traverse other partitions through bridge vertexes
- Cost = 9
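The traversal steps above can be approximated by a Dijkstra search over (partition, vertex) states. The sketch below is a simplified stand-in, not the EDP algorithm: it expands one edge at a time instead of using precomputed monochrome shortest paths and bridge edges, and it assumes that hopping to another allowed partition at the same vertex costs nothing. The name `edp_style_query` and the toy edge list are hypothetical.

```python
import heapq
from collections import defaultdict

def edp_style_query(edges, s, d, allowed):
    """Cost of a shortest s->d path using only labels in `allowed`,
    searched over (partition, vertex) states; inf if unreachable."""
    if s == d:
        return 0
    parts = defaultdict(lambda: defaultdict(list))
    for u, v, w, label in edges:
        parts[label][u].append((v, w))
    # Start from every allowed partition that hosts outgoing edges of s.
    pq = [(0, p, s) for p in sorted(allowed) if s in parts[p]]
    heapq.heapify(pq)
    best = {(p, v): c for c, p, v in pq}
    while pq:
        cost, p, v = heapq.heappop(pq)
        if v == d:                                 # destination found in current partition
            return cost
        if cost > best.get((p, v), float("inf")):
            continue                               # stale queue entry
        moves = [(p, t, cost + w) for t, w in parts[p].get(v, [])]
        # Cross to other allowed partitions that also host v (zero-cost hop).
        moves += [(q, v, cost) for q in allowed if q != p and v in parts[q]]
        for q, t, c in moves:
            if c < best.get((q, t), float("inf")):
                best[(q, t)] = c
                heapq.heappush(pq, (c, q, t))
    return float("inf")

# Hypothetical toy graph: (source, target, weight, label)
EDGES = [(1, 2, 4, "B"), (2, 3, 1, "G"), (3, 6, 3, "R"),
         (2, 4, 2, "B"), (4, 6, 3, "R"), (1, 6, 20, "R")]

print(edp_style_query(EDGES, 1, 6, {"R", "B"}))
```

Disallowed partitions are never touched, so edges with other labels are pruned without being inspected one by one.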
Query Processing in EDP (Cont’d) 22
- Key ideas behind efficient query evaluation
  - Leveraging precomputed monochrome shortest paths
  - On-demand parallel computation of bridge edges
Graph Updates in EDP 23
- Everything can be updated:
  - Topological: adding/removing a vertex, adding/removing an edge
  - Non-topological: an edge weight can be updated, an edge label can be updated
- How does EDP handle graph updates?
Handling Graph Updates (Cont’d) 24
- Key ideas
  - Lazy updates (for the precomputations)
  - Invalidate potentially affected pre-computations
  - Fix invalidated pre-computations on-demand
Handling Graph Updates (Cont’d) 25
- Find the naturally formed disconnected components in each partition
- Use a global clock
- [Figure: partitions PrB, PrR, and PrG with their components marked C1, C2]
Handling Graph Updates (Cont’d) 26
- Each pre-computation has a timestamp (TS(Entry))
- Each component in a partition has a timestamp (TS(C))
Handling Graph Updates (Cont’d) 27
- On update: update the timestamp of the affected component (TS(C))
- On query: re-compute Entry E iff TS(C) > TS(E)
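The timestamp rule can be sketched with a logical clock: an update only stamps the affected component, and a lookup re-computes an entry only when its component carries a newer stamp. The class `LazyIndex` and its method names are hypothetical, and the sketch assumes the caller already knows which component an entry belongs to.

```python
class LazyIndex:
    """Lazy invalidation: re-compute Entry E on lookup iff TS(C) > TS(E)."""

    def __init__(self):
        self.clock = 0             # global logical clock
        self.component_ts = {}     # component id -> TS(C)
        self.entries = {}          # key -> (value, TS(E))

    def _tick(self):
        self.clock += 1
        return self.clock

    def on_update(self, component):
        # An update does no re-computation; it only stamps the component.
        self.component_ts[component] = self._tick()

    def lookup(self, key, component, compute):
        entry = self.entries.get(key)
        stale = entry is not None and self.component_ts.get(component, 0) > entry[1]
        if entry is None or stale:
            value = compute()                      # fix the entry on demand
            self.entries[key] = (value, self._tick())
            return value
        return entry[0]

index = LazyIndex()
print(index.lookup("1->9", "C1", lambda: 30))   # first use: computed
index.on_update("C1")                            # marks C1 as changed
print(index.lookup("1->9", "C1", lambda: 28))   # stale: re-computed
```

Updates stay cheap because their cost is a single timestamp write; the re-computation cost is paid only by queries that actually touch an affected entry.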
Experimental Results 28
- Using six real edge-labeled graph datasets
- Comparing with CHLR [1]
- One to four orders-of-magnitude query-time speedup
- [1] M. N. Rice and V. J. Tsotras. Graph indexing of road networks for shortest path queries with label restrictions.
Average-Speedup of Query-Time 29
Index Size 30
Conclusions 31
- EDP outperforms the state-of-the-art on static graphs and supports dynamic graphs
- EDP efficiently prunes disallowed edges
- Bridge edges are discovered in parallel to the main traversal thread
- The dynamic index of EDP is incremental and query-workload aware
- On-demand re-computation of the invalidated index entries
- Index size can be controlled by the user
- Up to four orders-of-magnitude query-time speedup w.r.t. the state-of-the-art
32 Thank You!
Contraction Hierarchies 33
Handling Large Bridge Vertexes 34
- Consider Q(S, D, {R, B})
- Avoid adding all the bridge edges at once
  - Define a MaxBreadth parameter
  - Explore MaxBreadth bridge edges at a time
- [Figure: Vertex S in partition PR has many bridge edges; Vertex D is in partition PB]
Expensive Update Handling 38
- Assume that (1⇝9) was computed at TS = 10
- When to re-compute (1⇝9)
  - When a query asks for (1⇝9), and
  - the log of Partition PrR has updates with TS > 10, and
  - an edge not in (1⇝9) has a decreased weight
  - or, an edge in (1⇝9) has an increased weight
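The re-computation conditions for a stored path can be written as a predicate over the partition's update log. The sketch below is an illustration under assumptions: the log format (timestamp, edge, old weight, new weight) and the name `needs_recompute` are hypothetical, and only weight updates are modeled.

```python
def needs_recompute(path_edges, entry_ts, update_log):
    """True if a logged update newer than `entry_ts` could change the
    stored shortest path whose edges are `path_edges`."""
    for ts, edge, old_w, new_w in update_log:
        if ts <= entry_ts:
            continue                    # update predates the stored entry
        if edge in path_edges and new_w > old_w:
            return True                 # the stored path became more expensive
        if edge not in path_edges and new_w < old_w:
            return True                 # a competing path may have become cheaper
    return False

# Hypothetical stored path 1 -> 3 -> 9, computed at TS = 10
path = {(1, 3), (3, 9)}
log = [(5, (1, 3), 4, 9),     # before TS = 10: ignored
       (12, (2, 5), 7, 3)]    # edge off the path got cheaper
print(needs_recompute(path, 10, log))
```

The two remaining cases, a path edge getting cheaper or an off-path edge getting more expensive, cannot make a different path strictly shorter than the stored one, so the stored path stays optimal in this sketch (its recorded cost may still need refreshing in the first case).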
Expensive Update Handling (Cont’d) 39
- Assume that BridgeEdges(1) was computed at TS = 10
- When to re-compute BridgeEdges(1)
  - When a query asks for BridgeEdges(1), and
  - the log of Partition PrR has updates with TS > 10, and
  - an Edge (u, v) has a decreased/increased weight, and
  - Vertex u is reachable from Vertex 1, and
  - Vertex v can reach a bridge vertex
Query Processing in EDP (Cont’d) 40
- Check potential shorter paths in other allowed partitions through bridge edges (computed in parallel using lazy evaluation)
- Consider Query Q2(1, 6, {R, B})
Query Processing in EDP (Cont’d) 41
- Processing Query Q1(1, 6, {R})
- Can start from PR(1) or PB(1)
- PQ: {(PR(1), 0)}
Query Processing in EDP (Cont’d) 42
- Processing Query Q1(1, 6, {R})
- PQ history:
  - {(PR(1), 0)}
  - {(PB(1), 0), (PR(6), 10)}
Query Processing in EDP (Cont’d) 43
- Processing Query Q1(1, 6, {R})
- PQ history:
  - {(PR(1), 0)}
  - {(PB(1), 0), (PR(6), 10)}
  - {(PR(2), …), (PR(6), 10), (PR(7), 12)}
Query Processing in EDP (Cont’d) 45
- Processing Query Q1(1, 6, {R})
- PQ history:
  - {(PR(1), 0)}
  - {(PB(1), 0), (PR(6), 10)}
  - {(PR(2), …), (PR(6), 10), (PR(7), 12)}
  - {(PR(6), 9), (PR(6), 10), (PR(7), 12)}
Query Processing in EDP (Cont’d) 46
- Processing Query Q1(1, 6, {R})
- PQ history:
  - {(PR(1), 0)}
  - {(PB(1), 0), (PR(6), 10)}
  - {(PR(2), …), (PR(6), 10), (PR(7), 12)}
  - {(PR(6), 9), (PR(6), 10), (PR(7), 12)}
- Cost = 9
Handling Large Bridge Vertexes (Cont’d) 47
- Breadth factor parameter (see the paper for details of how it is set)
- Consider Query Q(S, D, {R, B}) with BreadthFactor = 2
- Bridge edges are discovered by a thread running Dijkstra’s algorithm, sorted by cost
- New attribute in a PQ element (the next edge)
- PQ: {(PR(S), 0, 0)}
- [Figure: Vertex S in partition PR with bridge edges of costs 2, 6, 500, 900, …, 9940; Vertex D in partition PB]
Handling Large Bridge Vertexes (Cont’d) 48
- Processing Q(S, D, {R, B}) with BreadthFactor = 2
- PQ: {(PR(S), 0, 0)}
Handling Large Bridge Vertexes (Cont’d) 49
- Processing Q(S, D, {R, B}) with BreadthFactor = 2
- PQ history:
  - {(PR(S), 0, 0)}
  - {(PR(1), 2, 0), (PR(2), 6, 0), (PR(S), 500, 3)}
Handling Large Bridge Vertexes (Cont’d) 50
- Processing Q(S, D, {R, B}) with BreadthFactor = 2
- PQ history:
  - {(PR(S), 0, 0)}
  - {(PR(1), 2, 0), (PR(2), 6, 0), (PR(S), 500, 3)}
  - {(PB(2), 6, 0), (PR(S), 500, 3)}
Handling Large Bridge Vertexes (Cont’d) 51
- Processing Q(S, D, {R, B}) with BreadthFactor = 2
- PQ history:
  - {(PR(S), 0, 0)}
  - {(PR(1), 2, 0), (PR(2), 6, 0), (PR(S), 500, 3)}
  - {(PB(2), 6, 0), (PR(S), 500, 3)}
  - {(PB(D), 8, 0), (PR(S), 500, 3)}
- Destination reached with SP distance = 8
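The batched expansion in this walkthrough can be sketched as a helper that turns one popped queue element into the elements to push next: a batch of bridge edges, plus the source vertex itself re-inserted at the cost of its cheapest unexplored edge. This is an illustration under assumptions: the name `expand_bridge_edges` is hypothetical, the bridge edges are assumed to be already sorted by cost, and the third attribute is a 0-based position, so its values differ from the ones printed on the slides.

```python
def expand_bridge_edges(source, base_cost, bridge_edges, start, breadth):
    """Return the PQ elements (cost, vertex, next_edge) produced by exploring
    `breadth` bridge edges of `source`, beginning at position `start`.
    `bridge_edges` is a list of (edge_cost, target) sorted by edge_cost."""
    batch = bridge_edges[start:start + breadth]
    entries = [(base_cost + c, target, 0) for c, target in batch]
    nxt = start + breadth
    if nxt < len(bridge_edges):
        # Re-insert the source, keyed by its cheapest unexplored edge, so the
        # remaining edges are explored only if the search ever gets that far.
        entries.append((base_cost + bridge_edges[nxt][0], source, nxt))
    return entries

# Edge costs follow the slide example; targets "x", "y", "z" are placeholders.
edges_of_s = [(2, "1"), (6, "2"), (500, "x"), (900, "y"), (9940, "z")]
print(expand_bridge_edges("S", 0, edges_of_s, 0, 2))
```

With a breadth of 2, only the two cheapest bridge edges enter the queue at first. If the destination is reached at a cost below 500, as in the walkthrough where the distance is 8, the remaining edges of S are never expanded.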
Future Work 52
- Support non-categorical attributes (e.g., latency in a communication network)
- Optimize for other graph queries (e.g., reachability)
- Extend a relational engine to support graphs natively
  - Extend the query language (declarative and procedural)
  - Introduce primitive graph operators
  - Seamless pipelining of graph and relational operators in the same query execution plan
- Slides: 52