Fast, Exact Graph Diameter Computation with Vertex Programming
Vertex-Centric Computing for Large-Scale Graph Analytics
Corey Pennycuff and Tim Weninger
SIGKDD Workshop on High Performance Graph Mining
August 10, 2015
Dijkstra’s Single-Source Shortest Path
[Figure: example graph with vertices A through G and a table of shortest-path distances from source A]
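For reference, the sequential algorithm named on this slide can be sketched in C++ with a binary heap. This is a minimal sketch, not code from the talk: the adjacency-list layout and the names `Edge`, `Graph`, `kInf` and `dijkstra` are illustrative assumptions, and edge weights must be non-negative.

```cpp
#include <cstddef>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// One outgoing edge: destination vertex and non-negative weight.
struct Edge {
    std::size_t dest;
    long long weight;
};

// Adjacency list: g[u] holds the out-edges of vertex u.
using Graph = std::vector<std::vector<Edge>>;

// Distance reported for vertices that the source cannot reach.
constexpr long long kInf = std::numeric_limits<long long>::max();

// Dijkstra's algorithm with a binary min-heap (stale heap entries are skipped).
std::vector<long long> dijkstra(const Graph& g, std::size_t source) {
    std::vector<long long> dist(g.size(), kInf);
    using Entry = std::pair<long long, std::size_t>;  // (tentative distance, vertex)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    dist[source] = 0;
    heap.push(Entry(0, source));
    while (!heap.empty()) {
        const Entry top = heap.top();
        heap.pop();
        const long long d = top.first;
        const std::size_t u = top.second;
        if (d > dist[u]) continue;  // stale entry: u already settled with a smaller distance
        for (const Edge& e : g[u]) {
            const long long nd = d + e.weight;
            if (nd < dist[e.dest]) {  // relax edge u -> e.dest
                dist[e.dest] = nd;
                heap.push(Entry(nd, e.dest));
            }
        }
    }
    return dist;
}
```

On an unweighted graph like the example (every weight 1), the resulting distances are plain BFS hop counts.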
Medium Graphs: 4 million nodes, 200 million edges
Bigger Graphs
Solution: Hadoop
[Figure: MapReduce pipeline (input data, mappers, shuffle and sort, reducers, result) with a disk write and read between every stage]
Graph Diameter
• HADI
• Reverse Cuthill-McKee
• Random BFS
Bulk Synchronous Parallel (BSP)
Created in 1990 by Les Valiant and Bill McColl at Oxford
[Figure: data is read from disk once; Supersteps 0 through 3 then run with the data kept in memory, separated by barriers, before the result is produced]
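The superstep and barrier discipline can be illustrated with a single-threaded C++ simulation. It is a toy under stated assumptions rather than a BSP runtime: `bspMaxValue` is a hypothetical name, the graph is undirected and stored as adjacency lists, and the vertex program is the textbook "propagate the largest value" example, not one from this talk. Double-buffered inboxes stand in for the barrier, so a message sent in superstep s is only read in superstep s + 1.

```cpp
#include <cstddef>
#include <vector>

// Toy BSP run (hypothetical helper): every vertex learns the largest value in
// its connected component. Returns the number of supersteps executed.
std::size_t bspMaxValue(const std::vector<std::vector<std::size_t>>& adj,
                        std::vector<int>& value) {
    const std::size_t n = adj.size();
    // inbox: messages readable this superstep; outbox: messages sent this superstep.
    std::vector<std::vector<int>> inbox(n), outbox(n);
    std::size_t superstep = 0;
    bool messagesInFlight = true;
    while (messagesInFlight) {
        for (std::size_t v = 0; v < n; ++v) {
            // Every vertex runs in superstep 0; afterwards it has voted to halt
            // and runs again only when a message wakes it up.
            if (superstep > 0 && inbox[v].empty()) continue;
            bool changed = (superstep == 0);  // announce the initial value once
            for (int m : inbox[v]) {
                if (m > value[v]) { value[v] = m; changed = true; }
            }
            if (changed) {
                for (std::size_t u : adj[v]) outbox[u].push_back(value[v]);
            }
        }
        // Barrier: messages sent during this superstep are delivered only now,
        // so receivers see them in the next superstep.
        messagesInFlight = false;
        for (std::size_t v = 0; v < n; ++v) {
            inbox[v].swap(outbox[v]);
            outbox[v].clear();
            if (!inbox[v].empty()) messagesInFlight = true;
        }
        ++superstep;
    }
    return superstep;
}
```

On the path 0-1-2 with values {3, 6, 2}, every vertex ends with 6 after three supersteps; the last one only confirms that nothing changed.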
Graph Analytics with BSP
Requires the programmer to “think like a vertex”
[Figure: example graph with vertices A through F]
The Vertex
Each vertex can:
• Receive messages from the previous superstep
• Modify its value/datum
• Send messages
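BSP Single-Source Shortest Path
[Figure: example graph with vertices A through G]
// Vertex program run by every active vertex in each superstep.
// Assumes datum starts at infinity and the source receives an initial message of 0.
void compute(MessageIterator* msgs) {
    bool changed = false;
    // Keep the smallest distance offered by any incoming message.
    foreach (msg : msgs) {
        if (msg < datum) {
            datum = msg;
            changed = true;
        }
    }
    if (changed) {
        // Offer the improved distance to every out-neighbor.
        foreach (edge : GetOutEdgeIterator()) {
            sendMessageTo(edge.dest, datum + edge.weight);
        }
    } else {
        // Nothing improved: become inactive until a new message arrives.
        voteToHalt();
    }
}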
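To see the vertex program run end to end, here is a hedged single-threaded C++ simulation of the same compute() logic. Assumptions, all illustrative: every datum starts at infinity, a master injects the message 0 into the source before Superstep 0, a vertex that changed stays active, and a halted vertex is reactivated by any incoming message. The names `OutEdge` and `bspShortestPaths` are hypothetical.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Out-edge as seen by compute(): edge.dest and edge.weight.
struct OutEdge {
    std::size_t dest;
    long long weight;
};

// Single-threaded simulation of the BSP shortest-path vertex program.
// Returns the final datum of every vertex (max long long if unreachable).
std::vector<long long> bspShortestPaths(
        const std::vector<std::vector<OutEdge>>& out, std::size_t source) {
    const std::size_t n = out.size();
    std::vector<long long> datum(n, std::numeric_limits<long long>::max());
    std::vector<std::vector<long long>> inbox(n), outbox(n);
    std::vector<bool> halted(n, false);  // every vertex starts active
    inbox[source].push_back(0);          // assumed: master's initial message
    bool anyRan = true;
    while (anyRan) {                     // stop once a superstep runs no vertex
        anyRan = false;
        for (std::size_t v = 0; v < n; ++v) {
            if (halted[v] && inbox[v].empty()) continue;  // stays inactive
            anyRan = true;
            halted[v] = false;  // an incoming message reactivates the vertex
            // Body of compute(): keep the smallest distance received.
            bool changed = false;
            for (long long msg : inbox[v]) {
                if (msg < datum[v]) { datum[v] = msg; changed = true; }
            }
            if (changed) {
                for (const OutEdge& e : out[v])
                    outbox[e.dest].push_back(datum[v] + e.weight);
            } else {
                halted[v] = true;  // voteToHalt()
            }
        }
        // Barrier: messages sent this superstep are delivered for the next one.
        for (std::size_t v = 0; v < n; ++v) {
            inbox[v].swap(outbox[v]);
            outbox[v].clear();
        }
    }
    return datum;
}
```

With unit weights each vertex changes its datum exactly once, in the superstep equal to its distance from the source; with general weights a vertex can lower its datum several times before the computation settles.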
Dijkstra’s Single-Source Shortest Path
Superstep 0
[Figure: example graph with vertices A through G and a master node; distance table with A = 0 and all other entries still empty]
Dijkstra’s Single-Source Shortest Path
Superstep 1
[Figure: example graph with vertices A through G; distance table from source A, including A = 0, B = 1, C = 1]
Dijkstra’s Single-Source Shortest Path
Superstep 2
[Figure: example graph with vertices A through G and the updated distance table from source A]
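Supersteps - 1 = Node Eccentricity
The number of supersteps minus one equals the eccentricity of the source vertex: its largest shortest-path distance to any other vertex.
[Figure: the example graph and the final distance table from source A]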
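The relation on this slide can be checked with a sequential level-synchronous BFS in C++, where each expansion of the frontier stands in for one superstep in which some distances change. A sketch assuming an unweighted, undirected graph in which every vertex is reachable from the source; `eccentricity` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <vector>

// Eccentricity of `source`, computed level by level. Each pass of the loop
// mirrors one superstep: `frontier` holds the vertices whose distance was set
// in that superstep. Assumes every vertex is reachable from `source`.
std::size_t eccentricity(const std::vector<std::vector<std::size_t>>& adj,
                         std::size_t source) {
    std::vector<bool> seen(adj.size(), false);
    std::vector<std::size_t> frontier = {source};
    seen[source] = true;
    std::size_t supersteps = 0;  // supersteps in which some distance was set
    while (!frontier.empty()) {
        ++supersteps;
        std::vector<std::size_t> next;
        for (std::size_t v : frontier) {
            for (std::size_t u : adj[v]) {
                if (!seen[u]) { seen[u] = true; next.push_back(u); }
            }
        }
        frontier.swap(next);
    }
    // The first superstep only reaches the source itself, so the farthest
    // vertex is (supersteps - 1) hops away.
    return supersteps - 1;
}
```

On the path 0-1-2-3, source 0 needs four supersteps (reaching vertices 0, 1, 2 and 3 in turn), so its eccentricity is 3.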
Diameter Measurement
[Figure: several copies of the example graph with vertices A through G, illustrating diameter measurement]
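The graph diameter is the largest eccentricity over all vertices. One way to obtain it exactly in a vertex-centric style is to run a BFS from every vertex simultaneously, each vertex keeping a bitset of the sources that have reached it. The C++ sketch below does this sequentially; it illustrates the idea and is not necessarily the authors' algorithm. It assumes a connected, undirected, unweighted graph, and `exactDiameter` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Exact diameter via simultaneous BFS from all vertices. reach[v] is a bitset
// of sources within k hops of v after superstep k; the last superstep that
// changes any bitset equals the largest pairwise distance.
std::size_t exactDiameter(const std::vector<std::vector<std::size_t>>& adj) {
    const std::size_t n = adj.size();
    const std::size_t words = (n + 63) / 64;  // 64 sources per machine word
    std::vector<std::vector<std::uint64_t>> reach(
        n, std::vector<std::uint64_t>(words, 0));
    for (std::size_t v = 0; v < n; ++v)  // each vertex starts as its own source
        reach[v][v / 64] |= std::uint64_t{1} << (v % 64);
    std::size_t diameter = 0;
    for (std::size_t step = 1;; ++step) {
        auto next = reach;  // double buffer: read old bitsets, write new ones
        bool changed = false;
        for (std::size_t v = 0; v < n; ++v) {
            for (std::size_t u : adj[v]) {  // OR in every neighbor's sources
                for (std::size_t w = 0; w < words; ++w) next[v][w] |= reach[u][w];
            }
            if (next[v] != reach[v]) changed = true;
        }
        if (!changed) break;  // no vertex learned a new source: done
        reach.swap(next);
        diameter = step;
    }
    return diameter;
}
```

The bitsets cost n bits per vertex, so memory grows as n² bits; HADI, listed earlier, avoids this by replacing exact bitsets with approximate Flajolet-Martin sketches.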
Limitations
• Must be synchronous
• Designed for unweighted graphs