Topology-Aware Distributed Graph Processing for Tightly-Coupled Clusters
Mayank Bhatt, Jayasi Mehar
DPRG: http://dprg.cs.uiuc.edu
Our work explores the problem of graph partitioning, focused on reducing the communication cost on tightly-coupled clusters.
Why?
• Experimenting with cloud frameworks on HPC systems
• Interest in supercomputing as a service
• More big data jobs running on supercomputers
Tightly-Coupled Clusters
• Supercomputers
• Compute nodes embedded inside the network topology
• Messages routed via compute nodes
• Communication patterns can influence performance
• “Hop count” is an approximate measure of the cost of communication
Blue Waters Interconnect
• 3D torus
• A job is allocated a subset of the nodes
• Static routing: the number of hops between two nodes remains constant (sketch below)
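On a 3D torus with static routing, hop count can be approximated as the Manhattan distance with wraparound in each dimension. A minimal sketch, with illustrative torus dimensions rather than the actual Blue Waters geometry:

```python
def torus_hops(a, b, dims=(24, 24, 24)):
    """Approximate hop count between nodes a and b on a 3D torus.

    a, b: (x, y, z) node coordinates; dims: torus size per dimension
    (illustrative values, not the real Blue Waters dimensions).
    """
    hops = 0
    for ai, bi, d in zip(a, b, dims):
        delta = abs(ai - bi)
        hops += min(delta, d - delta)  # wraparound: take the shorter direction
    return hops

# Nodes that look far apart in x are only 1 hop away around the torus:
print(torus_hops((0, 0, 0), (23, 2, 1)))  # 1 + 2 + 1 = 4
```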
Graph Processing Systems
• A lot of real-world data is expressed in the form of graphs
• Billions of vertices, trillions of edges: the graph must be distributed
• Algorithms: e.g., shortest path, PageRank
• Two stages: ingress and processing
Types of Partitioning
• Vertex cuts vs. edge cuts
• System of choice: PowerGraph (uses vertex cuts)
• Each vertex has one master and zero or more mirrors
• The master communicates with all of its mirrors
• Our hypothesis: placing masters and mirrors close together should reduce communication cost
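In a vertex cut, each edge lives on exactly one machine, and a vertex is replicated on every machine that holds one of its edges; the average number of replicas per vertex is the replication factor. A minimal sketch of that bookkeeping (helper name is ours):

```python
from collections import defaultdict

def replication_factor(edge_assignment):
    """edge_assignment: iterable of ((u, v), machine) pairs.
    Returns the average number of machines each vertex appears on."""
    replicas = defaultdict(set)  # vertex -> machines holding a copy of it
    for (u, v), m in edge_assignment:
        replicas[u].add(m)
        replicas[v].add(m)
    return sum(len(ms) for ms in replicas.values()) / len(replicas)

edges = [((1, 2), 0), ((2, 3), 1), ((1, 3), 0)]
print(replication_factor(edges))  # vertices on {0}, {0,1}, {0,1} -> 5/3
```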
Master and Mirror Placement
• Option 1: place the replicas of a vertex first, then decide where to place the master
• Option 2: place the master of each vertex first (e.g., by hashing), then decide where to place the replicas
Random Partitioning
• Fast ingress
• Communication cost between master and mirrors can be high
• Replication factor can be high
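Random partitioning simply hashes each edge to a machine independently, which makes ingress fast but ignores both replication factor and topology. A minimal sketch (the hash choice is illustrative):

```python
def random_place(edge, num_machines):
    """Hash an edge (u, v) to a machine, independent of all other edges."""
    u, v = edge
    return hash((u, v)) % num_machines

print(random_place((1, 2), 36))  # machine index in [0, 36)
```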
Oblivious Partitioning
• Slower ingress
• Heuristic-based (greedy) partitioning
• Leads to a smaller replication factor than random
• Starting point for optimizing master-mirror communication
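A simplified sketch of PowerGraph-style greedy (oblivious) placement, not the system's exact code: prefer machines already holding both endpoints of the edge, then machines holding either endpoint, then the least-loaded machine.

```python
from collections import defaultdict

def oblivious_place(edges, num_machines):
    """Greedy vertex cut: prefer machines that already host an endpoint."""
    held = defaultdict(set)        # vertex -> machines already holding it
    load = [0] * num_machines      # edges per machine
    assignment = {}
    for u, v in edges:
        # Empty sets are falsy, so each 'or' falls through to the next case.
        candidates = (held[u] & held[v]) or (held[u] | held[v]) \
                     or set(range(num_machines))
        m = min(candidates, key=lambda c: load[c])  # break ties by load
        assignment[(u, v)] = m
        held[u].add(m)
        held[v].add(m)
        load[m] += 1
    return assignment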
Grid Partitioning
• Intersecting constraint sets: each vertex hashes to a row and column of a logical machine grid, and an edge is placed in the intersection of its endpoints' sets
• Leads to a controlled (bounded) replication factor
• Master-mirror communication is not optimized
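A minimal sketch of grid (constrained) placement, assuming the number of machines is a perfect square so they form a square grid; the row-column intersection of any two vertices is guaranteed to be non-empty:

```python
import math

def constraint_set(vertex, num_machines):
    """Machines in the grid row and column of the vertex's home shard."""
    side = int(math.isqrt(num_machines))   # assumes num_machines = side * side
    home = hash(vertex) % num_machines
    r, c = divmod(home, side)
    row = {r * side + j for j in range(side)}
    col = {i * side + c for i in range(side)}
    return row | col

def grid_place(u, v, num_machines):
    """Place edge (u, v) on a machine in the intersection of constraint sets."""
    shared = constraint_set(u, num_machines) & constraint_set(v, num_machines)
    return min(shared)  # deterministic pick; real systems choose least loaded
```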
Topology-Aware Variants
• Make the partitioning step aware of the underlying network topology
• Place masters and mirrors such that communication cost is minimized
Choosing a Master
• Pick the master such that the total number of hops to the mirrors is minimized
• Geometric centroid of the mirrors
• Edge degrees of the replicas can differ
• Weighted centroid: weight each mirror by its edge degree
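A sketch of weighted-centroid master selection: among the machines holding replicas, pick the one minimizing hop-weighted traffic to all other replicas, weighting each replica by its local edge count. Function and parameter names are ours; hops can be the torus_hops sketch from earlier.

```python
def choose_master(replica_machines, local_edges, coords, hops):
    """replica_machines: machines holding a replica of the vertex.
    local_edges[m]: number of the vertex's edges on machine m.
    coords[m]: machine m's torus coordinates.
    hops(a, b): hop-count function, e.g. torus_hops from the earlier sketch."""
    def cost(candidate):
        return sum(local_edges[m] * hops(coords[m], coords[candidate])
                   for m in replica_machines if m != candidate)
    return min(replica_machines, key=cost)
```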
Grid Centroid
• Edges are placed using the grid partitioning strategy first
• The master is then chosen among the candidates by a score that combines, for each mirror, the number of hops between the mirror and the candidate weighted by the number of edges on that mirror, plus a load term: the number of masters already on the candidate
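A sketch of that scoring, under the assumption that the terms combine additively (the exact weighting on the slide is not recoverable; names and the balance knob are ours):

```python
def grid_centroid_master(mirrors, edges_on, coords, masters_on, hops,
                         balance=1.0):
    """Pick the mirror machine minimizing hop-weighted traffic plus load.

    mirrors: machines holding the vertex's replicas (grid-placed already)
    edges_on[m]: edges of this vertex on mirror m
    masters_on[c]: masters already assigned to candidate c (load term)
    hops(a, b): hop-count function, e.g. torus_hops from the earlier sketch
    balance: assumed knob trading communication cost against load
    """
    def score(c):
        comm = sum(hops(coords[m], coords[c]) * edges_on[m]
                   for m in mirrors if m != c)
        return comm + balance * masters_on[c]
    return min(mirrors, key=score)
```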
Restricted Oblivious
• Edge placement uses an oblivious-style greedy score, restricted by topology
• The score combines a load-balance term, roughly (maximum number of edges on a node - number of edges on the candidate) / (maximum number of edges on a node - minimum number of edges on a node), with the number of hops between the candidate and the master
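A sketch of the restricted-oblivious score under those assumptions; we assume the hop term is subtracted so that higher scores are better, and the weight gamma is ours, not from the slide:

```python
def restricted_oblivious_score(candidate, load, coords, master_coords, hops,
                               gamma=1.0):
    """Higher is better: load-balance term minus a hop penalty to the master.

    load[c]: number of edges currently on machine c
    coords[c]: machine c's torus coordinates
    hops(a, b): hop-count function, e.g. torus_hops from the earlier sketch
    gamma: assumed weight on the hop penalty (not from the slide)
    """
    max_e, min_e = max(load), min(load)
    balance = (max_e - load[candidate]) / (max_e - min_e + 1)  # +1 avoids /0
    return balance - gamma * hops(coords[candidate], master_coords)
```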
Experiments
• Cluster size: 36 nodes
• Algorithm: approximate diameter
• Graph: power-law, 20 million vertices
Runtime and Ingress
• Tradeoff between runtime and ingress time
Graph Algorithms
• Data-intensive algorithms benefit more
Graph Type
• Improvements depend on the type of graph
Network Data Transfer
Other System Optimizations
• Controlling the frequency of data injection into the network impacts runtime for certain algorithms
• Smaller network buffers are flushed more frequently
Buffer Sizes (PageRank, approximate diameter)
• Small computation and network data benefit from frequent flushing
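As a generic sketch of the mechanism (not PowerGraph's actual buffering code): a send buffer's capacity controls injection frequency, so a smaller capacity means more frequent flushes, trading per-message batching efficiency for lower latency.

```python
class SendBuffer:
    """Batch outgoing messages; flush whenever the buffer fills."""

    def __init__(self, capacity, send_fn):
        self.capacity = capacity   # smaller capacity => more frequent flushes
        self.send_fn = send_fn     # e.g., a socket or MPI send routine
        self.buf = []

    def push(self, msg):
        self.buf.append(msg)
        if len(self.buf) >= self.capacity:
            self.flush()

    def flush(self):
        if self.buf:
            self.send_fn(self.buf)  # one network injection per flush
            self.buf = []

# Usage: a tiny buffer flushes on nearly every push.
b = SendBuffer(capacity=2, send_fn=print)
for m in ["a", "b", "c"]:
    b.push(m)
b.flush()  # drain the remainder
```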
Decisions, decisions
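The slide presents a decision tree for choosing a partitioning strategy. Its exact branches are not recoverable from the text, so the sketch below is only a plausible reconstruction from the conclusions; the criteria and branch order are our assumptions.

```python
def choose_partitioner(ingress_budget_tight, data_intensive, power_law):
    """Illustrative decision tree; branches and order are assumptions."""
    if ingress_budget_tight:
        return "random"                # fastest ingress, worst communication
    if data_intensive:
        return "grid-centroid"         # pay ingress cost to cut master-mirror hops
    if power_law:
        return "restricted-oblivious"  # balance load and hops for skewed degrees
    return "oblivious"                 # small replication factor, moderate ingress
```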
Conclusions
• Two new topology-aware algorithms for graph partitioning
• No ‘one size fits all’ approach to graph partitioning
• We propose a decision tree that can help decide which partitioning algorithm is best
• System optimizations complement performance
DPRG: http://dprg.cs.uiuc.edu
Questions and Feedback?
DPRG: http://dprg.cs.uiuc.edu