PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs
Joseph E. Gonzalez, Yucheng Low, Haijie Gu, and Danny Bickson, Carnegie Mellon University; Carlos Guestrin, University of Washington
Current State
1. Many MLDM problems represented as graphs
2. Graph-structured computation is important
3. Graphs are big
4. Current systems provide graph-parallel computation
   – Pregel
   – GraphLab
Solution 1: Pregel Vertex Program
Solution 2: GraphLab Shared Distributed Graph
Problem: Many graphs have a skewed degree distribution
Issue: Natural Graphs
What is a Natural Graph?
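A rough illustration of the skew these slides refer to: the sketch below samples vertex degrees from a truncated power law, P(d) proportional to d^(-alpha), with alpha = 2 as mentioned later in the deck. The function names, the truncation point `d_max`, and the sample size are assumptions made for this example only, not part of the original presentation.

```python
import random

def sample_power_law_degrees(n, alpha=2.0, d_max=10_000, seed=0):
    """Sample n degrees with P(d) proportional to d**(-alpha) for 1 <= d <= d_max."""
    rng = random.Random(seed)
    support = range(1, d_max + 1)
    weights = [d ** -alpha for d in support]
    return rng.choices(support, weights=weights, k=n)

def top_share(degrees, fraction=0.01):
    """Share of all edge endpoints owned by the highest-degree `fraction` of vertices."""
    ranked = sorted(degrees, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)

degrees = sample_power_law_degrees(n=100_000)
print(f"median degree: {sorted(degrees)[len(degrees) // 2]}")
print(f"max degree:    {max(degrees)}")
print(f"endpoints held by top 1% of vertices: {top_share(degrees):.1%}")
```

Most vertices have very few neighbors while a small number of hubs account for a large share of the edges, which is the property that makes such graphs hard to partition.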
GraphLab and Pregel on Natural Graphs
• Work Imbalance
• Random Partitioning
• Storage is linear in degree
• Expensive Communication
Solution
Pregel: Edge Cut, Replicate Edges
PowerGraph: Vertex Cut, Replicate Vertices, Parallelize Vertex Program across all machines with that vertex
Balanced P-way Vertex Cut
Idea: Distribute edges while minimizing vertex replications
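One way to write this idea down as an optimization problem, following the formulation commonly given for PowerGraph. The notation is assumed here rather than taken from the slides: A(e) is the machine that edge e is assigned to, A(v) is the set of machines holding a replica of vertex v, p is the number of machines, and lambda >= 1 is a small imbalance factor.

```latex
\min_{A} \; \frac{1}{|V|} \sum_{v \in V} |A(v)|
\quad \text{s.t.} \quad
\max_{m} \, \bigl|\{\, e \in E : A(e) = m \,\}\bigr| < \lambda \frac{|E|}{p}
```

The objective is the average number of replicas per vertex (the replication factor); the constraint keeps the number of edges on any one machine close to the even share |E|/p.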
Distributing Edges: Random
Idea: Randomly assign edges to machines
Why is this better than Pregel?
Theorem: For a given edge-cut with g ghosts, any vertex-cut along the same partition boundary has fewer than g mirrors.
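Random edge placement can be sketched in a few lines. The code below is an illustration only: the function names and the star-graph example are assumptions for this sketch, and `expected_replication` uses the standard expectation for independent uniform placement, p * (1 - (1 - 1/p)^d) replicas for a vertex of degree d.

```python
import random
from collections import defaultdict

def random_vertex_cut(edges, p, seed=0):
    """Assign each edge to one of p machines uniformly at random.

    Returns (assignment, spans); spans[v] is the set of machines that hold
    at least one edge of v, i.e. the replicas of v. Assumes no duplicate edges.
    """
    rng = random.Random(seed)
    assignment = {}
    spans = defaultdict(set)
    for u, v in edges:
        m = rng.randrange(p)
        assignment[(u, v)] = m
        spans[u].add(m)
        spans[v].add(m)
    return assignment, spans

def replication_factor(spans):
    """Average number of replicas per vertex."""
    return sum(len(s) for s in spans.values()) / len(spans)

def expected_replication(degrees, p):
    """Expected replication factor under independent uniform edge placement."""
    return sum(p * (1 - (1 - 1 / p) ** d) for d in degrees) / len(degrees)

# Toy star graph: one hub (vertex 0) connected to 1000 leaves, on 8 machines.
hub, leaves, p = 0, range(1, 1001), 8
edges = [(hub, leaf) for leaf in leaves]
assignment, spans = random_vertex_cut(edges, p)
degrees = [len(edges)] + [1] * len(leaves)
print("measured replication factor:", replication_factor(spans))
print("expected replication factor:", expected_replication(degrees, p))
```

In this toy case only the hub is replicated; its edges, and therefore its gather work, are spread over all machines, while every leaf lives on exactly one machine.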
Distributing Edges: Greedy
- Further minimize replication of vertices
- Idea: Place the next edge on the machine that minimizes vertex replication
- Greedy approaches
  - Coordinated
  - Oblivious
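A minimal sketch of the greedy idea, run on a single process. It is a simplification, not the PowerGraph implementation: the per-machine capacity `cap` (controlled by the hypothetical `slack` parameter) and the tie-breaking rule are assumptions added here so that the toy version stays balanced. The preference order is: a machine that already holds both endpoints, then a machine that holds either endpoint, then any machine.

```python
import math
from collections import defaultdict

def greedy_vertex_cut(edges, p, slack=1.1):
    """Place edges one at a time on the machine that adds the fewest new replicas."""
    edges = list(edges)
    cap = max(1, math.ceil(slack * len(edges) / p))  # assumed balance limit
    spans = defaultdict(set)   # vertex -> machines holding a replica
    load = [0] * p             # number of edges per machine
    assignment = {}
    for u, v in edges:
        shared = spans[u] & spans[v]   # placing here adds no replica
        either = spans[u] | spans[v]   # placing here adds one replica
        for candidates in (shared, either, range(p)):
            eligible = [k for k in candidates if load[k] < cap]
            if eligible:
                break
        # Among eligible machines, pick the least loaded (lowest id on ties).
        m = min(eligible, key=lambda k: (load[k], k))
        assignment[(u, v)] = m
        load[m] += 1
        spans[u].add(m)
        spans[v].add(m)
    return assignment, spans, load

# Two disjoint triangles on two machines.
triangles = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]
assignment, spans, load = greedy_vertex_cut(triangles, p=2, slack=1.0)
print("load per machine:", load)
print("replicas per vertex:", {v: len(s) for v, s in sorted(spans.items())})
```

On this toy input each triangle ends up on its own machine, so no vertex is replicated. In the slides' terms, a coordinated variant would share the `spans` table across machines, while an oblivious variant would run the same heuristic on each machine with only local state.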
Edge Distribution
Implementations
• Synchronous (Pregel)
• Asynchronous and Serializable (GraphLab)
Discussion: Edge Placement and Run Time
Discussion: GAS Decomposition
• Gather: Vertex collects information about its surrounding vertices
• Apply: Vertex updates its value based on the gathered data
• Scatter: Vertex shares its new value with its neighbors
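The three phases can be illustrated with PageRank, a common example of a vertex program. The sketch below is a single-process, synchronous approximation under stated assumptions: the function name `pagerank_gas`, the unnormalized rank formula, and the fixed iteration count are choices made for this example, and the scatter phase is reduced to publishing the new values for the next round.

```python
def pagerank_gas(out_edges, iterations=20, damping=0.85):
    """Synchronous PageRank written as gather / apply / scatter steps."""
    vertices = set(out_edges) | {v for targets in out_edges.values() for v in targets}
    in_edges = {v: [] for v in vertices}
    for u, targets in out_edges.items():
        for v in targets:
            in_edges[v].append(u)
    out_degree = {v: len(out_edges.get(v, ())) for v in vertices}
    rank = {v: 1.0 for v in vertices}

    def gather(u):
        # Contribution of one in-neighbor u to the vertex being updated.
        return rank[u] / out_degree[u]

    def apply(acc):
        # New vertex value computed from the summed gather results.
        return (1 - damping) + damping * acc

    for _ in range(iterations):
        # Gather over in-edges and sum, then apply, for every vertex.
        new_rank = {v: apply(sum(gather(u) for u in in_edges[v])) for v in vertices}
        # Scatter (simplified): make the new values visible to neighbors.
        rank = new_rank
    return rank

ranks = pagerank_gas({1: [0], 2: [0], 3: [0]})
print(ranks)
```

Because the gather step is a sum over edges, it can be computed partially on each machine that holds some of a vertex's edges and then combined, which is what lets a vertex cut split the work of a high-degree vertex.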
What About Alpha?
• PowerGraph is a solution for natural graphs
• Can we do better if alpha is always around 2?
Fully Characterizing Natural Graphs
Conclusions:
- Out-degree grows over time, changing the value of alpha
- Vertex diameters often decrease as a graph grows
What does this mean for PowerGraph when graphs are constantly changing?
Takeaways
• Vertex Cut implementation allows for greater parallelization of vertex programs and reduced replication of mirrors
• GAS Decomposition is not fundamental to PowerGraph’s Implementation