GraphP: Reducing Communication for PIM-based Graph Processing with Efficient Data Partition
Mingxing Zhang, Youwei Zhuo (equal contribution), Chao Wang, Mingyu Gao, Yongwei Wu, Kang Chen, Christos Kozyrakis, Xuehai Qian
Tsinghua University, University of Southern California, Stanford University
Outline
• Motivation
  • Graph applications
  • Processing-In-Memory
  • The drawbacks of the current solution
• GraphP
• Evaluation
GraphP: A PIM-based Graph Processing Framework, ALCHEM (alchem.usc.edu)
Graph Applications
• Social network analytics
• Recommendation systems
• Bioinformatics
• …
Challenges
• High bandwidth requirement
• Small amount of computation per vertex
• Data movement overhead
[Figure: data moves from memory (mem) through the L3, L2, and L1 caches to the compute core (comp).]
PIM: Processing-In-Memory
• Idea: computation logic inside memory
• Advantage: high memory bandwidth
• Example: Hybrid Memory Cubes (HMC)
[Figure: memory (mem) and compute logic (comp) inside each cube; 320 GB/s intra-cube bandwidth, 4 × 120 GB/s inter-cube links.]
HMC: Hybrid Memory Cubes
[Figure: bandwidth in GB/s, 320 intra-cube vs. 120 inter-cube.]
• Bottleneck: inter-cube communication
Current Solution: Tesseract
• First PIM-based graph processing architecture
• Programming model
  • Vertex program
• Partition
  • Based on the vertex program
Ahn, J., Hong, S., Yoo, S., Mutlu, O., & Choi, K. A scalable processing-in-memory accelerator for parallel graph processing. In 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
PageRank in Vertex Program
    for (v: vertices) {
        update = 0.85 * v.rank / v.out_degree;
        for (w: edges.destination) {
            put(w.id, function{ w.next_rank += update; });
        }
    }
    barrier();
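A runnable C++ sketch of this push-style PageRank step, assuming a single in-memory adjacency list instead of Tesseract's distributed vertices. `Adjacency` and `pushIteration` are hypothetical names, each `put` is modeled as a direct add into `nextRank`, and the (1 - d) / N teleport term is our addition (the slide omits it).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Out-adjacency list: out[v] holds the destinations of v's out-edges.
using Adjacency = std::vector<std::vector<int>>;

// One push-style PageRank iteration mirroring the slide's vertex program.
// On Tesseract each put() may be a remote function call executed by the
// cube that owns w; here it is simply an add into nextRank.
std::vector<double> pushIteration(const Adjacency& out,
                                  const std::vector<double>& rank) {
    const double d = 0.85;  // damping factor from the slide
    const std::size_t n = out.size();
    assert(rank.size() == n);

    // Assumption: the (1 - d) / N teleport term, which the slide omits.
    std::vector<double> nextRank(n, (1.0 - d) / n);
    for (std::size_t v = 0; v < n; ++v) {
        if (out[v].empty()) continue;  // dangling vertex: nothing to push
        const double update = d * rank[v] / out[v].size();
        for (int w : out[v]) {
            nextRank[w] += update;  // body of put(w.id, function{...})
        }
    }
    return nextRank;  // every put has completed: the barrier()
}
```

For the graph 0→1, 0→2, 1→2, 2→0 with a uniform initial rank of 1/3, vertex 2 ends the iteration with rank 0.475 and the ranks still sum to 1.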
Graph Partition
[Figure: vertices 0 to 5 split between hmc 0 and hmc 1; legend: vertex, intra-cube edge, inter-cube edge. Each inter-cube edge carries one put(w.id, function{ w.next_rank += update; }) message.]
• Communication = number of cross-cube edges
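To make "communication = number of cross-cube edges" concrete, here is a small C++ sketch that counts those edges for a given vertex-to-cube assignment. `cubeOf` and `crossCubeEdges` are hypothetical names; each counted edge stands for one inter-cube put() per iteration under the push model.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Adjacency = std::vector<std::vector<int>>;  // out[v] = destinations of v

// Under a vertex partition, every edge whose endpoints are owned by different
// cubes becomes one inter-cube put() per iteration, so this count is a proxy
// for per-iteration communication. cubeOf[v] is the cube that owns vertex v.
std::size_t crossCubeEdges(const Adjacency& out, const std::vector<int>& cubeOf) {
    assert(cubeOf.size() == out.size());
    std::size_t count = 0;
    for (std::size_t v = 0; v < out.size(); ++v) {
        for (int w : out[v]) {
            if (cubeOf[v] != cubeOf[w]) ++count;  // endpoints on different cubes
        }
    }
    return count;
}
```

In a six-vertex graph where vertex 0 (on cube 0) has out-edges to vertices 3, 4, and 5 (on cube 1), the function returns 3: one remote put per edge, even though all three carry the same update value.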
Drawback of Tesseract
• Excessive data communication
• Why? The vertex programming model determines the graph partition, and the partition determines the data communication
[Figure: in Tesseract, programming model → graph partition → data communication.]
Outline
• Motivation
• GraphP
• Evaluation
GraphP
• Consider the graph partition first
• Graph partition: source-cut
• Programming model: two-phase vertex program
• Reduces inter-cube communication
Source-Cut Partition
[Figure: vertices 0 to 5 split between hmc 0 and hmc 1, with a replica of vertex 2 marked; legend: vertex, intra-cube edge, inter-cube edge, replica.]
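A C++ sketch of how replicas could be derived under source-cut. This encodes our reading of the slides, not GraphP's exact rule: each edge (u, w) is stored on the cube that owns its destination w, and if u lives elsewhere, that cube gets a replica of u. This is consistent with the two-phase program, where each vertex gathers from `edges.sources` locally. `sourceCutReplicas` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

using Adjacency = std::vector<std::vector<int>>;  // out[u] = destinations of u

// Source-cut sketch: edge (u, w) lives on cubeOf[w]. When u is owned by a
// different cube, that cube needs a local replica of u. Returns the set of
// (vertex, cube) replica pairs.
std::set<std::pair<int, int>> sourceCutReplicas(const Adjacency& out,
                                                const std::vector<int>& cubeOf) {
    assert(cubeOf.size() == out.size());
    std::set<std::pair<int, int>> replicas;
    for (std::size_t u = 0; u < out.size(); ++u) {
        for (int w : out[u]) {
            if (cubeOf[u] != cubeOf[w]) {
                // One replica per (source, remote cube), however many edges share it.
                replicas.insert(std::make_pair(static_cast<int>(u), cubeOf[w]));
            }
        }
    }
    return replicas;
}
```

With vertex 0 on cube 0 pointing at vertices 3, 4, and 5 on cube 1, the three cross-cube edges collapse into a single replica, (0, 1).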
Two-Phase Vertex Program
    for (r: replicas) {
        r.next_rank = 0.85 * r.next_rank / r.out_degree;
    } // apply updates from previous iterations
    for (v: vertices) {
        update = 0;
        for (u: edges.sources) {
            update += u.rank;
        }
        for (r: replicas) { // replicas of v
            put(r.id, function { r.next_rank = update; });
        }
    }
    barrier();
[Figure: the running source-cut example (vertices 0, 2, 3, 4, 5).]
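A single-process C++ simulation of one iteration of the two-phase program, assuming the source-cut layout in which edges are stored with their destination and sources are replicated. Each cube's memory is modeled as a `std::map`. All names (`twoPhaseIteration`, `TwoPhaseResult`, `local`, `edgesOn`) are hypothetical, the teleport term is our addition, and the code keeps one value per copy instead of the slide's separate `rank`/`next_rank` fields.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

using Adjacency = std::vector<std::vector<int>>;  // out[u] = destinations of u

struct TwoPhaseResult {
    std::vector<double> rank;   // new rank of every vertex (held by its master)
    std::size_t messages = 0;   // master-to-replica puts issued this iteration
};

// local[c] models cube c's memory: vertex id -> value for the masters it owns
// plus the replicas it holds. Replicas start with the rank their master sent
// at the end of the previous iteration.
TwoPhaseResult twoPhaseIteration(const Adjacency& out,
                                 const std::vector<int>& cubeOf, int numCubes,
                                 const std::vector<double>& rank) {
    const double d = 0.85;
    const std::size_t n = out.size();
    assert(cubeOf.size() == n && rank.size() == n);

    std::vector<std::map<int, double>> local(numCubes);
    std::vector<std::vector<std::pair<int, int>>> edgesOn(numCubes);  // (src, dst)
    std::vector<std::set<int>> replicaCubes(n);  // cubes holding a replica of v
    for (std::size_t u = 0; u < n; ++u) {
        const int src = static_cast<int>(u);
        local[cubeOf[u]][src] = rank[u];                 // master copy
        for (int w : out[u]) {
            edgesOn[cubeOf[w]].push_back({src, w});      // edge lives with dst
            if (cubeOf[w] != cubeOf[u]) {
                local[cubeOf[w]][src] = rank[u];         // replica copy
                replicaCubes[u].insert(cubeOf[w]);
            }
        }
    }

    // First loop on the slide: every copy, master or replica, turns its rank
    // into the per-edge contribution d * rank / out_degree, locally.
    for (auto& cube : local) {
        for (auto& [v, value] : cube) {
            value = out[v].empty() ? 0.0 : d * value / out[v].size();
        }
    }

    // Second loop: each cube gathers over the edges it stores. Every source
    // is a local master or replica, so this step needs no inter-cube traffic.
    // Assumption: the (1 - d) / N teleport term, which the slides omit.
    TwoPhaseResult result{std::vector<double>(n, (1.0 - d) / n), 0};
    for (int c = 0; c < numCubes; ++c) {
        for (const auto& [src, dst] : edgesOn[c]) {
            result.rank[dst] += local[c].at(src);
        }
    }

    // Each master then puts its new rank to each of its replicas.
    for (std::size_t v = 0; v < n; ++v) result.messages += replicaCubes[v].size();
    return result;
}
```

On the graph 0→1, 0→2, 1→2, 2→0 with vertices 0 and 1 on cube 0 and vertex 2 on cube 1, this gives vertex 2 a rank of 0.475, the same value a push-style iteration produces, and issues 3 replica updates. On a graph where vertex 0 has three edges into the same remote cube, it issues 1 update where the push model would send 3.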
Benefits
• Never more data communication than Tesseract, and usually much less: one message per replica instead of one per cross-cube edge
• Enables architecture optimizations
Less Communication
[Figure: vertex 2 and its edges to vertices 4 and 5, compared under GraphP and Tesseract.]
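The inequality behind this comparison can be written out, under our simplifying assumption that Tesseract sends one message per cross-cube edge and GraphP one message per replica per iteration. Let $c(v)$ be the cube that owns vertex $v$, $V$ the vertex set, and $E$ the edge set:

```latex
\begin{aligned}
M_{\text{Tesseract}} &= \sum_{u \in V} \bigl|\{(u,w) \in E : c(w) \neq c(u)\}\bigr|, \\
M_{\text{GraphP}}    &= \sum_{u \in V} \bigl|\{c(w) : (u,w) \in E,\ c(w) \neq c(u)\}\bigr|
                      \;\le\; M_{\text{Tesseract}}.
\end{aligned}
```

Each replica of $u$ in cube $k$ stands for at least one cross-cube edge from $u$ into $k$, so the second count can never exceed the first. Equality holds only when no source has two edges into the same remote cube. The savings come from vertices with many edges into the same cube.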
Broadcast Optimization
    for (r: replicas) {
        put(r.id, function { r.next_rank = update; });
    }
    barrier();
• Every replica of a vertex receives the same value, update, so the per-replica puts can be replaced by a single broadcast
[Figure: vertex 4 broadcasting its update to its replicas.]
Naïve Broadcast
• 15 point-to-point messages
[Figure: a source cube (src) sending to destination cubes (dst).]
Hierarchical Communication
• 3 inter-group messages
[Figure: a source cube (src) and destination cubes (dst).]
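A C++ sketch that reproduces the message counts on the two broadcast slides. It assumes 16 cubes in 4 groups of 4, with cube c in group c / 4. The grouping is our assumption, chosen to match a Dragonfly-style layout. `BroadcastCost`, `naiveBroadcast`, and `hierarchicalBroadcast` are hypothetical names.

```cpp
#include <cassert>

// Message counts for one broadcast of a vertex update to every cube.
struct BroadcastCost {
    int total = 0;       // point-to-point messages sent
    int interGroup = 0;  // messages that cross a group boundary
};

// Assumed topology: cube c belongs to group c / cubesPerGroup.
int groupOf(int cube, int cubesPerGroup) { return cube / cubesPerGroup; }

// Naive broadcast: the source cube sends a separate message to every other cube.
BroadcastCost naiveBroadcast(int src, int numCubes, int cubesPerGroup) {
    BroadcastCost cost;
    for (int dst = 0; dst < numCubes; ++dst) {
        if (dst == src) continue;
        ++cost.total;
        if (groupOf(dst, cubesPerGroup) != groupOf(src, cubesPerGroup)) ++cost.interGroup;
    }
    return cost;
}

// Hierarchical broadcast: the source sends one message to a single cube in
// each remote group; inside every group, the cube holding the value then
// forwards it to the remaining cubes of that group over intra-group links.
BroadcastCost hierarchicalBroadcast(int numCubes, int cubesPerGroup) {
    assert(numCubes % cubesPerGroup == 0);
    const int numGroups = numCubes / cubesPerGroup;
    BroadcastCost cost;
    cost.interGroup = numGroups - 1;                         // one per remote group
    const int intraGroup = numGroups * (cubesPerGroup - 1);  // fan-out in each group
    cost.total = cost.interGroup + intraGroup;
    return cost;
}
```

Both schemes send 15 messages in total. However, the naive scheme sends 12 of them across group boundaries, while the hierarchical scheme sends only 3. That difference is what matters when inter-group links are the scarce resource.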
Other Optimizations
• Computation/communication overlap
• Leveraging the low-power state of SerDes
• Please see the paper for more details
Evaluation Methodology
• Simulation infrastructure
  • zSim with HMC support
  • ORION for NoC energy modeling
• Configurations (same as Tesseract)
  • 16 HMCs
  • Interconnection: Dragonfly and 2D Mesh
  • 512 CPUs
    • Single-issue in-order cores
    • Frequency: 1 GHz
Workloads
• 4 graph algorithms
  • Breadth-First Search
  • Single-Source Shortest Path
  • Weakly Connected Components
  • PageRank
• 5 real-world graphs
  • Wiki-Vote (WV)
  • ego-Twitter (TT)
  • Soc-Slashdot0902 (SD)
  • Amazon0302 (AZ)
  • ljournal-2008 (LJ)
Performance
[Figure: speedup (y-axis 0 to 20) of DDR3, Tesseract, SOTA, GraphP-SC, and GraphP-SC-BRD, with annotations "<1.1×", "1.7×", "data partition", and "memory bandwidth".]
Communication Amount (normalized to Tesseract)
[Figure: stacked intra-group and inter-group communication per configuration. Tesseract: 51.8% and 48.2%; GraphP-SC: 7.1% and 7.0%; GraphP-SC-BRD: 0.4% and 1.7%.]
Energy Consumption (normalized to Tesseract)
[Figure: bars at 100.0% (Tesseract), 24.9%, and 15.9%; the GraphP configurations shown are GraphP-SC and GraphP-SC-BRD.]
Other Results
• Bandwidth utilization
• Scalability
• Replication overhead
• Please see the paper for more details
Conclusions
• We propose GraphP, a new PIM-based graph processing framework
• Key contributions
  • Data partition as a first-order design consideration
  • Source-cut partition
  • Two-phase vertex program
  • Enables additional architecture optimizations
• GraphP drastically reduces inter-cube communication and improves energy efficiency
Workload Size & Capacity
• Capacity: 128 GB (16 × 8 GB)
  • ~16 billion edges
• Public graph datasets
  • ~400 million edges (SNAP)
  • ~7 billion edges (WebGraph)
https://snap.stanford.edu/data/
http://law.di.unimi.it/datasets.php
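The ~16 billion edge figure is consistent with about 8 bytes per edge, for example two 32-bit vertex IDs. That per-edge size is our assumption, not a number from the slide, and the estimate ignores vertex data and replicas:

```latex
\frac{16 \times 8\ \text{GB}}{8\ \text{B/edge}}
  = \frac{128 \times 10^{9}\ \text{B}}{8\ \text{B/edge}}
  = 16 \times 10^{9}\ \text{edges}
```

By this estimate, even the ~7 billion edge WebGraph datasets (about 56 GB of edges at 8 B/edge) fit in the 16-cube system's memory.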
Two-Phase Vertex Program
• Same expressiveness as vertex programs