Data Structure for Dynamic Graphs in Distributed Clusters
Mohammad Nokhbeh Zaeem
School of Computer Science, Carleton University, Ottawa, Canada
Mohammad.Nokhbeh.Zaeem@cmail.Carleton.ca
COMP 5704 Project Presentation
Problem Definition and Notation (1)
• G = (V, E)
• V: set of vertices; E: set of edges, each a pair of connected vertices
• Represents: Social Networks, Dynamic Systems, Influence, Roads, Electrical Circuits, …
• Questions: Connectivity, Distance, MST, Min. Cut, Routing, Maximum Flow, …
• Computationally costly
Problem Definition and Notation (2)
• Dynamic graph: a graph that changes over time
• Addition, Deletion, Movement, …
• Same problems as for static graphs
• Why can’t we wait until we get a static graph?
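The kinds of change listed on this slide can be pictured as a stream of updates applied to an adjacency structure. What follows is a minimal single-machine sketch, not code from STINGER or DISTINGER; the `DynamicGraph` class, its method names, and the `("+", u, v)` / `("-", u, v)` update format are assumptions made for illustration.

```python
from collections import defaultdict


class DynamicGraph:
    """Undirected graph stored as adjacency sets (illustrative only)."""

    def __init__(self):
        self.adj = defaultdict(set)

    def insert_edge(self, u, v):
        self.adj[u].add(v)
        self.adj[v].add(u)

    def delete_edge(self, u, v):
        # discard() does nothing if the edge is not present
        self.adj[u].discard(v)
        self.adj[v].discard(u)

    def has_edge(self, u, v):
        # .get() avoids creating an empty entry for unseen vertices
        return v in self.adj.get(u, ())


def apply_updates(graph, updates):
    """Apply ("+", u, v) insertions and ("-", u, v) deletions in arrival order."""
    for op, u, v in updates:
        if op == "+":
            graph.insert_edge(u, v)
        elif op == "-":
            graph.delete_edge(u, v)
        else:
            raise ValueError(f"unknown operation: {op!r}")


g = DynamicGraph()
apply_updates(g, [("+", 1, 2), ("+", 2, 3), ("-", 1, 2)])
```

After this stream the graph holds only the edge between 2 and 3; any query asked in between would have seen a different graph, which is why waiting for a "final" static graph is not an option.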
Example Problems
• Disaster Detection
• Trend Detection
• News Networks
• Circuit Error Control, Maximum Flow
• Routing Problems
• Suggestion Systems
Requirements
• A dynamic algorithm
  • An algorithm that maintains the solution as the graph changes
• A dynamic graph data structure
  • Easy insert, delete, lookup
  • Easy parallelism
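As one concrete illustration of "an algorithm that maintains the solution", the sketch below keeps the connected components of a graph up to date as edges are inserted, using a union-find structure instead of recomputing from scratch. It is a hedged example, not something taken from the slides: the class name `IncrementalComponents` is hypothetical, and it handles insertions only (supporting deletions requires more elaborate dynamic-connectivity structures).

```python
class IncrementalComponents:
    """Maintain connected components under edge insertions (union-find)."""

    def __init__(self):
        self.parent = {}
        self.count = 0  # number of components among the vertices seen so far

    def _find(self, x):
        if x not in self.parent:
            # first time we see x: it starts as its own component
            self.parent[x] = x
            self.count += 1
        while self.parent[x] != x:
            # path halving keeps the trees shallow
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def insert_edge(self, u, v):
        ru, rv = self._find(u), self._find(v)
        if ru != rv:
            self.parent[ru] = rv
            self.count -= 1

    def connected(self, u, v):
        return self._find(u) == self._find(v)


cc = IncrementalComponents()
cc.insert_edge(1, 2)
cc.insert_edge(3, 4)
```

Each insertion updates the maintained answer in near-constant amortized time, so a connectivity query never has to rescan the whole graph.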
Acceptable Dynamic Graph Data Structure for Parallelism
• High parallelism
  • To keep up with the rate of changes
• Multicore with out-of-memory storage vs. distributed in-memory storage
  • Speed-up vs. communication complexity
  • Hard-disk access is too slow to approach real-time updates
Existing Solutions
• Multicore:
  • STINGER
  • LLAMA
  • Aspen
  • GraphTinker
• CUDA:
  • cuSTINGER, …
• Distributed:
  • DISTINGER
• Why only one distributed solution?
STINGER (1)
• Data structure:
  • LVA (Logical Vertex Array)
  • Blocks:
    • Blocks of edges
  • Edge Type Array
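The three components named on this slide can be pictured with a small sketch: a per-vertex array whose entries point to a chain of fixed-size edge blocks, plus an index from edge type to the blocks holding that type. This is a simplified illustration under stated assumptions, not STINGER's actual layout or API: `BLOCK_SIZE`, `EdgeBlock`, `BlockedGraph`, and all field names are hypothetical, and details such as timestamps and deletion handling are omitted.

```python
from dataclasses import dataclass, field
from typing import Optional

BLOCK_SIZE = 4  # edges per block; arbitrary value chosen for the example


@dataclass
class EdgeBlock:
    edge_type: int
    edges: list = field(default_factory=list)   # (neighbor, weight) pairs
    next_block: Optional["EdgeBlock"] = None    # next block of the same vertex


class BlockedGraph:
    def __init__(self, num_vertices):
        # stand-in for the LVA: one head-of-chain pointer per vertex
        self.lva = [None] * num_vertices
        # stand-in for the edge type array: blocks grouped by edge type
        self.blocks_by_type = {}

    def insert_edge(self, src, dst, edge_type=0, weight=1):
        block, free = self.lva[src], None
        while block is not None:
            if block.edge_type == edge_type:
                for i, (nbr, _) in enumerate(block.edges):
                    if nbr == dst:
                        block.edges[i] = (dst, weight)  # edge exists: update it
                        return
                if free is None and len(block.edges) < BLOCK_SIZE:
                    free = block  # remember the first block with a free slot
            block = block.next_block
        if free is None:
            # no room in any matching block: prepend a new one to the chain
            free = EdgeBlock(edge_type, next_block=self.lva[src])
            self.lva[src] = free
            self.blocks_by_type.setdefault(edge_type, []).append(free)
        free.edges.append((dst, weight))

    def neighbors(self, src, edge_type=None):
        block = self.lva[src]
        while block is not None:
            if edge_type is None or block.edge_type == edge_type:
                for nbr, _ in block.edges:
                    yield nbr
            block = block.next_block
```

Grouping edges into blocks trades some wasted slots for fewer pointer hops than a plain linked list, while still allowing a vertex's edge list to grow without reallocating a large array.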
STINGER (2)
• Sources of parallelism:
  1. Single update
  2. Sorted batch update
  3. Unsorted batch update with locks
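One way to read the "sorted batch update" option is sketched below: sorting a batch by source vertex makes each vertex's updates contiguous, so each group can be handed to a separate worker that touches only its own adjacency set and therefore needs no lock. This is an illustrative interpretation, not STINGER's code; the function name `apply_sorted_batch`, the `(op, src, dst)` tuple format, and the use of a Python thread pool are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from operator import itemgetter


def apply_sorted_batch(adj, batch, workers=4):
    """adj: dict vertex -> set of out-neighbours; batch: list of (op, src, dst)."""
    # A stable sort by source keeps each vertex's updates in arrival order.
    ordered = sorted(batch, key=itemgetter(1))
    groups = [(src, list(g)) for src, g in groupby(ordered, key=itemgetter(1))]

    # Create missing adjacency sets up front so workers only mutate their own set.
    for src, _ in groups:
        adj.setdefault(src, set())

    def apply_group(item):
        src, updates = item
        out = adj[src]
        for op, _, dst in updates:
            if op == "+":
                out.add(dst)
            else:
                out.discard(dst)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(apply_group, groups))  # consume results to surface errors
    return adj


adj = apply_sorted_batch(
    {}, [("+", 2, 1), ("+", 1, 2), ("+", 1, 3), ("-", 1, 2), ("+", 2, 3)]
)
```

The unsorted variant skips the sorting cost but lets several workers hit the same vertex, which is why it needs locks.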
DISTINGER (1)
• Master-slave scheme
  • Vertices are distributed among the slaves by a hash function
• Blocks
  • Same as STINGER
• Edge list
  • Same as STINGER
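The hash-based distribution of vertices can be sketched as follows. This is a single-process simulation under stated assumptions, not DISTINGER's implementation: the `owner` function, the choice of BLAKE2 as the hash, and the `Master` class with in-memory dictionaries standing in for slave nodes are all hypothetical.

```python
import hashlib


def owner(vertex, num_workers):
    """Map a vertex id to a worker index with a deterministic hash."""
    digest = hashlib.blake2b(str(vertex).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_workers


class Master:
    """Routes every request to the worker that owns the source vertex."""

    def __init__(self, num_workers):
        # each dict stands in for the local graph held by one slave node
        self.workers = [dict() for _ in range(num_workers)]

    def insert_edge(self, src, dst):
        shard = self.workers[owner(src, len(self.workers))]
        shard.setdefault(src, set()).add(dst)

    def neighbors(self, src):
        shard = self.workers[owner(src, len(self.workers))]
        return shard.get(src, set())


m = Master(num_workers=4)
for src, dst in [(1, 2), (1, 3), (7, 1), (42, 7)]:
    m.insert_edge(src, dst)
```

Because ownership is a pure function of the vertex id, any participant can compute where a vertex lives without a lookup table, but the mapping cannot adapt to skewed load.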
DISTINGER (2)
• Sources of parallelism:
  • Cluster parallelism (across nodes)
  • In-node parallelism
• The interface is the same as STINGER's
Implementation
• Reimplement local STINGER
• Batch update
  • Sort by hashed values to associate updates with nodes
• Test with the DynoGraph benchmark
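The step of sorting a batch by hashed value can be sketched like this: each update is keyed by the node that owns its source vertex, the batch is sorted on that key, and the contiguous runs become the per-node sub-batches. This is a guess at the general shape, not the project's code; `node_of`, `route_batch`, the CRC32 hash, and the `(op, src, dst)` tuple format are assumptions.

```python
import zlib
from itertools import groupby


def node_of(vertex, num_nodes):
    """Deterministic hash of a vertex id onto a node index."""
    return zlib.crc32(str(vertex).encode()) % num_nodes


def route_batch(batch, num_nodes):
    """Split a batch of (op, src, dst) updates into one sub-batch per node."""

    def key(update):
        return node_of(update[1], num_nodes)

    # Stable sort: updates bound for the same node keep their arrival order.
    keyed = sorted(batch, key=key)
    return {node: list(group) for node, group in groupby(keyed, key=key)}


batch = [("+", 1, 2), ("+", 8, 3), ("-", 1, 2), ("+", 5, 9), ("+", 8, 4)]
routed = route_batch(batch, num_nodes=3)
```

Each node then receives one message per batch instead of one per update, which is where batching reduces communication cost.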
Limitations and Bottlenecks
• Master-slave scheme:
  • All information must pass through the master
  • Work is unbalanced (and unpredictably so)
  • Because every slave uses the hash function, it cannot be changed dynamically
• Interfaced to work with STINGER:
  • The limitations of this data structure differ from STINGER's, so the algorithms must differ as well
Results
[Figure: speed-up test]
Results
[Figure: effect of batch size on average update time]
How to Improve
• Algorithms
  • Adapt algorithms to distributed systems
• Distributed control
  • Distribute the control instead of routing everything through one master
References
G. Feng, X. Meng, and K. Ammar, “DISTINGER: A distributed graph data structure for massive dynamic graph processing,” Proc. 2015 IEEE Int. Conf. Big Data, IEEE Big Data 2015, pp. 1814–1822, 2015.
D. A. Bader, A. Amos-Binks, J. Berry, D. Chavarr, C. Hastings, and S. C. Poulos, “STINGER: Spatio-Temporal Interaction Networks and Graphs (STING) Extensible Representation,” Cass. Mt. Pnnl. Gov, pp. 1–7, 2009.
D. Ediger, R. McColl, J. Riedy, and D. A. Bader, “STINGER: High performance data structure for streaming graphs,” 2012 IEEE Conf. High Perform. Extrem. Comput., HPEC 2012, 2012.
Q&A
1. Why are dynamic graphs important?
2. What are the applications?
3. How does STINGER maintain edges?