Distributed Systems CS 15-440
Programming Models - Part V
Lecture 19, Nov 18, 2015
Mohammad Hammoud
Today…
§ Last Session: Programming Models – Part IV: MapReduce & Pregel (Intro)
§ Today's Session: Programming Models – Part V: Pregel (Continued) & GraphLab
§ Announcements:
  § PS4 is due tomorrow by midnight
  § P4 will be posted by tonight. It is due on Dec 3rd
  § We will practice on MapReduce tomorrow during the recitation
Objectives
Discussion on Programming Models (over 4 sessions):
§ Why parallelize our programs?
§ Parallel computer architectures
§ Traditional models of parallel programming
§ Types of parallel programs
§ Message Passing Interface (MPI)
§ MapReduce, Pregel and GraphLab (cont'd)
The Pregel Analytics Engine
§ Motivation & Definition
§ The Computation & Programming Models
§ Input and Output
§ Architecture & Execution Flow
§ Fault Tolerance
Motivation for Pregel
§ How to implement algorithms to process Big Graphs?
  § Create a custom distributed infrastructure for each new algorithm → Difficult!
  § Rely on existing distributed analytics engines like MapReduce → Inefficient and cumbersome!
  § Use a single-computer graph algorithm library like BGL, LEDA, NetworkX, etc. → Big Graphs might be too large to fit on a single machine!
  § Use an existing parallel graph system like Parallel BGL or CGMGraph → Not suited for large-scale distributed graph processing!
What is Pregel?
§ Pregel is a large-scale graph-parallel distributed analytics engine
§ Some characteristics:
  • In-memory (opposite to MapReduce)
  • High scalability
  • Automatic fault-tolerance
  • Flexibility in expressing graph algorithms
  • Message-passing programming model
  • Tree-style, master-slave architecture
  • Synchronous
§ Pregel is inspired by Valiant's Bulk Synchronous Parallel (BSP) model
The Pregel Analytics Engine
§ Motivation & Definition
§ The Computation & Programming Models
§ Input and Output
§ Architecture & Execution Flow
§ Fault Tolerance
The BSP Model
[Diagram: in each super-step, CPUs 1–3 process their data in parallel; a barrier separates Super-Step 1, Super-Step 2 and Super-Step 3, and the model iterates]
Entities and Super-Steps
§ The computation is described in terms of vertices, edges and a sequence of super-steps
§ You give Pregel a directed graph consisting of vertices and edges
  § Each vertex is associated with a modifiable user-defined value
  § Each edge is associated with a source vertex, a value and a destination vertex
§ During a super-step S:
  § A user-defined function F is executed at each vertex V
  § F can read messages sent to V in super-step S - 1 and send messages to other vertices that will be received at super-step S + 1
  § F can modify the state of V and its outgoing edges
  § F can alter the topology of the graph
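The message timing above can be sketched in a few lines. This is an illustrative single-machine harness, not the real Pregel API: `run_superstep`, `Inbox` and the callback signature are all invented names. The point it shows is that messages emitted during super-step S only become visible at super-step S + 1.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <vector>

using VertexId = int;
using Message = double;
using Inbox = std::map<VertexId, std::vector<Message>>;

// Runs the user function F once per vertex. F reads the messages that
// were sent to the vertex in the previous super-step and buffers any
// outgoing messages into the *next* super-step's inbox; nothing it sends
// is delivered within the current super-step.
Inbox run_superstep(
    const std::vector<VertexId>& vertices,
    const Inbox& prev,  // messages sent during super-step S - 1
    const std::function<void(VertexId, const std::vector<Message>&,
                             Inbox&)>& F) {
  Inbox next;  // messages to be received at super-step S + 1
  static const std::vector<Message> kEmpty;
  for (VertexId v : vertices) {
    auto it = prev.find(v);
    F(v, it == prev.end() ? kEmpty : it->second, next);
  }
  return next;
}
```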
Topology Mutations
§ The graph structure can be modified during any super-step
  § Vertices and edges can be added or deleted
§ Mutating graphs can create conflicting requests where multiple vertices at a super-step might try to alter the same edge/vertex
§ Conflicts are avoided using partial ordering and handlers
§ Partial orderings:
  § Edges are removed before vertices
  § Vertices are added before edges
  § Mutations performed at super-step S are only effective at super-step S + 1
  § All mutations precede calls to actual computations
§ Handlers:
  § Among multiple conflicting requests, one request is selected arbitrarily
Algorithm Termination
§ Algorithm termination is based on every vertex voting to halt
  § In super-step 0, every vertex is active
  § All active vertices participate in the computation of any given super-step
  § A vertex deactivates itself by voting to halt and enters an inactive state
  § A vertex can return to the active state if it receives an external message
§ Vertex state machine: Active → (vote to halt) → Inactive → (message received) → Active
§ A Pregel program terminates when all vertices are simultaneously inactive and there are no messages in transit
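The vertex state machine and the termination condition above fit in a few lines. This is an illustrative sketch (the struct and function names are invented, not Pregel's): a vertex starts active, voting to halt deactivates it, an incoming message reactivates it, and the program ends only when every vertex is inactive and no messages are in transit.

```cpp
#include <cassert>
#include <vector>

// One vertex's halt/wake state, mirroring the slide's state machine.
struct VertexState {
  bool active = true;                            // every vertex starts active
  void vote_to_halt() { active = false; }        // Active -> Inactive
  void on_message_received() { active = true; }  // Inactive -> Active
};

// Pregel's termination test: all vertices inactive AND no messages in transit.
bool program_terminated(const std::vector<VertexState>& vs,
                        int messages_in_transit) {
  if (messages_in_transit > 0) return false;
  for (const auto& v : vs)
    if (v.active) return false;
  return true;
}
```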
Finding the Max Value in a Graph
[Diagram: four vertices with initial values 3, 6, 2 and 1; blue arrows are messages and blue vertices have voted to halt. Over super-steps S, S+1, S+2 and S+3, the value 6 propagates until every vertex holds 6]
The Programming Model
§ Pregel adopts the message-passing programming model
  § Messages can be passed from any vertex to any other vertex in the graph
  § Any number of messages can be passed
  § The message order is not guaranteed
  § Messages will not be duplicated
§ Combiners can be used to reduce the number of messages passed between super-steps
§ Aggregators are available for reduction operations (e.g., sum, min, and max)
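A combiner for the max-value algorithm is a one-liner worth seeing. This is an illustrative sketch (the function name is invented): since a receiving vertex only cares about the largest incoming value, all messages bound for one vertex can be collapsed into a single message before crossing the network.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Combines all messages destined for one vertex into a single message.
// For the max-finding algorithm, only the largest value matters, so the
// combiner can safely drop everything else.
double combine_max(const std::vector<double>& outgoing) {
  return *std::max_element(outgoing.begin(), outgoing.end());
}
```

With this combiner, a vertex with a thousand in-neighbors receives one message per sending worker instead of a thousand.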
The Pregel API in C++
§ A Pregel program is written by sub-classing the Vertex class:

  // Template arguments define the types for vertices, edges and messages
  template <typename VertexValue, typename EdgeValue, typename MessageValue>
  class Vertex {
   public:
    // Override Compute() to define the computation at each super-step
    virtual void Compute(MessageIterator* msgs) = 0;
    const string& vertex_id() const;
    int64 superstep() const;
    const VertexValue& GetValue();   // get the value of the current vertex
    VertexValue* MutableValue();     // modify the value of the vertex
    OutEdgeIterator GetOutEdgeIterator();
    // Pass messages to other vertices
    void SendMessageTo(const string& dest_vertex, const MessageValue& message);
    void VoteToHalt();
  };
Pregel Code for Finding the Max Value

  class MaxFindVertex : public Vertex<double, void, double> {
   public:
    virtual void Compute(MessageIterator* msgs) {
      double currMax = GetValue();
      // Learn the largest value reported by any in-neighbor
      for (; !msgs->Done(); msgs->Next())
        if (msgs->Value() > currMax)
          currMax = msgs->Value();
      // Propagate in the first super-step, or whenever the value grows;
      // otherwise vote to halt (a later, larger message reactivates us).
      // Sending only new information is what lets the program terminate
      // on cyclic graphs.
      if (superstep() == 0 || currMax > GetValue()) {
        *MutableValue() = currMax;
        SendMessageToAllNeighbors(currMax);
      } else {
        VoteToHalt();
      }
    }
  };
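The algorithm can be checked end-to-end with a single-machine simulation. This is an illustrative harness, not the Pregel runtime: it runs synchronous super-steps over explicit inboxes, reactivates a halted vertex when a message arrives, and uses the terminating variant that sends a vertex's value in super-step 0 and thereafter only when the value grows.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Per-vertex state in the simulation: current value plus the halted flag.
struct SimVertex { double value; bool active = true; };

// Synchronous super-step loop: deliver last round's messages, run the
// per-vertex max logic, and stop once every vertex is inactive with no
// messages in transit. Returns the number of super-steps executed.
int run_max(std::vector<SimVertex>& vs,
            const std::vector<std::vector<int>>& out_edges) {
  const size_t n = vs.size();
  std::vector<std::vector<double>> inbox(n), next(n);
  int step = 0;
  while (true) {
    bool work = false;                    // any active vertex or pending mail?
    for (size_t i = 0; i < n; ++i)
      if (vs[i].active || !inbox[i].empty()) work = true;
    if (!work) break;                     // simultaneous-inactivity termination
    for (size_t i = 0; i < n; ++i) {
      if (!vs[i].active && inbox[i].empty()) continue;  // stays halted
      vs[i].active = true;                              // a message reactivates
      double curr_max = vs[i].value;
      for (double m : inbox[i]) curr_max = std::max(curr_max, m);
      if (step == 0 || curr_max > vs[i].value) {        // only new information
        vs[i].value = curr_max;
        for (int d : out_edges[i]) next[d].push_back(curr_max);
      } else {
        vs[i].active = false;                           // VoteToHalt()
      }
      inbox[i].clear();
    }
    inbox.swap(next);
    ++step;
  }
  return step;
}
```

On a bidirectional chain 3–6–2–1, the value 6 spreads to all four vertices in a handful of super-steps, matching the slide's trace.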
The Pregel Analytics Engine
§ Motivation & Definition
§ The Computation & Programming Models
§ Input and Output
§ Architecture & Execution Flow
§ Fault Tolerance
Input, Graph Flow and Output
§ The input graph in Pregel is stored in a distributed storage layer (e.g., GFS or Bigtable)
§ The input graph is divided into partitions consisting of vertices and outgoing edges
  § The default partitioning function is hash(ID) mod N, where N is the # of partitions
§ Partitions are stored in node memories for the duration of the computation (hence, an in-memory model, not a disk-based one)
§ Outputs in Pregel are typically graphs isomorphic (or mutated) to input graphs
  § Yet, outputs can also be aggregated statistics mined from input graphs (depends on the graph algorithm)
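The default partitioning rule is easy to state in code. A minimal sketch, assuming string vertex IDs and using `std::hash` as a stand-in for whatever hash function the real system applies:

```cpp
#include <cassert>
#include <functional>
#include <string>

// Pregel's default placement rule: hash(ID) mod N, where N is the number
// of partitions. Every worker can compute the owner of any vertex locally,
// with no lookup table, which is why message routing needs no coordination.
int partition_for(const std::string& vertex_id, int num_partitions) {
  return static_cast<int>(std::hash<std::string>{}(vertex_id) %
                          static_cast<size_t>(num_partitions));
}
```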
The Pregel Analytics Engine
§ Motivation & Definition
§ The Computation & Programming Models
§ Input and Output
§ Architecture & Execution Flow
§ Fault Tolerance
The Architectural Model
§ Pregel assumes a tree-style network topology and a master-slave architecture
[Diagram: a core switch connects rack switches; the master pushes work (i.e., partitions) to Workers 1–5, and the workers send completion signals back]
§ When the master receives the completion signal from every worker in super-step S, it starts super-step S + 1
The Execution Flow
§ Steps of program execution in Pregel:
  1. Copies of the program code are distributed across all machines
     1.1 One copy is designated as the master and every other copy is deemed a worker/slave
  2. The master partitions the graph and assigns each worker one or more partitions, along with portions of the input "graph data"
  3. Every worker executes the user-defined function on each vertex
  4. Workers can communicate among each other
The Execution Flow
§ Steps of program execution in Pregel (cont'd):
  5. The master coordinates the execution of super-steps
  6. The master calculates the number of inactive vertices after each super-step and signals workers to terminate if all vertices are inactive (and no messages are in transit)
  7. Each worker may be instructed to save its portion of the graph
The Pregel Analytics Engine
§ Motivation & Definition
§ The Computation & Programming Models
§ Input and Output
§ Architecture & Execution Flow
§ Fault Tolerance
Fault Tolerance in Pregel
§ Fault-tolerance is achieved through checkpointing
  § At the start of every super-step, the master may instruct the workers to save the states of their partitions in stable storage
§ The master uses "ping" messages to detect worker failures
§ If a worker fails, the master re-assigns the corresponding vertices and input graph data to another available worker, and restarts the super-step
  § The available worker re-loads the partition state of the failed worker from the most recent available checkpoint
How Does Pregel Compare to MapReduce?
Pregel versus MapReduce

  Aspect                        | Hadoop MapReduce                                      | Pregel
  Programming Model             | Shared-Memory (abstraction)                           | Message-Passing
  Computation Model             | Synchronous                                           | Synchronous
  Parallelism Model             | Data-Parallel                                         | Graph-Parallel
  Architectural Model           | Master-Slave                                          | Master-Slave
  Task/Vertex Scheduling Model  | Pull-Based                                            | Push-Based
  Application Suitability       | Loosely-Connected/Embarrassingly Parallel Applications | Strongly-Connected Applications
Objectives
Discussion on Programming Models:
§ Why parallelize our programs?
§ Parallel computer architectures
§ Traditional models of parallel programming
§ Types of parallel programs
§ Message Passing Interface (MPI)
§ MapReduce, Pregel and GraphLab
The GraphLab Analytics Engine
§ Motivation & Definition
§ Input, Output & Components
§ The Architectural Model
§ The Programming Model
§ The Computation Model
§ Fault Tolerance
Motivation for GraphLab
§ There is an exponential growth in the scale of Machine Learning and Data Mining (MLDM) algorithms
§ Designing, implementing and testing MLDM algorithms at large scale is challenging due to:
  § Synchronization
  § Deadlocks
  § Scheduling
  § Distributed state management
  § Fault-tolerance
§ The interest in analytics engines that can execute MLDM algorithms automatically and efficiently is increasing
  § MapReduce is inefficient with iterative jobs (common in MLDM algorithms)
  § Pregel cannot run asynchronous problems (common in MLDM algorithms)
What is GraphLab?
§ GraphLab is a large-scale graph-parallel distributed analytics engine
§ Some characteristics:
  • In-memory (opposite to MapReduce and similar to Pregel)
  • High scalability
  • Automatic fault-tolerance
  • Flexibility in expressing arbitrary graph algorithms (more flexible than Pregel)
  • Shared-memory abstraction (opposite to Pregel but similar to MapReduce)
  • Peer-to-peer architecture (dissimilar to Pregel and MapReduce)
  • Asynchronous (dissimilar to Pregel and MapReduce)
The GraphLab Analytics Engine
§ Motivation & Definition
§ Input, Output & Components
§ The Architectural Model
§ The Programming Model
§ The Computation Model
§ Fault Tolerance
Input, Graph Flow and Output
§ GraphLab assumes problems modeled as graphs
§ It adopts two phases: the initialization phase and the execution phase
[Diagram: in the initialization phase, a (MapReduce) graph builder parses and partitions raw graph data from a distributed file system into an atom collection and constructs an atom index. In the execution phase, the cluster loads the atom index and atom files from the distributed file system; GraphLab engine instances communicate via RPC over TCP, with monitoring and atom placement]
Components of the GraphLab Engine: The Data-Graph
§ The GraphLab engine incorporates three main parts:
  1. The data-graph, which represents the user program state at a cluster machine
[Diagram: a data-graph of vertices and edges]
Components of the GraphLab Engine: The Update Function
§ The GraphLab engine incorporates three main parts:
  2. The update function, which involves two main sub-functions:
     2.1 Altering data within the scope of a vertex
     2.2 Scheduling future update functions at neighboring vertices
§ The scope of a vertex v (i.e., Sv) is the data stored in v and in all of v's adjacent edges and vertices
Components of the GraphLab Engine: The Update Function
§ The GraphLab engine incorporates three main parts:
  2. The update function, which involves two main sub-functions:
     2.1 Altering data within the scope of a vertex
     2.2 Scheduling future update functions at neighboring vertices
[Diagram: the GraphLab execution engine repeatedly schedules a vertex v and applies the update function to it]
Components of the GraphLab Engine: The Update Function
§ The GraphLab engine incorporates three main parts:
  2. The update function, which involves two main sub-functions:
     2.1 Altering data within the scope of a vertex
     2.2 Scheduling future update functions at neighboring vertices
[Diagram: a scheduler feeds vertices a–k to CPU 1 and CPU 2; the process repeats until the scheduler is empty]
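The scheduler-driven loop above can be sketched in a few lines. This is an illustrative single-threaded sketch (the names `run_engine` and `UpdateFn` are invented, not GraphLab API): the engine pops a scheduled vertex, runs the user update function on it, and repeats until the scheduler is empty; the update function spreads work by scheduling neighbors.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <vector>

// The user update function: receives the vertex to update and may push
// further vertices onto the scheduler (sub-function 2.2 on the slide).
using UpdateFn = std::function<void(int v, std::deque<int>& scheduler)>;

// The engine loop: runs update functions until no vertex remains scheduled.
void run_engine(std::deque<int>& scheduler, const UpdateFn& update) {
  while (!scheduler.empty()) {   // repeats until the scheduler is empty
    int v = scheduler.front();
    scheduler.pop_front();
    update(v, scheduler);        // may alter scope data and schedule neighbors
  }
}
```

Seeding the scheduler with one vertex whose update schedules its successor walks the whole chain, which is the "dynamic computation" pattern the PageRank example later uses.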
Components of the GraphLab Engine: The Sync Operation
§ The GraphLab engine incorporates three main parts:
  3. The sync operation, which maintains global statistics describing data stored in the data-graph
    § Global values maintained by the sync operation can be written by all update functions across the cluster machines
    § The sync operation is similar to Pregel's aggregators
    § A mutual exclusion mechanism is applied by the sync operation to avoid write-write conflicts
    § For scalability reasons, the sync operation is not enabled by default
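The write path of such a global value can be sketched with a mutex. This is an illustrative sketch, not GraphLab code (`GlobalStat` and `add` are invented names): a statistic shared by all update functions, with mutual exclusion serializing concurrent writers so no write-write conflict can corrupt it.

```cpp
#include <cassert>
#include <mutex>

// A global statistic writable from any update function. The lock is the
// mutual exclusion mechanism the slide mentions: writers are serialized,
// so concurrent increments cannot lose updates.
struct GlobalStat {
  double value = 0;
  std::mutex m;
  void add(double delta) {
    std::lock_guard<std::mutex> lock(m);  // one writer at a time
    value += delta;
  }
};
```

The scalability note on the slide follows directly from this sketch: every write contends on one lock, which is why the sync operation is off by default.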
The GraphLab Analytics Engine
§ Motivation & Definition
§ Input, Output & Components
§ The Architectural Model
§ The Programming Model
§ The Computation Model
§ Fault Tolerance
The Architectural Model
§ GraphLab adopts a peer-to-peer architecture
  § All engine instances are symmetric
  § Engine instances communicate using the Remote Procedure Call (RPC) protocol over TCP/IP
  § The first triggered engine has the additional responsibility of being a monitoring/master engine
§ Advantages:
  § Highly scalable
  § Precludes centralized bottlenecks and single points of failure
§ Main disadvantage:
  § Complexity
The GraphLab Analytics Engine
§ Motivation & Definition
§ Input, Output & Components
§ The Architectural Model
§ The Programming Model
§ The Computation Model
§ Fault Tolerance
The Programming Model
§ GraphLab offers a shared-memory programming model
§ It allows scopes to overlap and vertices to read/write from/to their scopes
Consistency Models in GraphLab
§ GraphLab guarantees sequential consistency
  § It provides the same result as a sequential execution of the computational steps
§ User-defined consistency models:
  § Full Consistency
  § Vertex Consistency
  § Edge Consistency
Consistency Models in GraphLab (Cont'd)
[Diagram: a chain of vertices 1–5 with vertex data D1..D5 and edge data D1↔2..D4↔5. The full consistency model grants read/write access to the vertex, its adjacent edges and its neighboring vertices; the edge consistency model grants write access to the vertex and its adjacent edges but only read access to neighboring vertices; the vertex consistency model grants write access to the vertex data only]
The GraphLab Analytics Engine
§ Motivation & Definition
§ Input, Output & Components
§ The Architectural Model
§ The Programming Model
§ The Computation Model
§ Fault Tolerance
The Computation Model
§ GraphLab employs an asynchronous computation model
§ It suggests two asynchronous engines:
  § Chromatic Engine
  § Locking Engine
§ The chromatic engine executes vertices partially asynchronously
  § It applies vertex coloring (i.e., no adjacent vertices share the same color)
  § All vertices with the same color are executed before proceeding to a different color
§ The locking engine executes vertices fully asynchronously
  § Data on vertices and edges are susceptible to corruption
  § It applies a permission-based distributed mutual exclusion mechanism to avoid read-write and write-write hazards
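The coloring step the chromatic engine relies on can be sketched with a standard greedy algorithm (an illustrative choice; the slides do not say which coloring algorithm GraphLab uses). The invariant it establishes is the one that matters: no two adjacent vertices share a color, so all vertices of one color can be updated in parallel without touching each other's scopes.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Greedy vertex coloring over an undirected adjacency list: each vertex
// takes the smallest color not used by an already-colored neighbor.
// Adjacent vertices are guaranteed different colors, which is exactly
// what lets the chromatic engine run a whole color class in parallel.
std::vector<int> greedy_color(const std::vector<std::vector<int>>& adj) {
  std::vector<int> color(adj.size(), -1);
  for (size_t v = 0; v < adj.size(); ++v) {
    std::set<int> used;
    for (int u : adj[v])
      if (color[u] >= 0) used.insert(color[u]);
    int c = 0;
    while (used.count(c)) ++c;   // smallest color free among neighbors
    color[v] = c;
  }
  return color;
}
```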
The GraphLab Analytics Engine
§ Motivation & Definition
§ Input, Output & Components
§ The Architectural Model
§ The Programming Model
§ The Computation Model
§ Fault Tolerance
Fault Tolerance in GraphLab
§ GraphLab uses distributed checkpointing to recover from machine failures
§ It suggests two checkpointing mechanisms:
  § Synchronous checkpointing (it suspends the entire execution of GraphLab)
  § Asynchronous checkpointing
How Does GraphLab Compare to MapReduce and Pregel?
GraphLab vs. Pregel vs. MapReduce

  Aspect                        | Hadoop MapReduce                                      | Pregel                          | GraphLab
  Programming Model             | Shared-Memory (abstraction)                           | Message-Passing                 | Shared-Memory
  Computation Model             | Synchronous                                           | Synchronous                     | Asynchronous
  Parallelism Model             | Data-Parallel                                         | Graph-Parallel                  | Graph-Parallel
  Architectural Model           | Master-Slave                                          | Master-Slave                    | Peer-to-Peer
  Task/Vertex Scheduling Model  | Pull-Based                                            | Push-Based                      | Push-Based
  Application Suitability       | Loosely-Connected/Embarrassingly Parallel Applications | Strongly-Connected Applications | Strongly-Connected Applications (more precisely, MLDM apps)
Next Week…
§ Fault-tolerance
Back-up Slides
PageRank
§ PageRank is a link analysis algorithm
  § The rank value indicates the importance of a particular web page
  § A hyperlink to a page counts as a vote of support
  § A page that is linked to by many pages with high PageRank receives a high rank itself
  § A PageRank of 0.5 means there is a 50% chance that a person clicking on a random link will be directed to the document with the 0.5 PageRank
PageRank (Cont'd)
§ Iterate: R[i] = α + (1 - α) · Σ_{j links to i} R[j] / L[j]
§ Where:
  § α is the random reset probability
  § L[j] is the number of links on page j
[Diagram: an example graph with pages 1–6]
PageRank Example in GraphLab
§ The PageRank algorithm is defined as a per-vertex operation working on the scope of the vertex

  pagerank(i, scope) {
    // Get neighborhood data (R[i], W_ji, R[j]) from scope
    // Update the vertex data:
    //   R[i] = α + (1 - α) · Σ_j W_ji · R[j]
    // Reschedule neighbors if needed (dynamic computation)
    if R[i] changes then
      reschedule_neighbors_of(i);
  }
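The per-vertex update above can be made concrete with a single-machine sketch (an illustrative harness, not the GraphLab engine; it assumes the unnormalized form R[i] = α + (1 - α) · Σ_j R[j] / L[j] from the previous slide and iterates synchronously rather than via a scheduler):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Iterative PageRank over an out-link adjacency list. Each page j splits
// its current rank R[j] evenly across its L[j] out-links; every page then
// collects alpha plus (1 - alpha) times the shares it received.
std::vector<double> pagerank(const std::vector<std::vector<int>>& out_links,
                             double alpha, int iterations) {
  const size_t n = out_links.size();
  std::vector<double> R(n, 1.0);
  for (int it = 0; it < iterations; ++it) {
    std::vector<double> next(n, alpha);             // the random-reset term
    for (size_t j = 0; j < n; ++j) {
      if (out_links[j].empty()) continue;           // dangling page: no votes
      double share = R[j] / out_links[j].size();    // R[j] / L[j]
      for (int i : out_links[j]) next[i] += (1 - alpha) * share;
    }
    R = next;
  }
  return R;
}
```

GraphLab's dynamic version does the same arithmetic but only re-runs a vertex when a neighbor's rank actually changed, via reschedule_neighbors_of(i).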