Cluster Computing with DryadLINQ. Mihai Budiu, MSR-SVC. PARC, May 8, 2008
Acknowledgments (MSR SVC and ISRC SVC): Michael Isard, Yuan Yu, Andrew Birrell, Dennis Fetterly, Ulfar Erlingsson, Pradeep Kumar Gunda, Jon Currey
Computer Evolution: 1961, 2008, 2040?
Computer Evolution: ENIAC (1943): 30 tons, 200 kW. Datacenter (2008): 500,000 ft², 40 MW. 2040: ?
2040
Layers (diagram): Applications, Operating System, Programming Languages and APIs, Resource Management, Scheduling, Distributed Execution, Caching and Synchronization, Storage, Identity & Security, Networking
Pieces of the Global Computer
This Work
The Rest of This Talk (software stack): Machine Learning, Large Vectors, DryadLINQ, Dryad, Distributed Filesystem, CIFS/NTFS, Cluster Services, Windows Server
TeraSort: how fast can you sort 10^10 100-byte records (1 TB)? Sequential scan/disk: 4.6 hours. Current record: 435 seconds (7.2 min) on a cluster of 40 Itanium 2 machines with 2,520 SAN disks; code: 3,300 lines of C. Our result: 349 seconds (5.8 min) on a cluster of 240 quad-core AMD64 machines with 920 disks; code: 17 lines of LINQ.
Outline • Introduction • Dryad • DryadLINQ • Building on DryadLINQ
Outline • Introduction • Dryad – deployed since 2006 – many thousands of machines – analyzes many petabytes of data/day • DryadLINQ • Building on DryadLINQ
Goal
Design Space (diagram; axes: Latency vs. Throughput, Internet vs. Private data center): Grid, Search, Transaction, HPC, Shared memory; Dryad sits in the data-parallel, private data center, high-throughput region.
Data Partitioning (diagram): a DATA set too large for one machine's RAM is partitioned across machines.
2-D Piping • Unix pipes (1-D): grep | sed | sort | awk | perl • Dryad (2-D): grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50 (exponents give the number of parallel instances of each stage)
Dryad = Execution Layer. Analogy: Job (application) : Dryad : Cluster ≈ Pipeline : Shell : Machine
Virtualized 2-D Pipelines • 2-D DAG • multi-machine • virtualized
Dryad Job Structure (diagram): input files feed stages of vertices (processes) such as grep, sed, sort, awk, perl, connected by channels, producing output files.
Channels: finite streams of items. Implementations: • distributed filesystem files (persistent) • SMB/NTFS files (temporary) • TCP pipes (inter-machine) • memory FIFOs (intra-machine)
Architecture (diagram): control plane: the job manager (JM) computes the job schedule and talks to the name server (NS) and per-machine daemons (PD) in the cluster; data plane: vertices (V) exchange data over files, TCP, FIFOs, and the network.
Fault Tolerance
Dynamic Graph Rewriting (diagram): vertices X[0], X[1], X[3] have completed; X[2] is slow, so a duplicate vertex X'[2] is started. Duplication policy = f(running times, data volumes).
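One way to make such a duplication policy concrete is the sketch below (not Dryad's actual policy; the 1.5x threshold and the choice of inputs are assumptions): duplicate a running vertex once it has run much longer than the median of the vertices already completed in its stage.

using System;
using System.Collections.Generic;
using System.Linq;

static class DuplicationPolicy
{
    // Decide whether to schedule a duplicate of a still-running vertex.
    public static bool ShouldDuplicate(
        TimeSpan runningTime,            // how long the vertex has been running so far
        IList<TimeSpan> completedTimes,  // run times of completed vertices in the same stage
        double slowdownFactor = 1.5)     // assumed threshold, not Dryad's real constant
    {
        if (completedTimes.Count == 0)
            return false;                // nothing to compare against yet
        var sorted = completedTimes.OrderBy(t => t).ToList();
        TimeSpan median = sorted[sorted.Count / 2];
        return runningTime > TimeSpan.FromTicks((long)(median.Ticks * slowdownFactor));
    }
}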
Dynamic Aggregation (diagram): statically, all S vertices feed T directly; at run time, aggregation vertices (A) are added per rack (#1, #2, #3) to combine the S outputs before they reach T.
Data-Parallel Computation (diagram): the application, execution, and storage layers as provided by parallel databases, MapReduce (with GFS and BigTable), and Dryad.
Outline • Introduction • Dryad • DryadLINQ • Building on DryadLINQ
DryadLINQ (diagram: DryadLINQ layered on top of Dryad)
LINQ
Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);

var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };
DryadLINQ = LINQ + Dryad (diagram): the query from the previous slide, annotated to show which parts become C# vertex code (the where/select bodies) and which become the query plan of the Dryad job that maps the data collection to the results.
Data Model (diagram): a collection is made up of partitions, and each partition holds C# objects.
Query Providers (diagram): on the client machine, ToDryadTable / foreach hand a C# query expression to DryadLINQ, which compiles it into a distributed query plan and invokes the job manager (JM) in the data center; Dryad executes the plan over the input tables and writes output tables, which come back to the client as a DryadTable of results (C# objects).
Demo
Example: Histogram
public static IQueryable<Pair> Histogram(IQueryable<LineRecord> input, int k)
{
    var words = input.SelectMany(x => x.line.Split(' '));
    var groups = words.GroupBy(x => x);
    var counts = groups.Select(x => new Pair(x.Key, x.Count()));
    var ordered = counts.OrderByDescending(x => x.count);
    var top = ordered.Take(k);
    return top;
}
Example trace for k = 3:
"A line of words of wisdom"
["A", "line", "of", "words", "of", "wisdom"]
[["A"], ["line"], ["of", "of"], ["words"], ["wisdom"]]
[{"A", 1}, {"line", 1}, {"of", 2}, {"words", 1}, {"wisdom", 1}]
[{"of", 2}, {"A", 1}, {"line", 1}]
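For reference, here is a minimal runnable version of the example above, executed with LINQ-to-Objects via AsQueryable() instead of a DryadLINQ table; the LineRecord and Pair definitions are assumed stand-ins for the types the slide relies on.

using System;
using System.Linq;

public class LineRecord
{
    public string line;
    public LineRecord(string l) { line = l; }
}

public class Pair
{
    public string word;
    public int count;
    public Pair(string w, int c) { word = w; count = c; }
}

public static class HistogramExample
{
    // Same query as on the slide, just compiled against LINQ-to-Objects.
    public static IQueryable<Pair> Histogram(IQueryable<LineRecord> input, int k)
    {
        var words = input.SelectMany(x => x.line.Split(' '));
        var groups = words.GroupBy(x => x);
        var counts = groups.Select(x => new Pair(x.Key, x.Count()));
        var ordered = counts.OrderByDescending(x => x.count);
        return ordered.Take(k);
    }

    public static void Main()
    {
        var input = new[] { new LineRecord("A line of words of wisdom") }.AsQueryable();
        foreach (var p in Histogram(input, 3))
            Console.WriteLine(p.word + ": " + p.count);
        // Prints: of: 2, A: 1, line: 1 -- matching the trace on the slide
        // (the 1-count ties keep first-occurrence order because the sort is stable).
    }
}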
Histogram Plan: SelectMany → HashDistribute → Merge → GroupBy → Select → OrderByDescending → Take → MergeSort → Take
Map-Reduce in DryadLINQ
public static IQueryable<S> MapReduce<T, M, K, S>(
    this IQueryable<T> input,
    Expression<Func<T, IEnumerable<M>>> mapper,
    Expression<Func<M, K>> keySelector,
    Expression<Func<IGrouping<K, M>, S>> reducer)
{
    var map = input.SelectMany(mapper);
    var group = map.GroupBy(keySelector);
    var result = group.Select(reducer);
    return result;
}
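A word count expressed through this operator might look like the following sketch. It runs locally against LINQ-to-Objects (the MapReduce stand-in below just restates the slide's definition so the example is self-contained); with DryadLINQ the same call shape would be compiled into a distributed job.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

static class WordCountExample
{
    // Local copy of the operator defined on the slide, so this file compiles on its own.
    public static IQueryable<S> MapReduce<T, M, K, S>(
        this IQueryable<T> input,
        Expression<Func<T, IEnumerable<M>>> mapper,
        Expression<Func<M, K>> keySelector,
        Expression<Func<IGrouping<K, M>, S>> reducer)
    {
        return input.SelectMany(mapper).GroupBy(keySelector).Select(reducer);
    }

    static void Main()
    {
        var lines = new[] { "A line of words of wisdom" }.AsQueryable();

        // map: split each line into words; key: the word itself;
        // reduce: emit one (word, count) result per group.
        var counts = lines.MapReduce(
            line => line.Split(' '),
            word => word,
            group => new { Word = group.Key, Count = group.Count() });

        foreach (var c in counts)
            Console.WriteLine(c.Word + ": " + c.Count);
    }
}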
Map-Reduce Plan (diagram): stage (1): map (M), sort (Q), local groupby (G1), reduce (R), distribute (D), i.e. partial aggregation on the map side; stage (2): mergesort (MS), groupby (G2), reduce (R); stage (3): a final mergesort, groupby, and reduce feeding the consumer (X).
Distributed Sorting in DryadLINQ
public static IQueryable<TSource> DSort<TSource, TKey>(
    this IQueryable<TSource> source,
    Expression<Func<TSource, TKey>> keySelector,
    int pcount)
{
    var samples = source.Apply(x => Sampling(x));
    var keys = samples.Apply(x => ComputeKeys(x, pcount));
    var parts = source.RangePartition(keySelector, keys);
    return parts.OrderBy(keySelector);
}
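The slide leaves Sampling and ComputeKeys undefined. One plausible shape for them, purely as an illustration (these are hypothetical helpers, not the DryadLINQ implementation; the real ComputeKeys presumably also has access to the key selector):

using System;
using System.Collections.Generic;
using System.Linq;

static class DSortHelpers
{
    // Keep roughly one record in `rate` from a partition as a sample.
    public static IEnumerable<TSource> Sampling<TSource>(
        IEnumerable<TSource> partition, int rate = 1000)
    {
        int i = 0;
        foreach (var item in partition)
            if (i++ % rate == 0)
                yield return item;
    }

    // Sort the gathered samples by key and pick pcount-1 splitter keys
    // that divide them into pcount roughly equal ranges.
    public static IEnumerable<TKey> ComputeKeys<TSource, TKey>(
        IEnumerable<TSource> samples,
        Func<TSource, TKey> keySelector,
        int pcount)
    {
        var keys = samples.Select(keySelector).OrderBy(k => k).ToList();
        if (keys.Count == 0)
            yield break;
        for (int i = 1; i < pcount; i++)
            yield return keys[i * keys.Count / pcount];
    }
}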
Distributed Sorting Plan (diagram): sample (DS) and histogram (H) vertices compute range-partition keys, D distributes by range, then merge (M) and sort (S) produce each output partition; (1) is the static plan, (2) and (3) show its dynamic refinement at run time.
Outline • Introduction • Dryad • DryadLINQ • Building on DryadLINQ
Machine Learning in DryadLINQ (stack diagram): data analysis and machine learning applications on top of a large-vector library, which runs on DryadLINQ and Dryad.
Operations on Large Vectors: Map 1: given f : T → U, apply f to every element of a vector of T to get a vector of U; f preserves partitioning.
Map 2 (pairwise): given f : T × U → V, apply f to corresponding elements of a vector of T and a vector of U to get a vector of V.
Map 3 (vector-scalar): given f : T × U → V, apply f to every element of a vector of T together with a single scalar U to get a vector of V.
Reduce (fold): given f : U × U → U, combine the elements of a vector of U (in a tree) into a single U.
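A hypothetical C# rendering of these four primitives, to make the type shapes concrete: the interface names echo the PartitionedVector<T> and Scalar<T> types mentioned later in the deck, but the exact signatures here are assumptions, not the actual library API (for instance, the Sum() used in the regression code below can be seen as Reduce with addition).

using System;

public interface IPartitionedVector<T>
{
    // Map 1: apply f : T -> U to every element; preserves partitioning.
    IPartitionedVector<U> Map<U>(Func<T, U> f);

    // Map 2 (pairwise): combine aligned elements of two vectors with f : T x U -> V.
    IPartitionedVector<V> Map<U, V>(IPartitionedVector<U> other, Func<T, U, V> f);

    // Map 3 (vector-scalar): combine every element with a single scalar value.
    IPartitionedVector<V> Map<U, V>(IScalar<U> scalar, Func<T, U, V> f);

    // Reduce (fold): collapse the vector to one value with an associative
    // combiner f : T x T -> T, applied as a tree of partial reductions.
    IScalar<T> Reduce(Func<T, T, T> f);
}

public interface IScalar<T>
{
    T Value { get; }
}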
Linear Algebra: instantiating T, U, V with vector and matrix types yields distributed linear-algebra operations built from the primitives above.
Linear Regression • Data: pairs of vectors (x_i, y_i), i = 1..n • Find: a matrix A • Such that: A x_i ≈ y_i
Analytic Solution (diagram): A = (Σ y·xᵀ)(Σ x·xᵀ)⁻¹. Map computes the outer products x·xᵀ and y·xᵀ over the partitions X[0..2], Y[0..2]; Reduce sums them (Σ); the final step inverts Σ x·xᵀ and multiplies to obtain A.
Linear Regression Code
Vectors x = input(0), y = input(1);
Matrices xx = x.Map(x, (a, b) => a.OuterProd(b));
OneMatrix xxs = xx.Sum();
Matrices yx = y.Map(x, (a, b) => a.OuterProd(b));
OneMatrix yxs = yx.Sum();
OneMatrix xxinv = xxs.Map(a => a.Inverse());
OneMatrix A = yxs.Map(xxinv, (a, b) => a.Mult(b));
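As a quick sanity check of the analytic formula, here is the 1-D (scalar) case run locally with plain LINQ: A = (sum of y*x) * (sum of x*x)^-1. The data values are made up for the example.

using System;
using System.Linq;

static class LinearRegressionCheck
{
    static void Main()
    {
        double trueA = 3.0;
        var xs = new[] { 1.0, 2.0, 3.0, 4.0 };
        var ys = xs.Select(x => trueA * x).ToArray();    // noiseless y = A*x

        double xxs = xs.Select(x => x * x).Sum();        // scalar analogue of sum of x*x^T
        double yxs = xs.Zip(ys, (x, y) => y * x).Sum();  // scalar analogue of sum of y*x^T
        double A = yxs * (1.0 / xxs);                    // multiply by the inverse

        Console.WriteLine(A);  // prints 3: the true coefficient is recovered exactly
    }
}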
Expectation Maximization (Gaussians) • 160 lines of code • 3 iterations shown
Conclusions • Dryad = distributed execution environment • Application-independent (semantics oblivious) • Supports a rich software ecosystem – relational algebra, MapReduce, LINQ • DryadLINQ compiles LINQ to Dryad • C# objects and declarative programming • .NET and Visual Studio for parallel programming
Backup Slides
Software Stack (diagram): applications such as legacy code (sed, awk, grep, etc.), PSQL, machine learning, Scope, Distributed Shell, and SSIS, written in C#, C++, and Perl, sit on libraries (large vectors, DryadLINQ, SQL Server) above Dryad, a distributed filesystem, CIFS/NTFS, job queueing and monitoring, cluster services, and Windows Server.
Very Large Vector Library (diagram): PartitionedVector<T> spans many partitions of T; Scalar<T> holds a single T.
DryadLINQ • Declarative programming • Integration with Visual Studio • Integration with .NET • Type safety • Automatic serialization • Job graph optimizations (static and dynamic) • Conciseness
Sort & Map-Reduce in DryadLINQ
Dryad vs. Map-Reduce: many similarities. Dryad: an execution layer; job = arbitrary DAG; plug-in policies; program = graph generator; complex; new (< 2 years); still growing; internal. Map-Reduce: execution engine plus application model; map + sort + reduce; few policies; program = map + reduce; simple; mature (> 4 years); widely deployed; Hadoop.
PLINQ
public static IEnumerable<TSource> DryadSort<TSource, TKey>(
    IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    IComparer<TKey> comparer,
    bool isDescending)
{
    return source.AsParallel().OrderBy(keySelector, comparer);
}
Query histogram computation • Input: log file (n partitions) • Extract queries from log partitions • Re-partition by hash of query (k buckets) • Compute histogram within each bucket
Naïve histogram topology (diagram): n Q vertices feed k R vertices. Legend: P = parse lines, D = hash distribute, S = quicksort, C = count occurrences, MS = merge sort; the figure shows how each Q and each R is composed from these operators.
Efficient histogram topology (diagram): same legend plus M = non-deterministic merge; n Q' vertices feed intermediate T vertices, which feed k R vertices, with the figure showing how each Q', T, and R is composed.
Final histogram refinement: 1,800 computers; 43,171 vertices; 11,072 processes; 11.5 minutes. Diagram annotations: Q' × 10,405 (10.2 TB); 99,713; T × 217 (154 GB); R × 450 (118 GB, 33.4 GB).
Data Distribution (Group By) (diagram): m source vertices connect to n destination vertices through m × n channels.
Range-Distribution Manager (diagram): the static plan gives the T vertices provisional ranges [0-?) and [?-100); at run time a histogram (Hist) over the S outputs on [0-100) chooses the split, here [0-30) and [30-100), and the D vertices distribute accordingly.
Goal: Declarative Programming (diagram): a static graph of X, S, T vertices is rewritten dynamically at run time.
Staging: 1. Build. 2. Send .exe. 3. Start JM. 4. Query cluster resources. 5. Generate graph. 6. Initialize vertices. 7. Serialize vertices. 8. Monitor vertex execution. (Diagram shows the job manager (JM), the vertex code, and the cluster services.)
SkyServer Query 18
select distinct U.ObjID into results
from photoPrimary U, neighbors N, photoPrimary L
where U.ObjID = N.ObjID
  and L.ObjID = N.NeighborObjID
  and U.ObjID < L.ObjID
  and abs((U.u-U.g)-(L.u-L.g))<0.05
  and abs((U.g-U.r)-(L.g-L.r))<0.05
  and abs((U.r-U.i)-(L.r-L.i))<0.05
  and abs((U.i-U.z)-(L.i-L.z))<0.05
(Query plan diagram: vertices U, N, L, X, Y, D, M, S, H with multiplicities n and 4n.)
SkyServer Q18 Performance (chart): speed-up (times) vs. number of computers (up to 10) for Dryad in-memory, Dryad two-pass, and SQL Server 2005.