Google MapReduce: Simplified Data Processing on Large Clusters

Google MapReduce: Simplified Data Processing on Large Clusters
Jeff Dean, Sanjay Ghemawat, Google, Inc.
Presented by Conroy Whitney, 4th year CS – Web Development
http://labs.google.com/papers/mapreduce.html

Outline
- Motivation
- MapReduce Concept: Map? Reduce?
- Example of a MapReduce problem: Reverse Web-Link Graph
- MapReduce Cluster Environment
- Lifecycle of a MapReduce operation
- Optimizations to the MapReduce process
- Conclusion: MapReduce in Googlicious Action

Motivation: Large-Scale Data Processing
- Many tasks consist of processing lots of data to produce lots of other data
- Want to use hundreds or thousands of CPUs... but this needs to be easy!
- MapReduce provides:
  - User-defined functions
  - Automatic parallelization and distribution
  - Fault tolerance
  - I/O scheduling
  - Status and monitoring

Programming Concept
- Map: perform a function on individual values in a data set to create a new list of values
  - Example: square x = x * x; map square [1, 2, 3, 4, 5] returns [1, 4, 9, 16, 25]
- Reduce: combine values in a data set to create a single new value
  - Example: reduce (+) over [1, 2, 3, 4, 5] returns 15 (the sum of the elements)
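In Python terms, the slide's two examples look roughly like this (using the built-in map and functools.reduce as stand-ins; this is not the MapReduce library's API):

from functools import reduce

def square(x):
    return x * x

# map: apply a function to each value to produce a new list of values
print(list(map(square, [1, 2, 3, 4, 5])))                          # [1, 4, 9, 16, 25]

# reduce: combine all values in the data set into a single value
print(reduce(lambda total, elem: total + elem, [1, 2, 3, 4, 5]))   # 15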

Example: Reverse Web-Link Graph
Goal: find all pages that link to a certain page
- Map function: outputs <target, source> pairs for each link to a target URL found in a source page
  - For each page, we know what pages it links to
- Reduce function: concatenates the list of all source URLs associated with a given target URL and emits the pair <target, list(source)>
  - For a given web page, we know what pages link to it
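A minimal Python sketch of these two functions. The regex link extraction, the toy pages dict, and the in-memory grouping below are stand-ins for the parsing and shuffle work the MapReduce library would do; they are not its actual API:

from collections import defaultdict
import re

def map_links(source_url, html):
    # Emit a <target, source> pair for every link found in a source page.
    for target in re.findall(r'href="([^"]+)"', html):
        yield target, source_url

def reduce_links(target, sources):
    # Concatenate all source URLs that point at a given target URL.
    return target, list(sources)

# Toy stand-in for the shuffle/sort phase the library performs between map and reduce.
pages = {"a.html": '<a href="c.html"></a>', "b.html": '<a href="c.html"></a>'}
intermediate = defaultdict(list)
for url, html in pages.items():
    for target, source in map_links(url, html):
        intermediate[target].append(source)

print([reduce_links(t, s) for t, s in intermediate.items()])
# [('c.html', ['a.html', 'b.html'])]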

Additional Examples
- Distributed grep
- Distributed sort
- Term-vector per host
- Web access log statistics
- Document clustering
- Machine learning
- Statistical machine translation
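For instance, the paper describes distributed grep as a map function that emits a line if it matches the pattern, with an essentially identity reduce. A rough Python sketch (the pattern, filename, and function names here are invented for illustration):

import re

def map_grep(filename, line, pattern=r"err"):
    # Map: emit the line (keyed by file) if it matches the supplied pattern.
    if re.search(pattern, line):
        yield filename, line

def reduce_grep(filename, lines):
    # Reduce: essentially the identity, passing matching lines through to the output.
    return filename, list(lines)

print(list(map_grep("log.txt", "an err occurred")))   # [('log.txt', 'an err occurred')]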

Performance Boasts
- Distributed grep
  - 10^10 100-byte records (~1 TB of data)
  - 3-character substring found in ~100k records
  - ~1800 workers
  - 150 seconds start to finish, including ~60 seconds startup overhead
- Distributed sort
  - Same data/workers as above
  - 50 lines of MapReduce code
  - 891 seconds, including overhead, vs. the best previously reported result of 1057 seconds for the TeraSort benchmark

Typical Cluster
- 100s/1000s of dual-core machines with 2-4 GB memory
- Limited internal network bandwidth
- Temporary storage on local IDE disks
- Google File System (GFS): distributed file system for permanent/shared storage
- Job scheduling system
  - Jobs made up of tasks
  - Master scheduler assigns tasks to worker machines

Execution: Initialization
- Split input file into 64 MB sections (GFS), read in parallel by multiple machines
- Fork the program onto multiple machines
- One machine is the Master
  - Assigns idle machines to either map or reduce tasks
  - Coordinates data communication between map and reduce machines
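A toy sketch of the split-and-assign step in Python; the 64 MB figure comes from the slide, but the function and worker names are invented and the real Master tracks far more state:

SPLIT_BYTES = 64 * 1024 * 1024   # GFS-style 64 MB input sections

def split_offsets(file_size, split_bytes=SPLIT_BYTES):
    # Yield (start, length) sections that map tasks can read in parallel.
    for start in range(0, file_size, split_bytes):
        yield start, min(split_bytes, file_size - start)

# The Master hands one section to each idle machine as a map task.
idle_workers = ["worker-1", "worker-2", "worker-3"]
assignments = dict(zip(idle_workers, split_offsets(200 * 1024 * 1024)))
print(assignments)   # e.g. {'worker-1': (0, 67108864), ...}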

Map-Machine
- Reads contents of assigned portion of input file
- Parses and prepares data for input to the map function (e.g. reads <a /> tags from HTML)
- Passes data into the map function and saves the result in memory (e.g. <target, source> pairs)
- Periodically writes completed work to local disk
- Notifies Master of this partially completed work (intermediate data)
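A hypothetical map-worker loop in Python; map_fn, the spill threshold, and the JSON spill files are illustrative stand-ins for the library's real buffering and local-disk format:

import json, os, tempfile

def spill_to_local_disk(pairs):
    # Stand-in for writing intermediate data to the worker's local IDE disk.
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as f:
        json.dump(pairs, f)
    return path

def run_map_task(split_text, map_fn, notify_master, spill_every=1000):
    buffer, spill_paths = [], []
    for record in split_text.splitlines():          # parse assigned portion of the input
        buffer.extend(map_fn(record))               # e.g. yields <target, source> pairs
        if len(buffer) >= spill_every:              # periodically write completed work
            spill_paths.append(spill_to_local_disk(buffer))
            buffer.clear()
            notify_master(spill_paths[-1])          # tell Master where intermediate data lives
    if buffer:
        spill_paths.append(spill_to_local_disk(buffer))
        notify_master(spill_paths[-1])
    return spill_paths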

Reduce-Machine
- Receives notification from Master of partially completed work
- Retrieves intermediate data from Map-Machine via remote read
- Sorts intermediate data by key (e.g. by target page)
- Iterates over intermediate data: for each unique key, sends the corresponding set of values through the reduce function
- Appends result of reduce function to final output file (GFS)
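A matching reduce-worker sketch under the same toy assumptions (local JSON spill files stand in for remote reads from Map-Machines, and a local file stands in for the GFS output):

import json
from itertools import groupby
from operator import itemgetter

def run_reduce_task(spill_paths, reduce_fn, output_path):
    pairs = []
    for path in spill_paths:                        # stand-in for remote reads of intermediate data
        with open(path) as f:
            pairs.extend(tuple(p) for p in json.load(f))
    pairs.sort(key=itemgetter(0))                   # sort intermediate data by key
    with open(output_path, "a") as out:             # append to final output file
        for key, group in groupby(pairs, key=itemgetter(0)):
            result = reduce_fn(key, [value for _, value in group])
            out.write(json.dumps(result) + "\n")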

Worker Failure
- Master pings workers periodically; any machine that does not respond is considered "dead"
- Both Map- and Reduce-Machines: any task in progress is reset and becomes eligible for re-scheduling
- Map-Machines: completed tasks are also reset, because their results are stored on local disk
  - Reduce-Machines are notified to get data from the new machine assigned to the task
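One way to express this rescheduling rule as a toy Python function; the task dictionaries and state names are invented for illustration:

def handle_dead_worker(dead_worker, tasks):
    # In-progress tasks on the dead machine go back to idle; completed *map* tasks
    # are also reset, because their intermediate results lived on its local disk.
    for task in tasks:
        if task["worker"] != dead_worker:
            continue
        if task["state"] == "in_progress" or (task["state"] == "completed" and task["kind"] == "map"):
            task["state"], task["worker"] = "idle", None

tasks = [
    {"kind": "map", "state": "completed", "worker": "w1"},
    {"kind": "reduce", "state": "in_progress", "worker": "w1"},
    {"kind": "map", "state": "completed", "worker": "w2"},
]
handle_dead_worker("w1", tasks)
print(tasks)   # both w1 tasks are idle again; w2's completed map task is untouched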

Skipping Bad Records
- Bugs in user code (triggered by unexpected data) cause deterministic crashes
  - Optimally, fix and re-run; not possible with third-party code
- When a worker dies, it sends a "last gasp" UDP packet to the Master describing the record it was processing
- If more than one worker dies on a specific record, the Master issues yet another re-execute command and tells the new worker to skip the problem record
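A sketch of the Master-side bookkeeping this implies; the record-id format and the exact threshold are assumptions for illustration:

from collections import Counter

crash_reports = Counter()   # Master-side tally of "last gasp" reports per record

def report_crash(record_id):
    # Called when a worker's last-gasp packet names the record it died on.
    crash_reports[record_id] += 1

def should_skip(record_id):
    # After more than one worker dies on the same record, tell the next worker to skip it.
    return crash_reports[record_id] > 1

report_crash("input-split-7:record-4132")
report_crash("input-split-7:record-4132")
print(should_skip("input-split-7:record-4132"))   # True: the re-executed task skips this record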

Backup Tasks
- Some "stragglers" not performing optimally
  - Other processes demanding resources
  - Bad disks (correctable errors) slow I/O speeds from 30 MB/s to 1 MB/s
  - CPU cache disabled?!
- Near end of phase, schedule redundant execution of in-progress tasks
  - First to complete "wins"
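A toy version of the straggler heuristic; the 95% progress threshold and the task dictionaries are assumptions, not the paper's actual scheduling policy:

def schedule_backups(tasks, near_end_fraction=0.95):
    # Near the end of a phase, launch duplicate copies of still-running tasks;
    # whichever copy finishes first "wins" and the others are discarded.
    done = sum(1 for t in tasks if t["state"] == "completed")
    if tasks and done / len(tasks) >= near_end_fraction:
        return [dict(t, backup=True) for t in tasks if t["state"] == "in_progress"]
    return []

tasks = [{"state": "completed"} for _ in range(19)] + [{"state": "in_progress"}]
print(schedule_backups(tasks))   # one redundant copy of the straggler task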

Locality
- Network bandwidth is scarce
- Google File System (GFS)
  - Around 64 MB file sizes
  - Redundant storage (usually 3+ machines)
- Assign Map-Machines to work on portions of input files they already have on local disk
  - Read input file at local-disk speeds
  - Without this, read speed is limited by the network switch
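A sketch of this locality preference, assuming the Master knows which hosts hold GFS replicas of each input section (all names here are invented):

def pick_map_worker(replica_hosts, idle_workers):
    # Prefer an idle worker that already holds a replica of the input section,
    # so the map task reads at local-disk speed instead of over the network.
    for worker in idle_workers:
        if worker in replica_hosts:
            return worker
    return idle_workers[0] if idle_workers else None   # otherwise, fall back to any idle machine

print(pick_map_worker({"host-3", "host-7", "host-9"}, ["host-1", "host-7"]))   # host-7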

Conclusion
- Complete rewrite of the production indexing system
  - 20+ TB of data
  - Indexing takes 5-10 MapReduce operations
  - Indexing code is simpler, smaller, easier to understand
- Fault tolerance, distribution, parallelization hidden within the MapReduce library
- Avoids extra passes over the data
- Easy to change the indexing system
- Improve performance of the indexing process by adding new machines to the cluster