MapReduce

Google and MapReduce
  • Google searches billions of web pages very, very quickly
  • How? It uses a technique called “MapReduce” to distribute the work across a large number of computers, then combine the results
  • This has made MapReduce a very popular approach
  • Hadoop is an open source implementation of MapReduce
      • Unless you work for Google, you will probably use Hadoop

How it works
  • List(a, b, c, …).map(x => f(x)) gives List(f(a), f(b), f(c), …)
  • List(a, b, c, …).reduce((x, y) => x ⊕ y) gives a ⊕ b ⊕ c ⊕ …, where ⊕ is some binary operator
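To make the two operations concrete, here is a minimal Scala sketch. The sample list, the squaring function, and the choice of + as the binary operator are assumptions for illustration, not part of the slides. The last value simulates distribution on a single machine: each chunk stands in for one computer's share of the input, and the per-chunk results are combined at the end. This only gives the same answer as the plain version because + is associative.

```scala
object MapReduceSketch {
  // Hypothetical sample input, not taken from the slides.
  val numbers: List[Int] = List(1, 2, 3, 4, 5, 6)

  // map: apply f (here, squaring) to each element independently.
  val squares: List[Int] = numbers.map(x => x * x)

  // reduce: combine the mapped values with a binary operator (here, +).
  val total: Int = squares.reduce((x, y) => x + y)

  // Simulated distribution: each chunk of two elements stands in for one
  // machine's share. Every chunk is mapped and reduced on its own, then
  // the partial results are reduced once more.
  val chunkedTotal: Int = numbers
    .grouped(2)
    .map(chunk => chunk.map(x => x * x).reduce((x, y) => x + y))
    .reduce((x, y) => x + y)

  def main(args: Array[String]): Unit = {
    println(squares)
    println(total)
    println(chunkedTotal)
  }
}
```

Both `total` and `chunkedTotal` come out the same here. With a non-associative operator such as subtraction, the chunked version could differ from the sequential one.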

Another view
  • http://www.cnblogs.com/sharpxiajun/p/3151395.html (in Chinese)

ForkJoin
  • How does ForkJoin differ from MapReduce?
  • Answers from Stack Overflow:
      • ForkJoin recursively partitions a task into several subtasks, on a single machine. Takes advantage of multiple cores.
      • MapReduce only does one big split, with no communication between the parts until the reduce step. Massively scalable.
      • Java fork/join starts quickly and scales well for small inputs (<5 MB), but it cannot process larger inputs due to the size restrictions of shared-memory, single node architectures. MapReduce takes tens of seconds to start up, but scales well for much larger inputs (>100 MB) on a compute cluster.
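The recursive partitioning described above can be sketched in Scala on top of Java's `java.util.concurrent` fork/join classes (`RecursiveTask` and `ForkJoinPool`). The task (summing an array), the names `SumTask` and `parallelSum`, and the `Threshold` cutoff are all hypothetical choices for illustration. Unlike MapReduce's single big split, each task keeps halving its range until the pieces are small, and everything runs in one JVM on one machine.

```scala
import java.util.concurrent.{ForkJoinPool, RecursiveTask}

object ForkJoinSketch {
  // Hypothetical cutoff: ranges at or below this size are summed directly.
  val Threshold = 1000

  // Sums data(lo) until data(hi - 1), splitting the range in half recursively.
  class SumTask(data: Array[Long], lo: Int, hi: Int) extends RecursiveTask[Long] {
    override def compute(): Long =
      if (hi - lo <= Threshold) {
        // Small enough: do the work sequentially.
        var total = 0L
        var i = lo
        while (i < hi) { total += data(i); i += 1 }
        total
      } else {
        val mid = lo + (hi - lo) / 2
        val left = new SumTask(data, lo, mid)
        val right = new SumTask(data, mid, hi)
        left.fork()                   // schedule the left half on another worker
        right.compute() + left.join() // do the right half here, then wait for the left
      }
  }

  // Runs the root task on the JVM's shared fork/join pool.
  def parallelSum(data: Array[Long]): Long =
    ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length))

  def main(args: Array[String]): Unit = {
    val data = Array.tabulate(100000)(i => (i + 1).toLong)
    println(parallelSum(data))
  }
}
```

Because the whole array must fit in the memory of a single process, this style runs into the shared-memory size limit mentioned above, which is where a cluster-based MapReduce takes over.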

The End