Big Data Storage and Applications: Large-Scale File Systems and Map-Reduce. 3. Map-Reduce. Chen Yishuai (陈一帅), chenyishuai@gmail.com
Map-Reduce Principles
Python Map/Reduce • Map: applies a function to every item of an input list (given two lists, to each pair of corresponding items) • a = [1, 2, 3]; b = [4, 5, 6]; list(map(lambda x, y: x + y, a, b)) • → [5, 7, 9]
Python Map/Reduce • Reduce: applies a rolling computation to sequential pairs of values in a list, carrying the running result forward • from functools import reduce; reduce(lambda x, y: x * y, [1, 2, 3]) • → 6 • computed as ((1 * 2) * 3)
Map-Reduce
Group by key • Done by the Map-Reduce environment, not by user code • Partition • Hash(word) mod R • R: number of Reducers • Alternative: Hash(first letter(word)) mod R
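A minimal single-process sketch of the group-by-key step, assuming a word-count job. It simulates map, hash partitioning into R reducer buckets, and reduce. All function names (map_words, partition, partition_by_first_letter, run_word_count) are hypothetical. The slide's "Hash" is stood in for by zlib.crc32, because Python's built-in hash() for strings changes between runs.

```python
from collections import defaultdict
import zlib

def map_words(doc):
    """Map: emit (word, 1) for every word in a document."""
    for word in doc.split():
        yield word, 1

def partition(key, num_reducers):
    """Hash(key) mod R. zlib.crc32 is deterministic across runs, unlike the
    built-in hash() for str, which is randomized per process."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def partition_by_first_letter(key, num_reducers):
    """Alternative partitioner: Hash(first letter(key)) mod R."""
    return partition(key[0], num_reducers)

def run_word_count(docs, num_reducers, partitioner=partition):
    """Simulate map -> partition / group by key -> reduce in one process."""
    # One bucket per reducer; inside each bucket, values are grouped by key.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for doc in docs:
        for key, value in map_words(doc):
            buckets[partitioner(key, num_reducers)][key].append(value)
    # Reduce: each reducer sums the values of every key it received.
    return [{key: sum(vals) for key, vals in bucket.items()} for bucket in buckets]

outputs = run_word_count(["the cat sat", "the cat ran"], num_reducers=2)
counts = {k: v for out in outputs for k, v in out.items()}
print(counts)
```

Because every occurrence of a key hashes to the same bucket, each word is counted by exactly one reducer. The first-letter partitioner sends all words sharing an initial letter to the same reducer. That keeps related keys together, at the risk of uneven reducer loads.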
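Task scheduling • Task states: Idle, In-progress, Completed • The master assigns idle tasks to workers • When a Map worker completes a task, it reports to the master that the work is done and where its intermediate results are stored (already partitioned: one intermediate file per Reducer) • The master tells the Reducers to fetch them • When a Reduce worker completes a task, it reports the result to the master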
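The scheduling steps above can be sketched as a small state machine kept by the master. This is a hypothetical model: the Master and Task classes and their method names are assumptions, not a real framework API. As a simplification, reduce tasks are handed out only after every map task has completed.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional

class State(Enum):
    IDLE = "idle"
    IN_PROGRESS = "in-progress"
    COMPLETED = "completed"

@dataclass
class Task:
    task_id: str
    kind: str                      # "map" or "reduce"
    state: State = State.IDLE
    worker: Optional[str] = None

@dataclass
class Master:
    num_reducers: int
    tasks: Dict[str, Task] = field(default_factory=dict)
    # reducer index -> intermediate file locations reported by map workers
    reducer_inputs: Dict[int, List[str]] = field(default_factory=dict)

    def add_task(self, task_id: str, kind: str) -> None:
        self.tasks[task_id] = Task(task_id, kind)

    def assign(self, worker: str) -> Optional[Task]:
        """Hand an idle task to a worker. Simplification: reduce tasks are
        only handed out once every map task has completed."""
        maps_pending = any(t.kind == "map" and t.state is not State.COMPLETED
                           for t in self.tasks.values())
        for task in self.tasks.values():
            if task.state is State.IDLE and (task.kind == "map" or not maps_pending):
                task.state, task.worker = State.IN_PROGRESS, worker
                return task
        return None

    def map_done(self, task_id: str, locations: List[str]) -> None:
        """A map worker reports completion plus one intermediate file per reducer."""
        if len(locations) != self.num_reducers:
            raise ValueError("expected one intermediate file per reducer")
        self.tasks[task_id].state = State.COMPLETED
        for r, location in enumerate(locations):
            # Recorded so the master can tell reducer r where to fetch its input.
            self.reducer_inputs.setdefault(r, []).append(location)

    def reduce_done(self, task_id: str) -> None:
        """A reduce worker reports that its task is finished."""
        self.tasks[task_id].state = State.COMPLETED
```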
Pipeline
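Storage + Computation = the Map-Reduce computing model (figure: a job submission node runs the jobtracker; the namenode runs the namenode daemon; each of the slave nodes runs a tasktracker and a datanode daemon on top of the Linux file system)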
Map-Reduce Algorithms
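Map-Reduce: Matrix Multiplication • Matrix-vector multiplication, as used in PageRank • M: an n × n matrix; v: an n × 1 vector; compute Mv, where $x_i = \sum_j m_{ij} v_j$ • Use the key to partition all terms $m_{ij} v_j$ of the same output element to one Reducer • Key: i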
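A minimal in-process sketch of matrix-vector multiplication keyed by i. It assumes the matrix is stored sparsely as (i, j, m_ij) triples and that the whole vector v fits in each mapper's memory. The names mv_map, mv_reduce and matrix_vector are hypothetical.

```python
from collections import defaultdict

def mv_map(i, j, m_ij, v):
    """Map: for matrix element m_ij, emit key i with the term m_ij * v_j."""
    yield i, m_ij * v[j]

def mv_reduce(i, terms):
    """Reduce: x_i = sum over j of m_ij * v_j."""
    return i, sum(terms)

def matrix_vector(entries, v):
    """entries: (i, j, m_ij) triples of an n x n matrix; v: list of length n."""
    groups = defaultdict(list)              # shuffle: group terms by key i
    for i, j, m_ij in entries:
        for key, term in mv_map(i, j, m_ij, v):
            groups[key].append(term)
    x = [0] * len(v)                        # rows with no nonzero entries stay 0
    for i, terms in groups.items():
        _, x[i] = mv_reduce(i, terms)
    return x

M = [(0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 4)]   # [[1, 2], [3, 4]]
print(matrix_vector(M, [5, 6]))                    # [17, 39]
```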
Map-Reduce: Matrix Multiplication • Matrix-matrix multiplication P = MN, where $p_{ik} = \sum_j m_{ij} n_{jk}$ • Key: (i, k)
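One way to use the key (i, k) is a one-pass algorithm, sketched here under stated assumptions. Each m_ij is sent to every key (i, k), and each n_jk to every key (i, k). The reducer then pairs M and N values that share the same j. The sketch assumes dense list-of-lists inputs with known dimensions. The names mm_map, mm_reduce and matrix_multiply are hypothetical.

```python
from collections import defaultdict

def mm_map(name, r, c, value, rows_m, cols_n):
    """One-pass map. M is rows_m x n, N is n x cols_n."""
    if name == "M":
        i, j = r, c
        for k in range(cols_n):          # m_ij is needed by every p_ik
            yield (i, k), ("M", j, value)
    else:
        j, k = r, c
        for i in range(rows_m):          # n_jk is needed by every p_ik
            yield (i, k), ("N", j, value)

def mm_reduce(key, values):
    """p_ik = sum over j of m_ij * n_jk, pairing M and N values by j."""
    m = {j: v for tag, j, v in values if tag == "M"}
    n = {j: v for tag, j, v in values if tag == "N"}
    return key, sum(m[j] * n[j] for j in m.keys() & n.keys())

def matrix_multiply(M, N):
    rows_m, cols_n = len(M), len(N[0])
    groups = defaultdict(list)           # shuffle: group values by key (i, k)
    for name, mat in (("M", M), ("N", N)):
        for r, row in enumerate(mat):
            for c, value in enumerate(row):
                for key, val in mm_map(name, r, c, value, rows_m, cols_n):
                    groups[key].append(val)
    P = [[0] * cols_n for _ in range(rows_m)]
    for key, values in groups.items():
        (i, k), p_ik = mm_reduce(key, values)
        P[i][k] = p_ik
    return P

print(matrix_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

The trade-off of this approach is communication: every matrix element is replicated once per output column (for M) or row (for N).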
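Projection (figure: input tuples R1 to R5, each transformed independently by a mapper) • No reducer needed: each mapper emits the projected tuple, unless duplicate tuples must be eliminated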
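A minimal sketch of projection as a map-only job, assuming tuples are Python dicts keyed by attribute name. The names project_map and project_distinct are hypothetical. The second function shows the case where set semantics are required: the projected tuple becomes the key, and a reducer emits each key once.

```python
def project_map(tup, attributes):
    """Map-only projection: keep just the listed attributes of one tuple.
    With no reducer, duplicate projected tuples are kept (bag semantics)."""
    return tuple(tup[a] for a in attributes)

def project_distinct(tuples, attributes):
    """If duplicates must be removed, key on the projected tuple and let a
    reducer emit each key once; a dict stands in for group-by-key here."""
    groups = {}
    for t in tuples:
        groups.setdefault(project_map(t, attributes), []).append(t)
    return list(groups)        # one output per distinct key, first-seen order

R = [{"name": "Ann", "dept": "CS"}, {"name": "Bob", "dept": "CS"}]
print([project_map(t, ["dept"]) for t in R])   # [('CS',), ('CS',)]
print(project_distinct(R, ["dept"]))            # [('CS',)]
```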
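Selection (figure: input tuples R1 to R5, filtered by mappers) • No reducer needed: each mapper emits only the tuples that satisfy the selection condition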
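Selection needs no reducer because each tuple can be tested on its own. Below is a minimal sketch, assuming dict tuples and a Python predicate as the condition. The name select_map is hypothetical.

```python
def select_map(tuples, condition):
    """Map-only selection: emit each tuple that satisfies the condition.
    Each tuple is handled independently, so no reducer is needed."""
    for t in tuples:
        if condition(t):
            yield t

R = [{"id": 1, "age": 30}, {"id": 2, "age": 20}, {"id": 3, "age": 41}]
print([t["id"] for t in select_map(R, lambda t: t["age"] > 25)])   # [1, 3]
```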
Relational Joins (figure: tuples R1 to R4 of relation R and S1 to S4 of relation S, matched into joined pairs R1 with S2, R2 with S4, R3 with S1, R4 with S3)
Join • Key: B, the join attribute shared by R(A, B) and S(B, C)
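Keying on B can be sketched as a reduce-side join (also called a repartition join). It assumes R(A, B) and S(B, C) are given as lists of 2-tuples. Each mapper tags its value with the relation it came from, so the reducer for a given b can pair every R-value with every S-value. The names join_map, join_reduce and natural_join are hypothetical.

```python
from collections import defaultdict

def join_map(relation_name, tup):
    """R(A, B) -> (b, ("R", a)); S(B, C) -> (b, ("S", c))."""
    if relation_name == "R":
        a, b = tup
        return b, ("R", a)
    b, c = tup
    return b, ("S", c)

def join_reduce(b, values):
    """Pair every R-value with every S-value that shares the key b."""
    r_side = [x for tag, x in values if tag == "R"]
    s_side = [x for tag, x in values if tag == "S"]
    return [(a, b, c) for a in r_side for c in s_side]

def natural_join(R, S):
    groups = defaultdict(list)             # shuffle: group tagged values by b
    for name, rel in (("R", R), ("S", S)):
        for tup in rel:
            key, value = join_map(name, tup)
            groups[key].append(value)
    result = []
    for b, values in groups.items():
        result.extend(join_reduce(b, values))
    return result

R = [("a1", "b1"), ("a2", "b2"), ("a3", "b1")]
S = [("b1", "c1"), ("b2", "c2"), ("b3", "c3")]
print(sorted(natural_join(R, S)))
# [('a1', 'b1', 'c1'), ('a2', 'b2', 'c2'), ('a3', 'b1', 'c1')]
```

A key that appears in only one relation (b3 here) produces no output, which is what a natural join requires.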
Optimizing cost: Combiners (figure: four mappers emit (a, 1) (b, 2) | (c, 3) (c, 6) | (a, 5) (c, 2) | (b, 7) (c, 8); shuffle and sort aggregates values by key: a → [1, 5], b → [2, 7], c → [2, 3, 6, 8]; three reducers produce (r1, s1), (r2, s2), (r3, s3)) • Goal: reduce communication cost
Optimization: Combiners (figure: the same four mappers, each followed by a combiner and a partitioner; the second mapper's combiner merges (c, 3) and (c, 6) into (c, 9); shuffle and sort now yields a → [1, 5], b → [2, 7], c → [2, 9, 8]; three reducers produce (r1, s1), (r2, s2), (r3, s3)) • Reduces communication cost: fewer key-value pairs cross the network
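A minimal sketch of the effect of a combiner. It counts the key-value pairs shuffled with and without a local sum on each mapper's output. The mapper outputs are reconstructed from the slide figure, and the names shuffle and combine are hypothetical. A combiner is only safe here because summation is commutative and associative, so the final per-key sums are unchanged.

```python
from collections import defaultdict

def shuffle(pairs_per_mapper):
    """Group values by key across all mappers; count pairs sent over the network."""
    groups = defaultdict(list)
    sent = 0
    for pairs in pairs_per_mapper:
        for k, v in pairs:
            groups[k].append(v)
            sent += 1
    return dict(groups), sent

def combine(pairs):
    """Combiner: a local sum over one mapper's output, before the shuffle."""
    local = defaultdict(int)
    for k, v in pairs:
        local[k] += v
    return list(local.items())

mappers = [[("a", 1), ("b", 2)], [("c", 3), ("c", 6)],
           [("a", 5), ("c", 2)], [("b", 7), ("c", 8)]]

plain, sent_plain = shuffle(mappers)
combined, sent_combined = shuffle([combine(p) for p in mappers])
print(plain["c"], sent_plain)          # [3, 6, 2, 8] 8
print(combined["c"], sent_combined)    # [9, 2, 8] 7
```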