Matrix Layout Problem in MapReduce
12/18/2021
Presenters: Yunfei Shen, Luyao Li
[1] Yi Zhang, Parallel Matrix Layout for MapReduce
Content
• Introduction
• Plans (1, 2 & 3)
• Data Analysis
• Demo
Introduction
• MapReduce is efficient at parsing small files, but massive scientific raw data…
• MapReduce is not optimized for matrix-form data and linear algebra kernels
Plans
• Consider a very simple matrix multiplication problem, A x B = C. Possible plans are [1]:
  1. Simple dot product: C[i, j] = ∑_k A[i, k] * B[k, j]
  2. Divide A into s x t submatrices (blocks) and B into t x u ones: C_block = A_block * B_block
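As a minimal sketch of the two plans listed here, the plain-Python code below compares an element-wise dot product with a blocked multiplication. It is not Hadoop code; the function names `matmul`, `get_block`, and `block_matmul` and the square block size are assumptions made for illustration.

```python
def matmul(a, b):
    """Plan 1 style: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def get_block(m, r0, c0, size):
    """Cut out the block starting at (r0, c0); edge blocks may be smaller."""
    return [row[c0:c0 + size] for row in m[r0:r0 + size]]


def block_matmul(a, b, size):
    """Plan 2 style: multiply block by block and accumulate into C."""
    n_rows, n_inner, n_cols = len(a), len(b), len(b[0])
    c = [[0] * n_cols for _ in range(n_rows)]
    for r0 in range(0, n_rows, size):
        for c0 in range(0, n_cols, size):
            for k0 in range(0, n_inner, size):
                part = matmul(get_block(a, r0, k0, size),
                              get_block(b, k0, c0, size))
                for i, row in enumerate(part):
                    for j, v in enumerate(row):
                        c[r0 + i][c0 + j] += v
    return c
```

Both functions return the same matrix; the blocked version only changes the order in which the partial products are formed, which is what lets each block product be handed to a separate reducer.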
Our Plan: Sub-matrix
• SequenceFile.Write -> key: location (x, y)
• Mapper:
  Input: Key1 (index1, index2), IntWritable (v)
    A{(s, t), A[s, t]}, B{(t, u), B[t, u]}
  The last block in a row or column needs special handling because it may be unbalanced: lastBlockIndex, lastBlockSize
  Output: Key2 (index1, index2, index3), Value (index1, index2, v)
    {(S, T, U), <(s, t, A[s, t]), (t, u, B[t, u])>}
• Reducer:
  Input: Key2 (index1, index2, index3), Value (index1, index2, v)
  1. Multiplication: C[s, t, u] = A[s, t] * B[t, u]
  2. Sum: C[s, u] = ∑_t C[s, t, u]
  Output: Key1 (index1, index2), IntWritable (v)
    C{(s, u), C[s, u]}
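The mapper step can be sketched in plain Python as follows. This is an illustration under assumptions, not the presenters' Hadoop mapper: the helper names, the square block size, and the "A"/"B" tag inside the value are all hypothetical.

```python
import math


def num_blocks(total, block_size):
    """Number of blocks along one dimension, counting a partial last block."""
    return math.ceil(total / block_size)


def last_block_size(total, block_size):
    """Size of the last block, which may be smaller than block_size."""
    rem = total % block_size
    return rem if rem else block_size


def map_a(i, k, v, block_size, num_u):
    """Emit element A[i][k] once for every block column U it will meet."""
    s, t = i // block_size, k // block_size
    value = ("A", i % block_size, k % block_size, v)
    return [((s, t, u), value) for u in range(num_u)]


def map_b(k, j, v, block_size, num_s):
    """Emit element B[k][j] once for every block row S it will meet."""
    t, u = k // block_size, j // block_size
    value = ("B", k % block_size, j % block_size, v)
    return [((s, t, u), value) for s in range(num_s)]
```

For example, with a dimension of 5 and a block size of 2, `num_blocks` gives 3 and `last_block_size` gives 1, which is the unbalanced case the slide warns about.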
Our Plan: Sub-matrix - map
A = 4*4, sub = 2*2; B = 4*4, sub = 2*2

Mapper input:
       Key1 (x, y)   IntWritable (v)
  A    (0, 0)        1
       (0, 1)        2
       (1, 0)        3
       (1, 1)        4
  B    (0, 2)        5
       (0, 3)        6
       (1, 2)        7
       (1, 3)        8

Mapper output:
       Key2 (S, T, U, m)   Value (x, y, v)
  A    (0, 0, 1, 0)        (0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 4)
  B    (0, 0, 1, 1)        (0, 0, 5), (0, 1, 6), (1, 0, 7), (1, 1, 8)
Our Plan: Sub-matrix - reduce
C = 4*4

Reducer input:
       Key2 (S, T, U, m)   Value (x, y, v)
  A    (0, 0, 1, 0)        (0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 4)
  B    (0, 0, 1, 1)        (0, 0, 5), (0, 1, 6), (1, 0, 7), (1, 1, 8)

Reducer output:
  Key1 (x, y)   Value (x, y, v)
  (0, 2)        (0, 0, 19)
  (0, 3)        (0, 1, 22)
  (1, 2)        (1, 0, 43)
  (1, 3)        (1, 1, 50)
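The block product in this example can be checked with a short Python sketch. The function name `reduce_block` is an assumption, and the code only models what one reducer does with one A block and one B block; it does not use any Hadoop API.

```python
def reduce_block(a_values, b_values, size):
    """Multiply one A block by one B block given as (x, y, v) triples."""
    a = [[0] * size for _ in range(size)]
    b = [[0] * size for _ in range(size)]
    for x, y, v in a_values:
        a[x][y] = v
    for x, y, v in b_values:
        b[x][y] = v
    # Emit the product block as (x, y, v) triples in row-major order.
    return [(x, y, sum(a[x][k] * b[k][y] for k in range(size)))
            for x in range(size) for y in range(size)]


a_values = [(0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 4)]
b_values = [(0, 0, 5), (0, 1, 6), (1, 0, 7), (1, 1, 8)]
print(reduce_block(a_values, b_values, 2))
# prints [(0, 0, 19), (0, 1, 22), (1, 0, 43), (1, 1, 50)]
```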
Plan 1
• The simplest strategy is to have each reducer do just one of the block multiplications.

  Key2 (S, T, U, m)   Value (x, y, v)
  (0, 0, 1, 0)        (0, 0, 1)
  (0, 0, 1, 1)        (0, 0, 2)

• Job 1: do the multiplication in the Job 1 reducer (reducer 0, 0, 1):
  Key1 (x, y)   IntWritable (v)
  (0, 2)        2
• Job 2: do the sum in the Job 2 reducer:
  Key1 (x, y)   IntWritable (v)
  (0, 2)        {…, 2, …}
• Too many reducers: numS*numT*numU, heavy network traffic!
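A rough simulation of Plan 1 in plain Python, assuming blocks are held in dictionaries keyed by block index. The names `plan1`, `mul`, and `add` are hypothetical; the loop over matching block pairs stands in for the Job 1 reducers, and the final accumulation stands in for Job 2.

```python
from collections import defaultdict


def mul(a, b):
    """Multiply two small blocks stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add(a, b):
    """Add two blocks of the same shape."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(a, b)]


def plan1(a_blocks, b_blocks):
    # Job 1: one reducer call per (S, T, U) multiplies A[S,T] by B[T,U].
    partial = defaultdict(list)
    reducers = 0
    for (s, t), a in a_blocks.items():
        for (t2, u), b in b_blocks.items():
            if t == t2:
                partial[(s, u)].append(mul(a, b))
                reducers += 1
    # Job 2: sum the partial products that share the same (S, U).
    c_blocks = {}
    for key, parts in partial.items():
        total = parts[0]
        for p in parts[1:]:
            total = add(total, p)
        c_blocks[key] = total
    return c_blocks, reducers
```

With a 2 x 2 grid of blocks on each side, the counter reaches 2*2*2 = 8, which is the numS*numT*numU growth the slide complains about.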
Plan 2
• In this plan, we use a single reducer to multiply a single A block by a whole row of B blocks. This plan also takes two jobs.
• Job 1: do the multiplication.
  A block (S, T), e.g. (0, 0)
  B blocks (T, K 1, 2…), e.g. (0, 0), (0, 1)

       Key2 (S, T, U)   Value (x, y, v)
  A    (0, 0, -1)       (0, 0, 1)
  B    (0, 0, 1)        (0, 0, 2)

  [Diagram: 2 x 2 grid of blocks labeled 0,0; 0,1; 1,0; 1,1]
• Job 2: do the sum.
• This greatly decreases the number of reducers: at most numS*numT reducers are needed for this plan!
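Plan 2 can be sketched the same way. The code below assumes that giving the A block the key component U = -1 makes it sort ahead of the B blocks in its group, as the example key (0, 0, -1) suggests; the function names and the dictionary layout are hypothetical.

```python
from itertools import groupby


def mul(a, b):
    """Multiply two small blocks stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def plan2_job1(a_blocks, b_blocks, num_s):
    records = [((s, t, -1), a) for (s, t), a in a_blocks.items()]
    for (t, u), b in b_blocks.items():
        for s in range(num_s):          # every block row S needs B[T,U]
            records.append(((s, t, u), b))
    records.sort(key=lambda r: r[0])    # U = -1 puts the A block first
    partial, reducers = [], 0
    for (s, t), group in groupby(records, key=lambda r: r[0][:2]):
        reducers += 1                   # one reducer per (S, T)
        group = list(group)
        a = group[0][1]
        for (_, _, u), b in group[1:]:
            partial.append(((s, u), mul(a, b)))
    return partial, reducers


def plan2_job2(partial):
    """Job 2: sum the partial blocks that share the same (S, U)."""
    totals = {}
    for key, block in partial:
        if key in totals:
            totals[key] = [[x + y for x, y in zip(r1, r2)]
                           for r1, r2 in zip(totals[key], block)]
        else:
            totals[key] = block
    return totals
```

Each reducer holds one A block in memory and streams the matching row of B blocks past it, so the reducer count drops from numS*numT*numU to numS*numT.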
Plan 3
• In Plan 3, we use a single reducer to compute the final C block, so there is no need for a second MapReduce job.
• After the mapper, the A blocks and B blocks are arranged in the following order, for 0 <= T < numT:
  A[S, 0] B[0, U] A[S, 1] B[1, U] ... A[S, numT-1] B[numT-1, U]
• Example with a row of A blocks and a column of B blocks: A(0, 0) B(0, 1) A(0, 1) B(1, 1)

       Key2 (S, U, T)   Value (x, y, v)
  A    (0, 1, 0)        (0, 0, 1)
  B    (0, 1, 0)        (0, 0, 2)

  [Diagram: 2 x 2 grid of blocks labeled 0,0; 0,1; 1,0; 1,1]
• Multiply and sum in one job: if (ms != s || mu != u), write the reducer output
• The number of reducers is numS*numU. Only one job is needed!
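A single-job sketch of Plan 3 in plain Python. It assumes a fourth key component m (0 for A, 1 for B) so that each A block sorts directly before its matching B block; that tag, the function names, and the dictionary layout are assumptions for illustration, not the presenters' code.

```python
def mul(a, b):
    """Multiply two small blocks stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add(a, b):
    """Add two blocks of the same shape."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(a, b)]


def plan3(a_blocks, b_blocks, num_s, num_u):
    records = []
    for (s, t), a in a_blocks.items():
        for u in range(num_u):
            records.append(((s, u, t, 0), a))
    for (t, u), b in b_blocks.items():
        for s in range(num_s):
            records.append(((s, u, t, 1), b))
    # Sorted order per (S, U): A[S,0] B[0,U] A[S,1] B[1,U] ...
    records.sort(key=lambda r: r[0])
    c_blocks = {}
    it = iter(records)
    for (key_a, a), (_, b) in zip(it, it):   # consume records in A, B pairs
        su = key_a[:2]
        prod = mul(a, b)
        c_blocks[su] = add(c_blocks[su], prod) if su in c_blocks else prod
    return c_blocks
```

The dictionary ends up with one entry per (S, U), which corresponds to the numS*numU reducers of this plan, and the running sum replaces the second job.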
Analysis

s = 50, t = 68, u = 72
  Plan   Job   Reducer   Time (s)
  1      2     280       60
  2      2     35        58
  3      1     40        32

s = 103, t = 112, u = 119
  Plan   Job   Reducer   Time
  1      2     196       120
  2      2     36        109
  3      1     36        65

s = 504, t = 523, u = 547
  Plan   Job   Reducer   Time
  1      2     125       4091
  2      2     25        3692
  3      1     25        3216

[Bar charts: Job, Reducer, and Time per plan for each experiment; one chart plots Time/100]
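The reducer counts follow from the formulas given for the three plans. The sketch below evaluates them; the function name and the block size are assumptions, since the slides do not state which block size each experiment used. A block size of 10 happens to reproduce the reducer counts 280, 35, and 40 reported for s = 50, t = 68, u = 72.

```python
import math


def reducer_counts(s, t, u, block_size):
    """Reducer counts per plan for an (s x t) times (t x u) multiplication."""
    num_s, num_t, num_u = (math.ceil(d / block_size) for d in (s, t, u))
    return {
        "plan1": num_s * num_t * num_u,  # one reducer per block product
        "plan2": num_s * num_t,          # one reducer per A block
        "plan3": num_s * num_u,          # one reducer per C block
    }


print(reducer_counts(50, 68, 72, 10))
# prints {'plan1': 280, 'plan2': 35, 'plan3': 40}
```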
Demo
• A = , B =
• C = A x B =
• Use 2 x 2 sub-matrices; run with strategies 1, 2, and 3 respectively
Thank you