Introduction to Erasure coding Kenji Kaneda Outline Problem
Introduction to Erasure coding Kenji Kaneda
Outline • • Problem Specification General Strategy Overview of Reed-Solomon Coding An Example • Appendix: Galois Fields
Outline • • Problem Specification General Strategy Overview of Reed-Solomon Coding An Example • Appendix: Galois Fields
Problem Specification (1/2) • Given – n Data devices (D 1, D 2, …, Dn) • Each holds k bytes – m Checksum devices (C 1, C 2, …, Cm) • Each holds k bytes D 1 D 2 m=2 C 1 C 2 n=8 D 3 D 4 D 5 D 6 D 7 D 8
Problem Specification (2/2) • Goal – Define the calculation of each Ci such that if any m of D 1, D 2, …, Dn, C 1, C 2, …, Cm fail, then the failed devices can be reconstructed from the non-failed devices D 1 D 2 m=2 C 1 C 2 n=8 D 3 D 4 D 5 D 6 D 7 D 8
An Example Configuration • “n+1 -parity” coding (RAID Level 5) – m=1 – c 1, j = d 1, j ⊕ d 2, j ⊕ … ⊕ dn, j where c 1, j = j-th byte of C 1 and di, j = j-th byte of Di D 1 C 1 D 2 … Dn
Outline • • Problem Specification General Strategy Overview of Reed-Solomon Coding An Example • Appendix: Galois Fields
General Strategy (1/4) • Partition storage devices … D 1 D 2 Dn … C 1 C 2 Cm
General Strategy (2/4) • Initialize checksum devices … D 1 D 2 Dn … C 1 C 2 Cm
General Strategy (3/4) • Update data and checksum devices update … D 1 D 2 update Dn update … C 1 C 2 Cm
General Strategy (4/4) • Recover storage devices from failures … D 1 D 2 Dn … C 1 C 2 Cm
Partitioning of Devices (1/2) • Break up each device into words – Size of each word is w bits • w is chosen by a programmer w bits k bytes Di
Partitioning of Devices (2/2) • Henceforth we assume that each device holds just 1 word (for simplicity) – data words: d 1, d 2, …, dn – checksum words: c 1, c 2, …, cm d 1 d 2 D 1 D 2 c 1 c 2 C 1 C 2 … dn Dn … cm Cm
Calculation of Checksum • Define a coding function Fi (d 1, d 2, …, dn) – Calculates a checksum word on Ci E. g. ) F 1 (d 1, d 2, …, dn) = d 1 ⊕ d 2 ⊕ … ⊕ dn d 1 d 2 D 1 D 2 c 1=F 1(d 1, …, dn) c 2=F 2(d 1, …, dn) C 1 C 2 … dn Dn … cm=Fm(d 1, …, dn) Cm
Update of Checksum • Define an update function Gi, j (dj, dj’, ci) – Calculates a checksum word on Ci when a checksum word on Ci is ci and a data word on Dj is updated from dj to dj’ E. g. ) G 1, j (dj, dj’, ci) = c 1 ⊕ dj’ d 1 D 1 d 2 d ’ 2 … D 2 c 1’=G 1, 2 c 1(d 2, d 2’, c 1) c 2’=G 2, 2 c 2(d 2, d 2’, c 2) … C 1 C 2 dn Dn cm(d , d ’, c ) c 3’=G 3, 2 2 2 3 Cm
Recovery from Failure 1. Restore the words in any failed data device Dj from the words in the nonfailed devices E. g. ) dj = d 1 ⊕ … ⊕ dj-1 ⊕ dj+1 ⊕ … ⊕ dn ⊕ c 1 2. Re-compute any failed checksum devices Ci with Fi
Problem Restatement • Given n data words d 1, d 2, …, dn, all of size w • Define functions F and G to calculate and maintain the checksum words c 1, c 2, …, cm
Outline • • Problem Specification General Strategy Overview of Reed-Solomon Coding An Example • Appendix: Galois Fields
Overview of Reed-Solomon Coding • Using the Vandermonde matrix to calculate and maintain checksum words • Using Gaussian Elimination to recover from failures • Using Galois Fields to perform arithmetic
Calculating and Maintaining Checksum Words • Define a coding function Fi and an update function Gi, j
Definition of Coding Function (1/2) • Define Fi to be a linear combination of the data words n ci = Fi (d 1, d 2, …, dn) = Σ dj fi, j j=1 – Vector representation c 1 c 2 C= : cm f 1, 1 f 2, 1 = f 1, 2 f 2, 2 … … f 1, n f 2, n : : fm, 1 fm, 2 … : fm, n d 1 d 2 : : dn = FD
Definition of Coding Function (2/2) • Define F to be the m×n Vandermonde matrix fi, j = j i-1 f 1, 1 f 2, 1 F= f 1, 2 f 2, 2 … … : : fm, 1 fm, 2 … f 1, n f 2, n = : fm, n 1 1 : 1 1 … 2 … : 2 m-1 … 1 n : nm-1
Definition of Update Function Gi, j (dj, dj’, ci) = ci + fi, j (dj’ – dj) – Subtract out the portion of the checksum word that corresponds to dj – Add the required amount for dj’
Recovering from Failures (1/4) • Define matrix A and E A= I F E= D C I : n×n identity matrix AD = E
Recovering from Failures (2/4) • When devices fail, – Delete the corresponding rows from A and E AD = 1 0 : 0 1 1 : 1 0 1 : 0 1 2 : 2 m-1 … … … 0 0 : 1 1 n : nm-1 d 2 : dn = d 1 d 2 : dn c 1 c 2 : cm =E
Recovering from Failures (3/4) • When devices fail, – Delete the corresponding rows from A and E A’D = 0 : 0 1 1 : 0 1 … : 1 : 2 m-1 … … … 0 : 1 1 : nm-1 d 2 : dn = d 2 : dn c 1 : cm = E’
Recovering from Failures (4/4) • Values of D are recovered from A’D = E’ using Gaussian Elimination E. g. ) if m devices fail, D = (A’) -1 E’ • A’ is a non-singular because F is Vandermonde matrix
Problem with Arithmetic Operations (1/2) • Domain and range of the computation are binary words of a fixed length w – Not infinite precision real numbers
Problem with Arithmetic Operations (2/2) • The algebra is correct when all the elements are infinite precision real numbers We must make sure that it is correct for the fixed-size words
Naïve Solution and its Problem • Arithmetic over the integers modulo 2 w Division is not defined for all pairs of elements E. g. ) (3÷ 2) is undefined modulo 22 (=4)
Our Solution • Perform addition/multiplication over a Galois Field
Mapping Between Elements of GF(2 w)and Binary Words • r(x) ∈ GF(2 w) ⇔ a binary word b of size w such that i-th bit of b = the coefficient of xi in r(x) = awxw + aw-1 xw-1 + … + a 1 x + a 0 b = awaw-1 … a 1 a 0
Examples of Mapping (1/3) • GF(22) = GF(2)[x]/x 2+x+1 Generated Polynomial element 0 0 Binary element 00 Decimal element 0 x 1 1 01 1 x 2 x 10 2 x 3 x+1 11 3
Examples of Mapping (2/3) • GF(24) = GF(2)[x]/x 4+x+1 Generated element Polynomial element Binary element Decimal element 0 0 0000 0 x 0 1 0001 1 x 0010 2 x 2 0100 4 x 3 1000 8 x 4 x+1 0011 3 x 5 x 2+x 0110 6 x 3+x 2 1100 12
Example of Mapping (3/3) Generated element Polynomial element Binary element Decimal element x 7 x 3+x+1 1011 11 x 8 x 2+1 0101 5 x 9 x 3+x 1010 10 x 2+x+1 0111 7 x 11 x 3+x 2+x 1110 14 x 12 x 3+x 2+x+1 1111 15 x 13 x 3+x 2+1 1101 13 x 14 x 3+1 1001 9 x 15 1 0001 1
Addition/Subtraction over Binary Elements • XOR operation Binary elements 11 + 7 = 1011⊕ 0111 = 1100 = 12 GF(2 w) 11 + 7 = (x 3+x+1) + (x 2+x+1) = x 3+x 2 = 12
Multiplication/Division over Binary Elements (1/4) 1. Covert the binary words to their polynomial elements 2. Multiply/divide the polynomials modulo a primitive polynomial q(x) 3. Covert the result back to a binary element Binary elements GF(2 w) b 1 * b 2 = b 3 r 1(x) * r 2(x) = r 3(x)
Multiplication/Division over Binary Elements (2/4) Use two logarithm tables • gflog – Maps a binary element b to power j such that xj is equivalent to b • gfilog – Maps from a power j to its binary element b i gflog[ i ] gfilog[ i ] 0 1 1 0 2 2 1 4 3 4 8 4 2 3 5 8 6 6 7 8 5 10 3 12 11 5 … GF(24)
Multiplication/Division over Binary Elements (3/4) 1. Convert each binary element to its discrete logarithm – By looking up gflog 2. Add/Subtract the logarithms modulo 2 w-1 ※ x 2^w-1 = q(x) 3. Covert result back to a binary element – By looking up gfilog
Multiplication/Division over Binary Elements (4/4) Binary elements 3 * 7 = gfilog[gflog[3]+gflog[7]] = gfilog[4+10] =9 GF(2 w) 3 * 7 = (x+1) * (x 2+x+1) = x 4+10 = x 3+1 = 9
Summary of Algorithm 1. 2. 3. 4. 5. Choose w such that 2 w > n + m Set up the tables gflog and gfilog Set up the matrix F Calculate words of the checksum devices If any number of devices up to m fails, i. Choose any n of the remaining devices ii. Construct the matrix A’ and E’ iii. Solve for D in A’D = E’
Outline • • Problem Specification General Strategy Overview of Reed-Solomon Coding An Example • Appendix: Galois Fields
An Example • Suppose n=3 and m=4
Step 1~3 • Choose w to be 4 ※ 2 w > n + m である必要がある • Set up gflog and gfilog • Set up the 3× 3 matrix F – Defined over GF(24) 10 F = 11 12 20 21 22 30 31 = 32 1 1 2 4 1 3 5
Step 4 • Calculate each word of the checksum devices using FD=C – d 1=3, d 2=13, d 3=9 c 1 = (1)(d 1) ⊕ (1)(d 2) ⊕ (1)(d 3) = 7 c 2 = (1)(d 1) ⊕ (2)(d 2) ⊕ (3)(d 3) = 2 c 3 = (1)(d 1) ⊕ (4)(d 2) ⊕ (5)(d 3) = 9
Step 5 • Change d 2 to 1 – D 2 send the value (1 -13) = (0001⊕ 1101) = 12 c 1 = 7 ⊕ (1)(12) = 11 c 2 = 2 ⊕ (2)(12) = 9 c 3 = 9 ⊕ (4)(12) = 12
Step 6 • D 2, D 3, and C 3 are lost AD = 1 0 0 1 1 1 0 1 2 4 0 0 1 1 3 5 3 1 D= 9 11 9 12 =E
Step 7 • D 2, D 3, and C 3 are lost A’D = 1 1 1 0 1 2 0 1 3 3 D = 11 = E’ 9
Step 7 • Recovery D = (A’)-1 E’ = 1 2 3 0 3 2 0 1 1 3 11 = 9 c 3 = (1)(3) ⊕ (4)(1) ⊕ (5)(9) = 12 3 1 9
Summary • Reed-Solomon Coding – Vandermonde matrix for checksum calculation – Gaussian Elimination for failure recovery – Arithmetic over Galois Fields ※大規模P 2 Pシステムに本当に適応可能?
Reference • A tutorial on Reed-Solomon Coding for Fault-tolerance in RAID-like Systems – James S. Plank – Software – Practice and Experience, Vol. 27(9), 996 -1012 (1997)
Outline • • Problem Specification General Strategy Overview of Reed-Solomon Coding An Example • Appendix: Galois Fields
Galois Fields • A field GF(n) is a set of n elements closed under addition and multiplication – Every element has an additive and multiplicative inverse • Except for the 0 element which has no multiplicative inverse
Examples of Galois Fields (1/2) • GF(2) = { 0, 1 } – Addition/multiplication are performed on modulo 2 • GF(n) = { 0, 1, …, n-1 } – where n is a prime number – Addition/multiplication are performed on modulo n ※ { 0, 1, 2, 3 } is not a Galois field – 2 has no multiplicative inverse
Examples of Galois Fields (2/2) • GF(2 w) = GF(2)[x]/q(x) – Elements are polynomials whose coefficients belong to GF(2) – Arithmetic module a primitive function q(x) • Degree of q(x) = w • Coefficients of q(x) belong to GF(2) E. g. ) GF(22) = GF(2)[x]/x 2+x+1 = { 0, 1, x, x+1 }
Implementation • RAID controller D 1 … Dn C 1 … Cm CPU • Distributed checkpoint system D 1 CPU … Dn C 1 CPU network … Cm CPU
Failure Model • Erasure model – When a device fails, it shutdowns – System recognizes this shutdown C. f. ) Error model – Device failure is manifested by storing/retrieving incorrect values
- Slides: 59