Symbolic Analysis

- Symbolic analysis tracks the values of program variables symbolically, as expressions of input variables and other variables, which we call reference variables.
- Expressing variables in terms of the same set of reference variables exposes useful relationships among them.

An Example

    x = input();
    y = x - 1;
    z = y - 1;
    A[x] = 10;
    A[y] = 11;
    if (z > x)
        z = x;

- z = x - 2
- &A[x] ≠ &A[y]
- z > x is never true, so the test and the assignment it guards can be removed.
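The reasoning in this example can be sketched in a few lines of Python. This is only an illustration, not the analysis itself: the pair encoding `(c0, c1)` for `c0 + c1*x` and the helper names are assumptions made here, and the sketch assumes unbounded integers (no overflow).

```python
# An affine expression over the single reference variable x is stored as
# the pair (c0, c1), meaning c0 + c1*x. This encoding is illustrative only.

def sub_const(expr, k):
    """Return expr - k for an affine pair (c0, c1)."""
    c0, c1 = expr
    return (c0 - k, c1)

def difference(e1, e2):
    """Return e1 - e2 as an affine pair."""
    return (e1[0] - e2[0], e1[1] - e2[1])

def never_greater(e1, e2):
    """True if e1 > e2 is impossible: e1 - e2 is a constant <= 0."""
    c0, c1 = difference(e1, e2)
    return c1 == 0 and c0 <= 0

def always_distinct(e1, e2):
    """True if e1 and e2 always differ: e1 - e2 is a nonzero constant."""
    c0, c1 = difference(e1, e2)
    return c1 == 0 and c0 != 0

x = (0, 1)            # x = input()  -> the reference variable itself
y = sub_const(x, 1)   # y = x - 1
z = sub_const(y, 1)   # z = y - 1, which is x - 2

print(z)                      # the affine form of z
print(never_greater(z, x))    # can the branch z > x ever be taken?
print(always_distinct(x, y))  # do A[x] and A[y] always differ?
```

Because `z - x` and `x - y` reduce to constants, both questions are answered without knowing the input.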

Abstract Domain

- Since we cannot create succinct, closed-form symbolic expressions for all values computed, we choose an abstract domain and approximate the computations with the most precise expressions within the domain.
- Constant propagation: { constants, UNDEF, NAC }
- Symbolic analysis: { affine expressions, NAA }

Affine Expressions

- An expression is affine with respect to variables $v_1, v_2, \dots, v_n$ if it can be expressed as $c_0 + c_1 v_1 + \dots + c_n v_n$, where $c_0, c_1, \dots, c_n$ are constants.
- An affine expression is linear if $c_0$ is zero.

Induction Variables

- An affine expression can also be written in terms of the count of iterations through a loop.
- Variables whose values can be expressed as $c_1 i + c_0$, where $i$ is the count of iterations through the closest enclosing loop, are known as induction variables.

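An Example

Original loop:

    for (m = 10; m < 20; m++) {
        x = m * 3;
        A[x] = 0;
    }

With i the count of iterations (i = 0, 1, …, 9): m = i + 10 and x = 30 + 3 * i.

The multiplication replaced by an addition:

    x = 27;
    for (m = 10; m < 20; m++) {
        x = x + 3;
        A[x] = 0;
    }

The loop index m eliminated:

    for (x = &A + 30; x <= &A + 57; x = x + 3) {
        *x = 0;
    }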
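As a quick sanity check, the three loop forms can be simulated and compared. This is a sketch under assumptions: the array is modeled as a Python list of a hypothetical size 60, and the pointer `&A + k` is modeled as the integer offset `k`.

```python
def original(size=60):
    """x is recomputed as m * 3 in every iteration."""
    A = [1] * size
    for m in range(10, 20):
        x = m * 3
        A[x] = 0
    return A

def strength_reduced(size=60):
    """x is advanced by 3 instead of being multiplied."""
    A = [1] * size
    x = 27
    for m in range(10, 20):
        x = x + 3
        A[x] = 0
    return A

def pointer_form(size=60):
    """The offset x models the pointer &A + x; A[x] = 0 models *x = 0."""
    A = [1] * size
    x = 30
    while x <= 57:
        A[x] = 0
        x = x + 3
    return A

# Indices written by the original loop, and the closed form 30 + 3*i.
zeroed = [k for k, v in enumerate(original()) if v == 0]
closed_form = [30 + 3 * i for i in range(10)]
print(zeroed == closed_form)
```

All three functions return the same list, which is what the induction-variable expressions predict.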

Other Reference Variables

- If a variable is not a linear function of the reference variables already chosen, we have the option of treating its value as a reference for future operations.

    a = f();
    b = a + 10;
    c = a + 11;
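As a small worked step, taking the value returned by f() as the reference variable a, the two later assignments can be related directly:

```latex
\begin{aligned}
b &= a + 10, \qquad c = a + 11 \\
c - b &= (a + 11) - (a + 10) = 1
\end{aligned}
```

So c = b + 1 holds whatever f() returns, even though a itself is unknown.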

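A Running Example

Source program:

    a = 0;
    for (f = 100; f < 200; f++) {
        a = a + 1;
        b = 10 * a;
        c = 0;
        for (g = 10; g < 20; g++) {
            d = b + c;
            c = c + 1;
        }
    }

Flow graph:

    B1:  a = 0; i = 1;
    B2:  a = a + 1; b = 10 * a; c = 0; j = 1;
    B3:  d = b + c; c = c + 1; j = j + 1; if j <= 10 goto B3
    B4:  i = i + 1; if i <= 100 goto B2

Regions: R5 is the inner loop on B3, R6 is the outer loop body (B2, R5, B4), R7 is the outer loop on R6, and R8 is the whole program (B1, R7).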

Data-Flow Values: Symbolic Maps

- The domain of data-flow values for symbolic analysis is symbolic maps, which are functions that map each variable in the program to a value. The value is either an affine function of reference values, or the special symbol NAA to represent a non-affine expression.
- If there is only one variable, the bottom value of the semilattice is a map that sends the variable to NAA.
- The semilattice for n variables is the product of the individual semilattices. We use $m_{\mathrm{NAA}}$ to denote the bottom of the semilattice, which maps all variables to NAA.
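A minimal sketch of symbolic maps as plain dictionaries follows. The encoding is an assumption made for illustration: an affine value is a dict from reference-variable names to coefficients, with the key "1" holding the constant term, and the meet is assumed to be flat (two different values meet to NAA).

```python
NAA = "NAA"   # marker for "not an affine expression"

def affine(const=0, **coeffs):
    """Build an affine value as a dict: {'1': const, ref_var: coeff, ...}."""
    expr = {"1": const}
    expr.update({v: c for v, c in coeffs.items() if c != 0})
    return expr

def meet_value(u, v):
    """Meet of two values: keep the value if both agree, otherwise NAA."""
    return u if u == v else NAA

def meet_map(m1, m2):
    """Meet of two symbolic maps, taken variable by variable."""
    return {var: meet_value(m1[var], m2[var]) for var in m1}

def bottom(variables):
    """The map m_NAA that sends every variable to NAA."""
    return {var: NAA for var in variables}

# Two maps that agree on a (a = i) but disagree on b (10i versus 10i - 10).
m1 = {"a": affine(0, i=1), "b": affine(0, i=10)}
m2 = {"a": affine(0, i=1), "b": affine(-10, i=10)}
print(meet_map(m1, m2))
```

Taking the meet variable by variable is what the product construction amounts to in this encoding.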

The Running Example

Values taken by the variables in block B3 of the flow graph:

    var   i = 1,           1 <= i <= 100,
          j = 1, …, 10     j = 1, …, 10
    a     1                i
    b     10               10i
    c     1, …, 10         1, …, 10
    d     10, …, 19        10i, …, 10i + 9

The Running Example

    m         m(a)     m(b)       m(c)     m(d)
    IN[B1]    NAA      NAA        NAA      NAA
    OUT[B1]   0        NAA        NAA      NAA
    IN[B2]    i - 1    NAA        NAA      NAA
    OUT[B2]   i        10i        0        NAA
    IN[B3]    i        10i        j - 1    NAA
    OUT[B3]   i        10i        j        10i + j - 1
    IN[B4]    i        10i        j        10i + j - 1
    OUT[B4]   i - 1    10i - 10   j        10i + j - 11

The Running Example

The program rewritten using the symbolic maps:

    a = 0;
    for (i = 1; i <= 100; i++) {
        a = i;
        b = 10 * i;
        c = 0;
        for (j = 1; j <= 10; j++) {
            d = 10 * i + j - 1;
            c = j;
        }
    }
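One way to gain confidence in this rewriting is to run both versions and compare the values after every inner iteration. The sketch below does that in Python; the `trace` lists are an addition made here purely for the comparison.

```python
def original():
    """The source program, recording (a, b, c, d) after each inner iteration."""
    trace = []
    a = 0
    for f in range(100, 200):
        a = a + 1
        b = 10 * a
        c = 0
        for g in range(10, 20):
            d = b + c
            c = c + 1
            trace.append((a, b, c, d))
    return trace

def rewritten():
    """The rewritten program, with every value expressed through i and j."""
    trace = []
    a = 0
    for i in range(1, 101):
        a = i
        b = 10 * i
        c = 0
        for j in range(1, 11):
            d = 10 * i + j - 1
            c = j
            trace.append((a, b, c, d))
    return trace

print(original() == rewritten())
```

The two traces agree at all 1000 points, so the affine expressions in i and j describe the same computation.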

Transfer Functions

- The transfer functions in symbolic analysis send symbolic maps to symbolic maps.
- The transfer function of statement $s$, denoted $f_s$, is defined as follows:
  - If $s$ is not an assignment, then $f_s = I$, the identity function.
  - If $s$ is an assignment to variable $x$, then

$$
f_s(m)(v) =
\begin{cases}
m(v) & \text{for all variables } v \neq x, \\
c_0 + c_1 m(y) + c_2 m(z) & \text{if } v = x \text{ and } x \text{ is assigned } c_0 + c_1 y + c_2 z, \\
\mathrm{NAA} & \text{otherwise.}
\end{cases}
$$
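The definition can be sketched as a pair of Python helpers that build transfer functions for single assignments. The names `assign_affine` and `assign_other` and the dict encoding (key "1" for the constant term, `None` for NAA) are assumptions made for this illustration.

```python
NAA = None  # stands for "not an affine expression"

def scale_add(c0, terms):
    """Return c0 + sum(c * e) over affine dicts e, or NAA if a needed e is NAA."""
    result = {"1": c0}
    for c, e in terms:
        if c == 0:
            continue
        if e is NAA:
            return NAA
        for key, coeff in e.items():
            result[key] = result.get(key, 0) + c * coeff
    return {k: v for k, v in result.items() if k == "1" or v != 0}

def assign_affine(x, c0, c1=0, y=None, c2=0, z=None):
    """Transfer function for the statement x = c0 + c1*y + c2*z."""
    def f(m):
        out = dict(m)  # f(m)(v) = m(v) for all v != x
        terms = [(c, m[v]) for c, v in ((c1, y), (c2, z)) if v is not None]
        out[x] = scale_add(c0, terms)
        return out
    return f

def assign_other(x):
    """Transfer function for an assignment to x that is not affine."""
    def f(m):
        out = dict(m)
        out[x] = NAA
        return out
    return f

# Block B2 of the running example, entered with a = i - 1.
m0 = {"a": {"1": -1, "i": 1}, "b": NAA, "c": NAA, "d": NAA}
m1 = assign_affine("a", 1, 1, "a")(m0)    # a = a + 1
m2 = assign_affine("b", 0, 10, "a")(m1)   # b = 10 * a
m3 = assign_affine("c", 0)(m2)            # c = 0
print(m3)
```

Starting from a = i - 1, the three statements give a = i, b = 10i and c = 0, while d stays NAA.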

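Composition of Transfer Functions

- If $f_2(m)(v) = \mathrm{NAA}$, then $(f_2 \circ f_1)(m)(v) = \mathrm{NAA}$.
- If $f_2(m)(v) = c_0 + \sum_i c_i m(v_i)$, then

$$
(f_2 \circ f_1)(m)(v) =
\begin{cases}
\mathrm{NAA} & \text{if } f_1(m)(v_i) = \mathrm{NAA} \text{ for some } i \neq 0 \text{ with } c_i \neq 0, \\
c_0 + \sum_i c_i f_1(m)(v_i) & \text{otherwise.}
\end{cases}
$$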
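The two rules amount to substituting the result of the first function into the second. The following sketch assumes a transfer function is stored as a dict from each variable v to an affine dict over the symbols m(·) (key "1" for the constant term), with `None` for NAA; this encoding is an illustration, not something fixed by the slides.

```python
NAA = None

def compose(f2, f1):
    """Return the composition 'f2 after f1', following the two rules."""
    result = {}
    for v, expr in f2.items():
        if expr is NAA:                        # first rule: NAA stays NAA
            result[v] = NAA
            continue
        out = {"1": expr.get("1", 0)}
        for vi, ci in expr.items():
            if vi == "1" or ci == 0:
                continue
            inner = f1[vi]                     # this is f1(m)(vi)
            if inner is NAA:                   # second rule, NAA case
                out = NAA
                break
            for key, coeff in inner.items():   # substitute f1(m)(vi) for m(vi)
                out[key] = out.get(key, 0) + ci * coeff
        result[v] = out
    return result

# Transfer functions of blocks B2 and B3 from the running example.
f_B2 = {"a": {"1": 1, "a": 1}, "b": {"1": 10, "a": 10},
        "c": {"1": 0}, "d": {"1": 0, "d": 1}}
f_B3 = {"a": {"1": 0, "a": 1}, "b": {"1": 0, "b": 1},
        "c": {"1": 1, "c": 1}, "d": {"1": 0, "b": 1, "c": 1}}

print(compose(f_B3, f_B2))
```

After B2 followed by one pass through B3, d is 10 m(a) + 10 and c is 1, both expressed in terms of the map at the entry of B2.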

The Running Example

    f       f(m)(a)     f(m)(b)        f(m)(c)     f(m)(d)
    f_B1    0           m(b)           m(c)        m(d)
    f_B2    m(a) + 1    10m(a) + 10    0           m(d)
    f_B3    m(a)        m(b)           m(c) + 1    m(b) + m(c)
    f_B4    m(a)        m(b)           m(c)        m(d)

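Solutions to Data-Flow Problem

- $\mathrm{OUT}[B_k] = f_{B_k}(\mathrm{IN}[B_k])$, for all $B_k$
- $\mathrm{OUT}[B_1] \ge \mathrm{IN}_1[B_2]$
- $\mathrm{OUT}_i[B_2] \ge \mathrm{IN}_{i,1}[B_3]$, $1 \le i \le 100$
- $\mathrm{OUT}_{i,j-1}[B_3] \ge \mathrm{IN}_{i,j}[B_3]$, $1 \le i \le 100$, $2 \le j \le 10$
- $\mathrm{OUT}_{i,10}[B_3] \ge \mathrm{IN}_i[B_4]$, $1 \le i \le 100$
- $\mathrm{OUT}_{i-1}[B_4] \ge \mathrm{IN}_i[B_2]$, $2 \le i \le 100$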

Meet of Transfer Functions

- The meet of two transfer functions:

$$
(f_1 \wedge f_2)(m)(v) =
\begin{cases}
f_1(m)(v) & \text{if } f_1(m)(v) = f_2(m)(v), \\
\mathrm{NAA} & \text{otherwise.}
\end{cases}
$$
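In a dict encoding of transfer functions (an assumption made here: each variable maps to an affine dict over the symbols m(·), with key "1" for the constant and `None` for NAA), the meet is a one-line comparison per variable:

```python
NAA = None

def meet(f1, f2):
    """Meet of two transfer functions: keep entries that agree, else NAA."""
    return {v: (f1[v] if f1[v] == f2[v] else NAA) for v in f1}

# The identity function and the transfer function of block B3.
identity = {"a": {"1": 0, "a": 1}, "b": {"1": 0, "b": 1},
            "c": {"1": 0, "c": 1}, "d": {"1": 0, "d": 1}}
f_B3 = {"a": {"1": 0, "a": 1}, "b": {"1": 0, "b": 1},
        "c": {"1": 1, "c": 1}, "d": {"1": 0, "b": 1, "c": 1}}

print(meet(identity, f_B3))
```

Meeting the identity with f_B3 keeps a and b, which B3 leaves unchanged, and sends c and d to NAA.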

Parameterized Function Compositions

- If $f(m)(x) = m(x) + c$, then $f^i(m)(x) = m(x) + ci$ for all $i \ge 0$; $x$ is a basic induction variable.
- If $f(m)(x) = m(x)$, then $f^i(m)(x) = m(x)$ for all $i \ge 0$; $x$ is a symbolic constant.
- If $f(m)(x) = c_0 + c_1 m(x_1) + \dots + c_n m(x_n)$, where each $x_k$ is either a basic induction variable or a symbolic constant, then $f^i(m)(x) = c_0 + c_1 f^{i-1}(m)(x_1) + \dots + c_n f^{i-1}(m)(x_n)$ for all $i > 0$, since the $i$th application of $f$ reads the values produced by the first $i - 1$ applications; $x$ is an induction variable.
- In all other cases, $f^i(m)(x) = \mathrm{NAA}$.

Parameterized Function Compositions

- The effect of executing a fixed number of iterations is obtained by replacing $i$ above by that number.
- If the number of iterations is unknown, the value at the start of the last iteration is given by $f^*$:

$$
f^*(m)(v) =
\begin{cases}
m(v) & \text{if } f(m)(v) = m(v), \\
\mathrm{NAA} & \text{otherwise.}
\end{cases}
$$
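The closed forms can be checked against plain repeated application. The sketch below is an illustration under assumptions: transfer functions are dicts from variables to affine dicts over m(·) (key "1" for the constant, `None` for NAA), the iteration count is a concrete integer i >= 1, and the operands of a non-basic induction variable are read from the state after i - 1 iterations, which is what applying f step by step produces.

```python
NAA = None

def is_simple(f, v):
    """True if f(m)(v) = m(v) + c: a symbolic constant or basic induction variable."""
    e = f[v]
    return e is not NAA and {k: c for k, c in e.items() if k != "1" and c != 0} == {v: 1}

def power(f, i):
    """Closed form of f applied i times, for a concrete i >= 1."""
    result = {}
    for x, e in f.items():
        if e is NAA:
            result[x] = NAA
        elif is_simple(f, x):
            result[x] = {"1": e.get("1", 0) * i, x: 1}
        elif all(is_simple(f, v) for v, c in e.items() if v != "1" and c != 0):
            out = {"1": e.get("1", 0)}
            for v, c in e.items():
                if v == "1" or c == 0:
                    continue
                out["1"] += c * f[v].get("1", 0) * (i - 1)  # operand after i-1 steps
                out[v] = out.get(v, 0) + c
            result[x] = out
        else:
            result[x] = NAA
    return result

def star(f):
    """f*: keep only the variables that f leaves unchanged."""
    return {x: ({"1": 0, x: 1} if is_simple(f, x) and f[x].get("1", 0) == 0 else NAA)
            for x in f}

def apply(f, state):
    """Evaluate f on a concrete state (dict of ints); assumes no NAA entries."""
    return {x: e.get("1", 0) + sum(c * state[v] for v, c in e.items() if v != "1")
            for x, e in f.items()}

def iterate(f, state, i):
    """Apply f to a concrete state i times, one step at a time."""
    for _ in range(i):
        state = apply(f, state)
    return state

f_B3 = {"a": {"1": 0, "a": 1}, "b": {"1": 0, "b": 1},
        "c": {"1": 1, "c": 1}, "d": {"1": 0, "b": 1, "c": 1}}
state = {"a": 3, "b": 40, "c": 0, "d": 7}
print(apply(power(f_B3, 5), state) == iterate(f_B3, state, 5))
```

For block B3 of the running example, the closed form and five explicit applications give the same state, and `star(f_B3)` keeps only a and b.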

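The Running Example

For $i \ge 1$ (d is computed from the values of b and c at the start of the iteration, so it lags c by one):

$$
f^i_{B_3}(m)(v) =
\begin{cases}
m(a) & \text{if } v = a, \\
m(b) & \text{if } v = b, \\
m(c) + i & \text{if } v = c, \\
m(b) + m(c) + i - 1 & \text{if } v = d.
\end{cases}
$$

$$
f^*_{B_3}(m)(v) =
\begin{cases}
m(a) & \text{if } v = a, \\
m(b) & \text{if } v = b, \\
\mathrm{NAA} & \text{if } v = c, \\
\mathrm{NAA} & \text{if } v = d.
\end{cases}
$$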

A Region-Based Algorithm

- The effect of execution from the start of the loop region $R$ to the entry of the $i$th iteration, where $S$ is the header of the loop:

$$
f_{R,i,\mathrm{IN}[S]} = \Bigl( \bigwedge_{B \in \mathrm{pred}(S)} f_{S,\mathrm{OUT}[B]} \Bigr)^{i-1}
$$

- If the number of iterations of a region is known, replace $i$ with the actual count.
- In the top-down pass, compute $f_{R,i,\mathrm{IN}[B]}$.
- If $m(v) = \mathrm{NAA}$, introduce a new reference variable $t$; all references to $m(v)$ are replaced by $t$.

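The Running Example

- $f_{R_5,j,\mathrm{IN}[B_3]} = f^{j-1}_{B_3}$
- $f_{R_5,j,\mathrm{OUT}[B_3]} = f^{j}_{B_3}$
- $f_{R_6,\mathrm{IN}[B_2]} = I$
- $f_{R_6,\mathrm{IN}[R_5]} = f_{B_2}$
- $f_{R_6,\mathrm{OUT}[B_4]} = I \circ f_{R_5,10,\mathrm{OUT}[B_3]} \circ f_{B_2}$
- $f_{R_7,i,\mathrm{IN}[R_6]} = f^{i-1}_{R_6,\mathrm{OUT}[B_4]}$
- $f_{R_7,i,\mathrm{OUT}[B_4]} = f^{i}_{R_6,\mathrm{OUT}[B_4]}$
- $f_{R_8,\mathrm{IN}[B_1]} = I$
- $f_{R_8,\mathrm{IN}[R_7]} = f_{B_1}$
- $f_{R_8,\mathrm{OUT}[B_4]} = I \circ f_{R_7,100,\mathrm{OUT}[B_4]} \circ f_{B_1}$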

The Running Example

    f                  f(m)(a)         f(m)(b)         f(m)(c)         f(m)(d)
    f_{R5,j,IN[B3]}    m(a)            m(b)            m(c) + j - 1    NAA
    f_{R5,j,OUT[B3]}   m(a)            m(b)            m(c) + j        m(b) + m(c) + j - 1
    f_{R6,IN[B2]}      m(a)            m(b)            m(c)            m(d)
    f_{R6,IN[R5]}      m(a) + 1        10m(a) + 10     0               m(d)
    f_{R6,OUT[B4]}     m(a) + 1        10m(a) + 10     10              10m(a) + 19
    f_{R7,i,IN[R6]}    m(a) + i - 1    NAA             NAA             NAA
    f_{R7,i,OUT[B4]}   m(a) + i        10m(a) + 10i    10              10m(a) + 10i + 9
    f_{R8,IN[B1]}      m(a)            m(b)            m(c)            m(d)
    f_{R8,IN[R7]}      0               m(b)            m(c)            m(d)
    f_{R8,OUT[B4]}     100             1000            10              1009

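The Running Example

- $\mathrm{IN}[B_1] = m_{\mathrm{NAA}}$
- $\mathrm{OUT}[B_1] = f_{B_1}(\mathrm{IN}[B_1])$
- $\mathrm{IN}_i[B_2] = f_{R_7,i,\mathrm{IN}[R_6]}(\mathrm{OUT}[B_1])$
- $\mathrm{OUT}_i[B_2] = f_{B_2}(\mathrm{IN}_i[B_2])$
- $\mathrm{IN}_{i,j}[B_3] = f_{R_5,j,\mathrm{IN}[B_3]}(\mathrm{OUT}_i[B_2])$
- $\mathrm{OUT}_{i,j}[B_3] = f_{B_3}(\mathrm{IN}_{i,j}[B_3])$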

The Running Example

Original program:

    for (i = 1; i < n; i++) {
        a = input();
        for (j = 1; j < 10; j++) {
            a = a - 1;
            b = j + a;
            a = a + 1;
        }
    }

With the new reference variable t introduced for the input value:

    for (i = 1; i < n; i++) {
        a = input();
        t = a;
        for (j = 1; j < 10; j++) {
            a = t - 1;
            b = t - 1 + j;
            a = t;
        }
    }
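The two versions can be compared by running both on the same inputs. In this sketch, `input()` is modeled by reading from a list, and the `trace` of (a, b) pairs after each inner iteration is an addition made here for the comparison.

```python
def original(inputs, n):
    """The original loop nest; inputs supplies the values returned by input()."""
    it = iter(inputs)
    trace = []
    for i in range(1, n):
        a = next(it)
        for j in range(1, 10):
            a = a - 1
            b = j + a
            a = a + 1
            trace.append((a, b))
    return trace

def rewritten(inputs, n):
    """The version that uses the reference variable t for the input value."""
    it = iter(inputs)
    trace = []
    for i in range(1, n):
        a = next(it)
        t = a
        for j in range(1, 10):
            a = t - 1
            b = t - 1 + j
            a = t
            trace.append((a, b))
    return trace

sample = [5, -3, 42]   # arbitrary test inputs, one per outer iteration
print(original(sample, 4) == rewritten(sample, 4))
```

Because a returns to its input value at the end of every inner iteration, both versions produce the same sequence of (a, b) pairs.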