Level by Level: Making Flow- and Context-Sensitive Pointer Analysis Scalable for Millions of Lines of Code
Hongtao Yu, Zhaoqing Zhang, Xiaobing Feng, Wei Huo
Institute of Computing Technology, Chinese Academy of Sciences
{htyu, zqzhang, fxb, huowei}@ict.ac.cn
Jingling Xue
University of New South Wales
jingling@cse.unsw.edu.au

Outline
• Introduction
• Framework
• Analyzing a Level
• Experiments
• Conclusion
INSTITUTE OF COMPUTING TECHNOLOGY

Introduction
• Motivation
  – Who needs flow- and context-sensitive (FSCS) pointer analysis?
    • Software checking tools
    • Program understanding
    • Parallelization tools
    • Hardware synthesis
  – Existing methods do not scale to large real programs
  – Goal: scale to millions of lines of C code

Improving scalability
• For flow-sensitivity
  – Fewer iterations in the dataflow analysis
  – Less space for the points-to graph
• For context-sensitivity
  – Summary-based
  – Low storage penalty
  – Low penalty for applying summaries

Idea
• Level-by-level analysis
  – Analyze the pointers in decreasing order of their points-to levels
    • Suppose int **q, *p, x; then q has level 2, p has level 1, and x has level 0.
  – Fast flow-sensitive analysis on full-sparse SSA
  – Fast and accurate context-sensitive analysis using a full transfer function

Contributions
• Performs a full-sparse flow-sensitive pointer analysis using a flow-insensitive algorithm
• Performs a context-sensitive pointer analysis efficiently with precise full transfer functions
• Yields flow- and context-sensitive interprocedural may/must mod/ref information on a compact SSA form
• Analyzes millions of lines of code in minutes, faster than state-of-the-art FSCS pointer analysis algorithms

Framework
• Compute points-to levels
• For each points-to level, from the highest to the lowest:
  – Bottom-up: evaluate transfer functions
  – Top-down: propagate points-to sets
  – Incrementally build the call graph
Figure 1. Level-by-level pointer analysis (LevPA).

Points-to level
• Property 1. If a variable x is possibly pointed to by a pointer y, then ptl(x) ≤ ptl(y).
• Property 2. If a variable y is possibly assigned to x, then ptl(x) = ptl(y).
• Points-to levels are computed by a unification-based pointer analysis

Example

    int obj, t;

    main() {
    L1:   int **x, **y;
    L2:   int *a, *b, *c, *d, *e;
    L3:   x = &a; y = &b;
    L4:   foo(x, y);
    L5:   *b = 5;
    L6:   if ( … ) { x = &c; y = &e; }
    L7:   else { x = &d; y = &d; }
    L8:   c = &t;
    L9:   foo(x, y);
    L10:  *e = 10;
    }

    void foo(int **p, int **q) {
    L11:  *p = *q;
    L12:  *q = &obj;
    }

• ptl(x, y, p, q) = 2
• ptl(a, b, c, d, e) = 1
• ptl(t, obj) = 0
• Analyze { x, y, p, q } first, then { a, b, c, d, e }, and { t, obj } last

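The points-to levels of the example can be reproduced with a small unification-style analysis. The following is only a minimal sketch, not the authors' implementation: it assumes a union-find forest in which every class has at most one pointee class, an acyclic points-to graph (a full analysis would have to collapse cycles), and a level defined as the length of the pointee chain. All identifiers (`ptl_find`, `ptl_unify`, `V_X`, and so on) are hypothetical.

```c
#include <assert.h>

/* Variables of the running example; the V_ names are hypothetical. */
enum { V_X, V_Y, V_P, V_Q, V_A, V_B, V_C, V_D, V_E, V_T, V_OBJ, N_VARS };

static int parent[N_VARS];  /* union-find forest over variables */
static int pointee[N_VARS]; /* per class representative: pointee, or -1 */

/* Representative of v's equivalence class (with path halving). */
int ptl_find(int v) {
    while (parent[v] != v) {
        parent[v] = parent[parent[v]];
        v = parent[v];
    }
    return v;
}

/* Merge two classes and, recursively, the classes they point to. */
void ptl_unify(int u, int v) {
    u = ptl_find(u);
    v = ptl_find(v);
    if (u == v) return;
    int pu = pointee[u], pv = pointee[v];
    parent[v] = u;
    if (pu < 0) pointee[u] = pv;
    else if (pv >= 0) ptl_unify(pu, pv);
}

/* p = &v: v joins the class that p points to. */
void ptl_address_of(int p, int v) {
    p = ptl_find(p);
    if (pointee[p] < 0) pointee[p] = ptl_find(v);
    else ptl_unify(pointee[p], v);
}

/* p = q: both sides end up sharing one pointee class.
 * Simplification: if neither side has a pointee yet, nothing is recorded. */
void ptl_copy(int p, int q) {
    p = ptl_find(p);
    q = ptl_find(q);
    if (pointee[p] < 0) pointee[p] = pointee[q];
    else if (pointee[q] < 0) pointee[q] = pointee[p];
    else ptl_unify(pointee[p], pointee[q]);
}

/* Class that p points to; the example only dereferences known pointers. */
int ptl_deref(int p) {
    int t = pointee[ptl_find(p)];
    assert(t >= 0);
    return t;
}

/* Points-to level: 0 without a pointee, else 1 + level of the pointee. */
int ptl_level(int v) {
    int t = pointee[ptl_find(v)];
    return t < 0 ? 0 : 1 + ptl_level(t);
}

/* Feed the statements of the running example to the analysis. */
void ptl_build_example(void) {
    for (int i = 0; i < N_VARS; i++) { parent[i] = i; pointee[i] = -1; }
    ptl_address_of(V_X, V_A); ptl_address_of(V_Y, V_B); /* L3 */
    ptl_address_of(V_X, V_C); ptl_address_of(V_Y, V_E); /* L6 */
    ptl_address_of(V_X, V_D); ptl_address_of(V_Y, V_D); /* L7 */
    ptl_address_of(V_C, V_T);                           /* L8 */
    ptl_copy(V_P, V_X); ptl_copy(V_Q, V_Y);             /* calls at L4, L9 */
    ptl_copy(ptl_deref(V_P), ptl_deref(V_Q));           /* L11: *p = *q */
    ptl_address_of(ptl_deref(V_Q), V_OBJ);              /* L12: *q = &obj */
}
```

After calling `ptl_build_example()`, `ptl_level` returns 2 for x, y, p, q, 1 for a to e, and 0 for t and obj, which matches the levels listed on the slide.
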
Bottom-up analysis of level 2

    void foo(int **p, int **q) {
    L11:  *p1 = *q1;
    L12:  *q1 = &obj;
    }

• p1's points-to set depends on formal-in p
• q1's points-to set depends on formal-in q

Bottom-up analysis of level 2 (main in SSA form)

    main() {
    L1:   int **x, **y;
    L2:   int *a, *b, *c, *d, *e;
    L3:   x1 = &a; y1 = &b;
    L4:   foo(x1, y1);
    L5:   *b = 5;
    L6:   if ( … ) { x2 = &c; y2 = &e; }
    L7:   else { x3 = &d; y3 = &d; }
          x4 = ϕ(x2, x3); y4 = ϕ(y2, y3)
    L8:   c = &t;
    L9:   foo(x4, y4);
    L10:  *e = 10;
    }

• x1 → { a }, y1 → { b }
• x2 → { c }, y2 → { e }
• x3 → { d }, y3 → { d }
• x4 → { c, d }, y4 → { e, d }

Full-sparse Analysis
• Achieve flow-sensitivity flow-insensitively
  – Regard each SSA name as a unique variable
  – Set constraint-based pointer analysis
• Full sparse
  – Saves time
  – Saves space

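To illustrate the idea, the level-2 SSA names of main can be solved with a plain flow-insensitive inclusion solver. This is a minimal sketch under stated assumptions rather than the LevPA implementation: points-to sets are bitmasks, address-of statements seed the sets, each ϕ is modeled as two copy constraints, and every identifier (`PtsSet`, `Copy`, `solve`, `analyze_main`) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum { OBJ_A, OBJ_B, OBJ_C, OBJ_D, OBJ_E, N_OBJS };   /* pointed-to variables */
enum { X1, X2, X3, X4, Y1, Y2, Y3, Y4, N_NAMES };     /* SSA names of x and y */

typedef unsigned PtsSet;               /* bit i set: may point to object i */
typedef struct { int dst, src; } Copy; /* constraint: pts(dst) includes pts(src) */

/* Iterate the copy constraints to a fixed point; statement order is irrelevant. */
void solve(PtsSet pts[], const Copy copies[], int n) {
    bool changed = true;
    assert(n >= 0);
    while (changed) {
        changed = false;
        for (int i = 0; i < n; i++) {
            PtsSet merged = pts[copies[i].dst] | pts[copies[i].src];
            if (merged != pts[copies[i].dst]) {
                pts[copies[i].dst] = merged;
                changed = true;
            }
        }
    }
}

/* Constraints for the level-2 SSA names of main in the running example. */
void analyze_main(PtsSet pts[N_NAMES]) {
    static const Copy phis[] = {
        { X4, X2 }, { X4, X3 },   /* x4 = phi(x2, x3) */
        { Y4, Y2 }, { Y4, Y3 },   /* y4 = phi(y2, y3) */
    };
    for (int i = 0; i < N_NAMES; i++) pts[i] = 0;
    pts[X1] = 1u << OBJ_A;  pts[Y1] = 1u << OBJ_B;   /* L3 */
    pts[X2] = 1u << OBJ_C;  pts[Y2] = 1u << OBJ_E;   /* L6 */
    pts[X3] = 1u << OBJ_D;  pts[Y3] = 1u << OBJ_D;   /* L7 */
    solve(pts, phis, (int)(sizeof phis / sizeof phis[0]));
}
```

Because every SSA name has exactly one definition, the solver never needs per-program-point state: the set computed for x4 is {c, d} and for y4 is {d, e}, while x1 keeps the precise set {a}.
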
Top-down analysis of level 2
• main: propagate points-to sets to the callsites
  – L4: foo.p → { a }, foo.q → { b }
  – L9: foo.p → { c, d }, foo.q → { d, e }
• Merged over both callsites:
  – foo.p → { a, c, d }
  – foo.q → { b, d, e }

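The merge of the two callsites amounts to a set union per formal parameter. A minimal sketch, assuming bitmask points-to sets and a hypothetical helper named `merge_contexts`:

```c
#include <assert.h>
#include <stddef.h>

enum { V_A, V_B, V_C, V_D, V_E };   /* level-1 variables of the example */
typedef unsigned PtsSet;            /* bit i set: may point to variable i */

/* Union of the points-to sets that the callsites bind to one formal. */
PtsSet merge_contexts(const PtsSet at_call_sites[], size_t n) {
    PtsSet merged = 0;
    assert(at_call_sites != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        merged |= at_call_sites[i];
    return merged;
}
```

For foo.p, the inputs are {a} from L4 and {c, d} from L9, so the merged set is {a, c, d}; foo.q is handled the same way.
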
Top-down analysis of level 2
• foo: expand pointer dereferences

    void foo(int **p, int **q) {
          μ(b, d, e)
    L11:  *p1 = *q1;
          χ(a, c, d)
    L12:  *q1 = &obj;
          χ(b, d, e)
    }

• Calling contexts are merged here

Context Condition
• To be context-sensitive
• Points-to relation cᵢ
  – p ⇒ v (p → v): p must (may) point to v, where p is a formal parameter
• Context condition ℂ(c₁, …, cₖ)
  – a Boolean function consisting of higher-level points-to relations
• Context-sensitive μ and χ
  – μ(vᵢ, ℂ(c₁, …, cₖ))
  – vᵢ₊₁ = χ(vᵢ, M, ℂ(c₁, …, cₖ))
    • M ∈ {may, must} indicates a weak/strong update

Context-sensitive μ and χ

    void foo(int **p, int **q) {
          μ(b, q⇒b)
          μ(d, q→d)
          μ(e, q→e)
    L11:  *p1 = *q1;
          a = χ(a, must, p⇒a)
          c = χ(c, may, p→c)
          d = χ(d, may, p→d)
    L12:  *q1 = &obj;
          b = χ(b, must, q⇒b)
          d = χ(d, may, q→d)
          e = χ(e, may, q→e)
    }

Bottom-up analysis of level 1

    void foo(int **p, int **q) {
          μ(b1, q⇒b)
          μ(d1, q→d)
          μ(e1, q→e)
    L11:  *p1 = *q1;
          a2 = χ(a1, must, p⇒a)
          c2 = χ(c1, may, p→c)
          d2 = χ(d1, may, p→d)
    L12:  *q1 = &obj;
          b2 = χ(b1, must, q⇒b)
          d3 = χ(d2, may, q→d)
          e2 = χ(e1, may, q→e)
    }

Points-to Set
• Local points-to set
  – Loc(p) = { <v, ℂ(c₁, …, cₖ)> | ℂ(c₁, …, cₖ) is a context condition }
  – p can point to v if and only if ℂ(c₁, …, cₖ) holds.
  – Loc(p) is computed explicitly during the bottom-up analysis.
• Dependence set
  – Dep(p) = { <q, ℂ(c₁, …, cₖ)> | q is a formal-in parameter of level lev and ℂ(c₁, …, cₖ) is a context condition }
  – Ptr(p) includes Ptr(q) if and only if ℂ(c₁, …, cₖ) holds.

Transfer function
• Trans(proc, v)
  – < Loc(v), Dep(v), ℂ(c₁, …, cₖ), M >
    • v is a formal-out parameter
    • ℂ(c₁, …, cₖ) is a context condition
      – v can be modified at a callsite invoking proc only if ℂ(c₁, …, cₖ) holds at the callsite
    • M ∈ {may, must}
      – indicates a may/must mod effect
• Trans(proc)
  – the set of all individual transfer functions Trans(proc, v)

Bottom-up analysis of level 1: transfer functions of foo
• Trans(foo, a) = < { }, { <b, q⇒b>, <d, q→d>, <e, q→e> }, p⇒a, must >
• Trans(foo, c) = < { }, { <b, q⇒b>, <d, q→d>, <e, q→e> }, p→c, may >
• Trans(foo, b) = < { <obj, q⇒b> }, { }, q⇒b, must >
• Trans(foo, e) = < { <obj, q→e> }, { }, q→e, may >
• Trans(foo, d) = < { <obj, q→d> }, { <b, p→d ∧ q⇒b>, <d, p→d>, <e, p→d ∧ q→e> }, p→d ∨ q→d, may >

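As a worked reading of Trans(foo, d), its Loc and Dep components can be evaluated for one calling context. This is a hedged sketch, not the authors' code: the truth values of the four points-to relations are passed in as plain booleans, the formal-in points-to sets are bitmasks over the level-0 objects, and the names `Context`, `FormalIn`, and `foo_out_d` are hypothetical. Because the mod effect of d is "may", the caller would additionally keep d's old value (weak update); that step is left out here.

```c
#include <assert.h>
#include <stdbool.h>

enum { O_OBJ, O_T };        /* level-0 objects */
typedef unsigned PtsSet;    /* bit i set: may point to object i */

/* Truth values, at one callsite, of the relations used by Trans(foo, d). */
typedef struct { bool p_to_d, q_to_d, q_must_b, q_to_e; } Context;

/* Points-to sets of the formal-ins b, d, e on entry to foo. */
typedef struct { PtsSet b, d, e; } FormalIn;

/* New targets of d after foo, following Loc(d) and Dep(d) entry by entry. */
PtsSet foo_out_d(Context c, FormalIn in) {
    PtsSet out = 0;
    if (c.q_to_d)               out |= 1u << O_OBJ; /* Loc: <obj, q->d>           */
    if (c.p_to_d && c.q_must_b) out |= in.b;        /* Dep: <b, p->d and q=>b>    */
    if (c.p_to_d)               out |= in.d;        /* Dep: <d, p->d>             */
    if (c.p_to_d && c.q_to_e)   out |= in.e;        /* Dep: <e, p->d and q->e>    */
    return out;
}
```

With the context of callsite L9 (p → d, q → d, q → e hold, q ⇒ b does not) and purely illustrative inputs b → {obj}, d → { }, e → {t}, the function returns {obj, t}: obj comes from Loc, t from the dependence on e, and b contributes nothing because its condition fails.
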
Bottom-up analysis of level 1

    int obj, t;

    main() {
    L1:   int **x, **y;
    L2:   int *a, *b, *c, *d, *e;
    L3:   x1 = &a; y1 = &b;
          μ(b1, true)
    L4:   foo(x1, y1);
          a2 = χ(a1, must, true)
          b2 = χ(b1, must, true)
    L5:   *b1 = 5;
    L6:   if ( … ) { x2 = &c; y2 = &e; }
    L7:   else { x3 = &d; y3 = &d; }
          x4 = ϕ(x2, x3)
          y4 = ϕ(y2, y3)
    L8:   c1 = &t;
          μ(d1, true)
          μ(e1, true)
    L9:   foo(x4, y4);
          c2 = χ(c1, may, true)
          d2 = χ(d1, may, true)
          e2 = χ(e1, may, true)
    L10:  *e1 = 10;
    }

• At L4, p ⇒ a holds and q ⇒ b holds
• At L9, p → c and p → d hold, and q → e and q → d hold

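The χ operations at L4 and L9 follow from checking the context condition of each Trans(foo, v) against the points-to sets bound at the callsite. The sketch below makes two assumptions that the slides do not spell out: a must relation p ⇒ v is taken to hold exactly when the set bound to p is the singleton {v}, and a may relation p → v holds whenever v is in that set. The names `CallSite`, `rel_may`, `rel_must`, and `foo_mod` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum { V_A, V_B, V_C, V_D, V_E };                 /* level-1 variables */
typedef unsigned PtsSet;                          /* bit i set: may point to variable i */
typedef enum { NO_MOD, MAY_MOD, MUST_MOD } Mod;
typedef struct { PtsSet p, q; } CallSite;         /* sets bound to formals p and q */

/* p -> v: v is among the targets (assumption: a sole target also counts). */
bool rel_may(PtsSet s, int v) { return (s >> v) & 1u; }

/* p => v: v is the only target (assumption of this sketch). */
bool rel_must(PtsSet s, int v) { return s == (1u << v); }

/* Mod effect of foo on v at a callsite, per the condition and M of Trans(foo, v). */
Mod foo_mod(int v, CallSite cs) {
    switch (v) {
    case V_A: return rel_must(cs.p, V_A) ? MUST_MOD : NO_MOD;  /* p => a         */
    case V_C: return rel_may(cs.p, V_C) ? MAY_MOD : NO_MOD;    /* p -> c         */
    case V_B: return rel_must(cs.q, V_B) ? MUST_MOD : NO_MOD;  /* q => b         */
    case V_E: return rel_may(cs.q, V_E) ? MAY_MOD : NO_MOD;    /* q -> e         */
    case V_D: return (rel_may(cs.p, V_D) || rel_may(cs.q, V_D))
                     ? MAY_MOD : NO_MOD;                       /* p -> d or q -> d */
    default:  return NO_MOD;
    }
}
```

At L4 (p bound to {a}, q bound to {b}) only a and b are modified, both with a must effect; at L9 (p bound to {c, d}, q bound to {d, e}) c, d, and e are modified with a may effect. This agrees with the χ operations shown around the two calls.
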
BDD and context condition
• Context conditions are implemented using BDDs
  – Compact representation
  – Efficient Boolean operations
• Example: BDD for ℂ = (p → a ∧ q → a) ∨ p → b
  – variable x1 represents p → a
  – variable x2 represents q → a
  – variable x3 represents p → b
  [Figure: BDD over x1, x2, x3 with terminal nodes 0 and 1]
• If only p → b holds at a call site, we evaluate ℂ|x1=0, x2=0, x3=1 to see whether ℂ holds at the call site.

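Evaluating a context condition at a call site is a single walk from the BDD root to a terminal. The sketch below hand-builds one possible reduced ordered BDD for ℂ = (x1 ∧ x2) ∨ x3 with the variable order x1, x2, x3; the node layout and the names `BddNode`, `bdd_eval`, and `context_condition` are assumptions made for illustration and do not come from any BDD library or from the slide's figure.

```c
#include <assert.h>
#include <stdbool.h>

enum { BDD_FALSE = 0, BDD_TRUE = 1 };   /* indices of the two terminal nodes */

typedef struct {
    int var;     /* index into the assignment: 0 = x1, 1 = x2, 2 = x3 */
    int lo, hi;  /* successor node when the variable is 0 / 1 */
} BddNode;

/* (x1 and x2) or x3, variable order x1 < x2 < x3. */
static const BddNode context_condition[] = {
    { -1, BDD_FALSE, BDD_FALSE },   /* node 0: terminal 0                */
    { -1, BDD_TRUE,  BDD_TRUE  },   /* node 1: terminal 1                */
    {  2, BDD_FALSE, BDD_TRUE  },   /* node 2: tests x3                  */
    {  1, 2,         BDD_TRUE  },   /* node 3: tests x2, falls back to x3 */
    {  0, 2,         3         },   /* node 4: tests x1 (root)           */
};
enum { CONTEXT_ROOT = 4 };

/* Follow one path from the root; each variable is inspected at most once. */
bool bdd_eval(const BddNode nodes[], int root, const bool assignment[]) {
    int n = root;
    assert(nodes != 0 && assignment != 0);
    while (n > BDD_TRUE)
        n = assignment[nodes[n].var] ? nodes[n].hi : nodes[n].lo;
    return n == BDD_TRUE;
}
```

For the call site on the slide, where only p → b holds, the assignment is x1 = 0, x2 = 0, x3 = 1; the walk goes root, x3 node, terminal 1, so ℂ holds there.
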
Experiment
• Analyzes millions of lines of code in minutes
• Faster than state-of-the-art FSCS pointer analysis algorithms

Benchmark          KLOC   LevPA (64-bit)   LevPA (32-bit)   Bootstrapping (PLDI'08)
Icecast-2.3.1        22             2.18             5.73                        29
sendmail            115            72.63           143.68                       939
httpd               128            16.32            35.42                       161
445.gobmk           197            21.37            40.78                         /
wine-0.9.24        1905           502.29           891.16                         /
wireshark-1.2.2    2383           366.63           845.23                         /

Table 2. Performance (secs).

Conclusion
• We present a scalable method for flow- and context-sensitive pointer analysis
• It analyzes the pointers in a program level by level, in order of their points-to levels
  – Fast flow-sensitive analysis on full-sparse SSA form
  – Fast and accurate context-sensitive analysis using full transfer functions represented by BDDs
• It can analyze millions of lines of C code in minutes, faster than state-of-the-art methods

Thanks