Program Analysis and Design Conformance Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology
Research Overview: Program Analysis
• Commutativity Analysis for C++ Programs [PLDI 96]
• Memory Disambiguation for Multithreaded C Programs
  • Pointer Analysis [PLDI 99]
  • Region Analysis [PPoPP 99, PLDI 00]
• Pointer and Escape Analysis for Multithreaded Java Programs [OOPSLA 99, PLDI 01, PPoPP 01]
Research Overview: Transformations
• Automatic Parallelization
  • Object-Oriented Programs with Linked Data Structures [PLDI 96]
  • Divide and Conquer Programs [PPoPP 99, PLDI 00]
• Synchronization Optimizations
  • Lock Coarsening [POPL 97, PLDI 98]
  • Synchronization Elimination [OOPSLA 99]
  • Optimistic Synchronization Primitives [PPoPP 97]
• Memory Management Optimizations
  • Stack Allocation [OOPSLA 99, PLDI 01]
  • Per-Thread Heap Allocation
Research Overview: Verifications of Safety Properties
• Data Race Freedom [PLDI 00]
• Array Bounds Checks [PLDI 00]
• Correctness of Region-Based Allocation [PPoPP 01]
• Credible Compilation [RTRV 99]
  • Correctness of Dataflow Analysis Results
  • Correctness of Standard Compiler Optimizations
Talk Overview • Memory Disambiguation • Goal: Verify Data Race Freedom for Multithreaded Divide and Conquer Programs • Analyses: • Pointer Analysis • Accessed Region Analysis • Experience integrating information from the developer into the memory disambiguation analysis • Role Verification • Design Conformance
Basic Memory Disambiguation Problem
*p = v; (write v into the memory location that p points to)
What memory locations may *p = v access?
• Without any analysis: *p = v may access any location
• With analysis: *p = v may access only a few specific locations; it does not access the remaining memory locations
Static Memory Disambiguation Analyze the program to characterize the memory locations that statements in the program read and write Fundamental problem in program analysis with many applications
Application: Verify Data Race Freedom Program Does This *p = v 1; NOT This *p = v 1 *q = v 2 || *q = v 2;
Example - Divide and Conquer Sort
Input:    7 4 6 1 3 5 8 2
Divide:   7 4 | 6 1 | 3 5 | 8 2
Conquer:  4 7 | 1 6 | 3 5 | 2 8
Combine:  1 4 6 7 | 2 3 5 8
Combine:  1 2 3 4 5 6 7 8
Divide and Conquer Algorithms • Lots of Recursively Generated Concurrency • Recursively Solve Subproblems in Parallel • Combine Results in Parallel
“Sort n Items in d, Using t as Temporary Storage”
void sort(int *d, int *t, int n) {
  if (n > CUTOFF) {
    spawn sort(d, t, n/4);
    spawn sort(d+n/4, t+n/4, n/4);
    spawn sort(d+2*(n/4), t+2*(n/4), n/4);
    spawn sort(d+3*(n/4), t+3*(n/4), n-3*(n/4));
    sync;
    spawn merge(d, d+n/4, d+n/2, t);
    spawn merge(d+n/2, d+3*(n/4), d+n, t+n/2);
    sync;
    merge(t, t+n/2, t+n, d);
  } else insertionSort(d, d+n);
}
• Divide array into subarrays and recursively sort subarrays in parallel
• Subproblems identified using pointers into middle of array (d, d+n/4, d+n/2, d+3*(n/4)): 7 4 | 6 1 | 3 5 | 8 2
• Sorted results written back into input array: 4 7 | 1 6 | 3 5 | 2 8
• “Merge Sorted Quarters of d Into Halves of t”: t and t+n/2 hold 1 4 6 7 | 2 3 5 8
• “Merge Sorted Halves of t Back Into d”: d holds 1 2 3 4 5 6 7 8
• “Use a Simple Sort for Small Problem Sizes”: insertionSort(d, d+n) sorts the range from d up to d+n in place
What Do You Need To Know To Verify Data Race Freedom? Points-to Information (data blocks that pointers point into) Region Information (accessed regions within data blocks)
Information Needed To Verify Race Freedom
• d and t point to different memory blocks
• Calls to sort access disjoint parts of d and t
• Together, calls access [d, d+n-1] and [t, t+n-1]
  sort(d, t, n/4);
  sort(d+n/4, t+n/4, n/4);
  sort(d+n/2, t+n/2, n/4);
  sort(d+3*(n/4), t+3*(n/4), n-3*(n/4));
Information Needed To Verify Race Freedom
• d and t point to different memory blocks
• First two calls to merge access disjoint parts of d, t
• Together, calls access [d, d+n-1] and [t, t+n-1]
  merge(d, d+n/4, d+n/2, t);
  merge(d+n/2, d+3*(n/4), d+n, t+n/2);
  merge(t, t+n/2, t+n, d);
Information Needed To Verify Race Freedom
• Calls to insertionSort access [d, d+n-1]
  insertionSort(d, d+n);
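The disjointness and coverage facts for the four recursive sort calls can be checked on concrete sizes. The following is a minimal sketch, assuming regions are modelled as closed intervals of element offsets from d; the names Region, regions_overlap, sort_calls_disjoint, and sort_calls_cover are illustrative and not from the talk. The third call is modelled as starting at offset 2*(n/4), which equals n/2 when n is a multiple of 4.

```c
#include <stdbool.h>

/* Closed interval [lo, hi] of element offsets; empty when lo > hi. */
typedef struct { long lo, hi; } Region;

static bool regions_overlap(Region a, Region b) {
    if (a.lo > a.hi || b.lo > b.hi) return false;   /* an empty region overlaps nothing */
    return a.lo <= b.hi && b.lo <= a.hi;
}

/* Regions of d accessed by the four recursive sort calls for size n. */
static void sort_call_regions(long n, Region out[4]) {
    long q = n / 4;
    out[0] = (Region){0,     q - 1};      /* sort(d, t, n/4)            */
    out[1] = (Region){q,     2 * q - 1};  /* sort(d+n/4, ...)           */
    out[2] = (Region){2 * q, 3 * q - 1};  /* third quarter              */
    out[3] = (Region){3 * q, n - 1};      /* sort(d+3*(n/4), ..., rest) */
}

/* True if no two of the four calls access a common element. */
static bool sort_calls_disjoint(long n) {
    Region r[4];
    sort_call_regions(n, r);
    for (int i = 0; i < 4; i++)
        for (int j = i + 1; j < 4; j++)
            if (regions_overlap(r[i], r[j])) return false;
    return true;
}

/* True if the four regions are contiguous and together span [0, n-1]. */
static bool sort_calls_cover(long n) {
    Region r[4];
    sort_call_regions(n, r);
    if (r[0].lo != 0 || r[3].hi != n - 1) return false;
    for (int i = 0; i < 3; i++)
        if (r[i].hi + 1 != r[i + 1].lo) return false;
    return true;
}
```

This only tests particular values of n; the analysis described in the talk establishes the same facts symbolically for all n.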
What Do You Need To Know To Verify Data Race Freedom? Points-to Information (d and t point to different data blocks) Symbolic Region Information (accessed regions within d and t blocks)
How Hard Is It To Figure These Things Out?
How Hard Is It For the Program Analysis To Figure These Things Out? Challenging
How Hard Is It For the Program Analysis To Figure These Things Out?
void insertionSort(int *l, int *h) {
  int *p, *q, k;
  for (p = l+1; p < h; p++) {
    for (k = *p, q = p-1; l <= q && k < *q; q--)
      *(q+1) = *q;
    *(q+1) = k;
  }
}
Not immediately obvious that insertionSort(l, h) accesses [l, h-1]
How Hard Is It For the Program Analysis To Figure These Things Out?
void merge(int *l1, int *m, int *h2, int *d) {
  int *h1 = m;
  int *l2 = m;
  while ((l1 < h1) && (l2 < h2))
    if (*l1 < *l2) *d++ = *l1++;
    else *d++ = *l2++;
  while (l1 < h1) *d++ = *l1++;
  while (l2 < h2) *d++ = *l2++;
}
Not immediately obvious that merge(l, m, h, d) accesses [l, h-1] and [d, d+(h-l)-1]
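A sequential sketch of the three procedures, with spawn and sync removed so that it compiles as plain C. Assumptions not in the talk: CUTOFF is set to 4; the half boundary is written as 2*(n/4) rather than n/2 so that quarters and halves line up when n is not a multiple of 4; and the inner loop of insertionSort is rearranged so that no pointer before l is ever formed. The accessed regions are unchanged: [l, h-1] for insertionSort, and [l, h-1] plus [d, d+(h-l)-1] for merge.

```c
#define CUTOFF 4   /* assumed threshold; the slides give no value */

/* Sorts [l, h-1] in place. */
static void insertionSort(int *l, int *h) {
    for (int *p = l + 1; p < h; p++) {
        int k = *p;
        int *q = p;
        while (q > l && k < *(q - 1)) {   /* shift larger elements right */
            *q = *(q - 1);
            q--;
        }
        *q = k;
    }
}

/* Merges sorted runs [l1, m-1] and [m, h2-1] into [d, d+(h2-l1)-1]. */
static void merge(int *l1, int *m, int *h2, int *d) {
    int *h1 = m;
    int *l2 = m;
    while (l1 < h1 && l2 < h2) {
        if (*l1 < *l2) *d++ = *l1++;
        else           *d++ = *l2++;
    }
    while (l1 < h1) *d++ = *l1++;
    while (l2 < h2) *d++ = *l2++;
}

/* Sorts n items in d, using t (also n items) as temporary storage. */
static void sort(int *d, int *t, int n) {
    if (n > CUTOFF) {
        int q = n / 4;
        sort(d,         t,         q);
        sort(d + q,     t + q,     q);
        sort(d + 2 * q, t + 2 * q, q);
        sort(d + 3 * q, t + 3 * q, n - 3 * q);
        merge(d,         d + q,     d + 2 * q, t);          /* quarters of d -> first half of t  */
        merge(d + 2 * q, d + 3 * q, d + n,     t + 2 * q);  /* quarters of d -> second half of t */
        merge(t, t + 2 * q, t + n, d);                      /* halves of t -> d                  */
    } else {
        insertionSort(d, d + n);
    }
}
```

Calling sort(d, t, n) with two distinct arrays of n ints leaves d sorted; in the multithreaded version the first four calls and the first two merges are the ones that run in parallel.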
Issues • Heavy Use of Pointers • Pointers into Middle of Arrays • Pointer Arithmetic • Pointer Comparison • Multiple Procedures • sort(int *d, int *t, int n) • insertionSort(int *l, int *h) • merge(int *l, int *m, int *h, int *t) • Recursion • Multithreading
Pointer Analysis
• For each program point, computes where each pointer may point, e.g. “p → x before statement *p = 1”
• Complications
  1. Statically unbounded number of locations
     • recursive data structures (lists, trees)
     • dynamically allocated arrays
  2. Multiple possible executions of the program
     • may create different dynamic data structures
Memory Abstraction
[Figure: physical memory (stack and heap holding p, q, head, i, j, v, r) mapped to abstract memory]
• Allocation block for each variable declaration
• Allocation block for each memory allocation site
Pointer Analysis Summary • Key Challenge for Multithreaded Programs: Analyzing interactions between threads • Solution: Interference Edges • Record edges generated by each thread • Captures effect of parallel threads on points-to information of other threads
What Pointer Analysis Gives Us • Disambiguation of Memory Accesses Via Pointers • Pointer-based loads and stores: use pointer analysis results to derive the allocation block that each pointer-based load or store statement accesses • MOD-REF or READ-WRITE SETS Analysis: • All loads and stores • Procedures: use the memory access information for loads and stores to compute the allocation blocks that each procedure accesses
Is This Information Enough?
Is This Information Enough? NO Necessary but not Sufficient Parallel Tasks Access (Disjoint) Regions of Same Allocated Block of Memory
Structure of Analysis
• Pointer Analysis: Disambiguate Memory at the Granularity of Allocation Blocks
• Bounds Analysis: Symbolic Upper and Lower Bounds for Each Memory Access in Each Procedure
• Region Analysis: Symbolic Regions Accessed By Execution of Each Procedure
• Data Race Freedom: Check that Parallel Threads Are Independent
Running Example – Array Increment
void f(char *p, int n) {
  if (n > CUTOFF) {
    spawn f(p, n/2);       /* increment first half */
    spawn f(p+n/2, n/2);   /* increment second half */
    sync;
  } else {
    /* base case: increment small array */
    int i = 0;
    while (i < n) { *(p+i) += 1; i++; }
  }
}
Intraprocedural Bounds Analysis
Pointer Analysis → Bounds Analysis → Region Analysis → Data Race Detection
Bounds Analysis: Symbolic Upper and Lower Bounds for Each Memory Access in Each Procedure
Intraprocedural Bounds Analysis
GOAL: For each pointer and array index variable at each program point, derive lower and upper bounds
E.g. “0 ≤ i ≤ n-1 at statement *(p+i) += 1”
• Bounds are symbolic expressions
  • variables represent initial values of parameters of enclosing procedure
  • bounds are combinations of variables
  • example expression for f(p, n): p+(n/2)-1
Intraprocedural Bounds Analysis What are upper and lower bounds for i at each program point in base case? int i = 0; while (i < n) { *(p+i) += 1; i++; }
Bounds Analysis, Step 1
Build control flow graph
• Block 1: i = 0
• Block 2: i < n
• Block 3: *(p+i) += 1; i = i+1
Bounds Analysis, Step 2
Set up bounds at beginning of basic blocks
• l1 ≤ i ≤ u1 before i = 0
• l2 ≤ i ≤ u2 before i < n
• l3 ≤ i ≤ u3 before *(p+i) += 1
Bounds Analysis, Step 3
Compute transfer functions
• i = 0: l1 ≤ i ≤ u1 becomes 0 ≤ i ≤ 0
• i < n: l2 ≤ i ≤ u2 becomes l2 ≤ i ≤ n-1 on the true branch and n ≤ i ≤ u2 on the false branch
• *(p+i) += 1: l3 ≤ i ≤ u3 is unchanged
• i = i+1: l3 ≤ i ≤ u3 becomes l3+1 ≤ i ≤ u3+1
Bounds Analysis, Step 4
Key Step: set up constraints for bounds
• Bounds at procedure entry: -∞ ≤ i ≤ +∞
• Region constraints
  [0, 0] ⊆ [l2, u2]
  [l3+1, u3+1] ⊆ [l2, u2]
  [l2, n-1] ⊆ [l3, u3]
• Inequality constraints
  l2 ≤ 0        0 ≤ u2
  l2 ≤ l3+1     u3+1 ≤ u2
  l3 ≤ l2       n-1 ≤ u3
Bounds Analysis, Step 5
Generate symbolic expressions for bounds
Goal: express bounds in terms of parameters
  l2 = c1 p + c2 n + c3     u2 = c7 p + c8 n + c9
  l3 = c4 p + c5 n + c6     u3 = c10 p + c11 n + c12
Constraints to satisfy: l2 ≤ 0, l2 ≤ l3+1, l3 ≤ l2, 0 ≤ u2, u3+1 ≤ u2, n-1 ≤ u3
Bounds Analysis, Step 6
Substitute expressions into constraints
  c1 p + c2 n + c3 ≤ 0
  c1 p + c2 n + c3 ≤ c4 p + c5 n + c6 + 1
  c4 p + c5 n + c6 ≤ c1 p + c2 n + c3
  0 ≤ c7 p + c8 n + c9
  c10 p + c11 n + c12 + 1 ≤ c7 p + c8 n + c9
  n - 1 ≤ c10 p + c11 n + c12
Bounds Analysis, Step 7
Reduce symbolic inequalities to linear inequalities
  c1 p + c2 n + c3 ≤ c4 p + c5 n + c6 if c1 ≤ c4, c2 ≤ c5, and c3 ≤ c6
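Why the coefficient-wise comparison is enough can be written out in one line. This is a sketch under an assumption the slide does not state, namely that the symbolic variables p and n are nonnegative; the condition is sufficient, not necessary.

```latex
\[
(c_4 p + c_5 n + c_6) - (c_1 p + c_2 n + c_3)
  = (c_4 - c_1)\,p + (c_5 - c_2)\,n + (c_6 - c_3) \;\ge\; 0
\quad \text{whenever } c_1 \le c_4,\; c_2 \le c_5,\; c_3 \le c_6,\; p \ge 0,\; n \ge 0 .
\]
```

Each term of the difference is a product or sum of nonnegative quantities, so the symbolic inequality holds for every admissible p and n, and only the linear inequalities over the coefficients remain.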
Bounds Analysis, Step 8
Apply reduction and generate a linear program
  Lower bounds: c1 ≤ 0, c2 ≤ 0, c3 ≤ 0; c1 ≤ c4, c2 ≤ c5, c3 ≤ c6+1; c4 ≤ c1, c5 ≤ c2, c6 ≤ c3
  Upper bounds: 0 ≤ c7, 0 ≤ c8, 0 ≤ c9; c10 ≤ c7, c11 ≤ c8, c12+1 ≤ c9; 0 ≤ c10, 1 ≤ c11, -1 ≤ c12
Objective Function: max: (c1 + ... + c6) - (c7 + ... + c12), where c1 to c6 are the lower-bound coefficients and c7 to c12 are the upper-bound coefficients
Bounds Analysis, Step 9
Solve linear program to extract bounds
Solution: c1=0, c2=0, c3=0, c4=0, c5=0, c6=0, c7=0, c8=1, c9=0, c10=0, c11=1, c12=-1
Symbolic Bounds: l2 = 0, u2 = n, l3 = 0, u3 = n-1
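The reported solution can be checked mechanically against the coefficient constraints. The sketch below is not an LP solver; it only tests feasibility and evaluates the objective. The function names are illustrative, and the constraint groups are the ones obtained by applying the Step 7 reduction to the six inequalities of Step 4 (the last group comes from n-1 ≤ u3).

```c
#include <stdbool.h>

/* Coefficients c[1..12]; c[0] is unused. The bounds are
   l2 = c1*p + c2*n + c3,   l3 = c4*p  + c5*n  + c6,
   u2 = c7*p + c8*n + c9,   u3 = c10*p + c11*n + c12. */
static bool lp_feasible(const int c[13]) {
    return
        /* l2 <= 0 */
        c[1] <= 0 && c[2] <= 0 && c[3] <= 0 &&
        /* l2 <= l3 + 1 */
        c[1] <= c[4] && c[2] <= c[5] && c[3] <= c[6] + 1 &&
        /* l3 <= l2 */
        c[4] <= c[1] && c[5] <= c[2] && c[6] <= c[3] &&
        /* 0 <= u2 */
        0 <= c[7] && 0 <= c[8] && 0 <= c[9] &&
        /* u3 + 1 <= u2 */
        c[10] <= c[7] && c[11] <= c[8] && c[12] + 1 <= c[9] &&
        /* n - 1 <= u3 */
        0 <= c[10] && 1 <= c[11] && -1 <= c[12];
}

/* Objective: (c1 + ... + c6) - (c7 + ... + c12). */
static int lp_objective(const int c[13]) {
    int lower = 0, upper = 0;
    for (int k = 1; k <= 6; k++)  lower += c[k];
    for (int k = 7; k <= 12; k++) upper += c[k];
    return lower - upper;
}

/* Evaluates the bound whose coefficients start at index first. */
static long eval_bound(const int c[13], int first, long p, long n) {
    return c[first] * p + c[first + 1] * n + c[first + 2];
}
```

With the solution from the slide, eval_bound gives l2 = 0, u2 = n, l3 = 0, and u3 = n-1 for any p and n.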
Bounds Analysis, Step 10
Substitute bounds at each program point
• Procedure entry: -∞ ≤ i ≤ +∞
• Before i < n: 0 ≤ i ≤ n
• True branch, before *(p+i) += 1: 0 ≤ i ≤ n-1
• After i = i+1: 1 ≤ i ≤ n
• False branch (loop exit): n ≤ i ≤ n
Access Regions
Compute access regions at each load or store
• *(p+i) += 1 with 0 ≤ i ≤ n-1 accesses [p, p+n-1]
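The derived facts can be turned into run-time checks on the base-case loop. This is a sketch for illustration only: the analysis in the talk proves these properties statically, whereas the assertions here merely test them on one execution. The function name increment_checked is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Base case of f with the derived bounds asserted at the store.
   Returns the number of elements written. */
static size_t increment_checked(char *p, int n) {
    size_t writes = 0;
    int i = 0;
    while (i < n) {
        assert(0 <= i && i <= n - 1);            /* 0 <= i <= n-1 at the store  */
        char *addr = p + i;
        assert(p <= addr && addr <= p + n - 1);  /* store stays in [p, p+n-1]   */
        *addr += 1;
        writes++;
        i++;
    }
    if (n >= 0) assert(i == n);                  /* n <= i <= n at loop exit    */
    return writes;
}
```

Calling increment_checked(buf + 1, 3) on a five-element buffer touches only buf[1] through buf[3].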
Interprocedural Region Analysis
Pointer Analysis → Bounds Analysis → Region Analysis → Data Race Detection
Region Analysis: Symbolic Regions Accessed By Execution of Each Procedure
Interprocedural Region Analysis
GOAL: Compute accessed regions of memory for each procedure
E.g. “f(p, n) accesses [p, p+n-1]”
• Same Approach
  • Set up target bounds of accessed regions
  • Build a constraint system to compute these bounds
• Constraint System: accessed regions for a procedure must include
  1. Regions accessed by statements in the procedure
  2. Regions accessed by invoked procedures
Region Analysis in Example
f(p, n) accesses [ l(p, n), u(p, n) ]
void f(char *p, int n) {
  if (n > CUTOFF) {
    spawn f(p, n/2);       /* accesses [ l(p, n/2), u(p, n/2) ] */
    spawn f(p+n/2, n/2);   /* accesses [ l(p+n/2, n/2), u(p+n/2, n/2) ] */
    sync;
  } else {
    int i = 0;
    while (i < n) { *(p+i) += 1; i++; }   /* accesses [ p, p+n-1 ] */
  }
}
Derive Constraint System
• Region constraints
  [ l(p, n/2), u(p, n/2) ] ⊆ [ l(p, n), u(p, n) ]
  [ l(p+n/2, n/2), u(p+n/2, n/2) ] ⊆ [ l(p, n), u(p, n) ]
  [ p, p+n-1 ] ⊆ [ l(p, n), u(p, n) ]
• Reduce to inequalities between lower/upper bounds
• Further reduce to a linear program and solve: l(p, n) = p, u(p, n) = p+n-1
• Access region for f(p, n): [p, p+n-1]
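As a sanity check, the solution l(p, n) = p, u(p, n) = p+n-1 can be substituted back into the three region constraints. The sketch below writes the C integer division n/2 as a floor and assumes n ≥ 0.

```latex
\begin{align*}
[\,l(p, n/2),\, u(p, n/2)\,]
  &= [\,p,\; p + \lfloor n/2 \rfloor - 1\,]
  \subseteq [\,p,\; p + n - 1\,]
  && \text{since } \lfloor n/2 \rfloor \le n, \\
[\,l(p + n/2, n/2),\, u(p + n/2, n/2)\,]
  &= [\,p + \lfloor n/2 \rfloor,\; p + 2\lfloor n/2 \rfloor - 1\,]
  \subseteq [\,p,\; p + n - 1\,]
  && \text{since } 2\lfloor n/2 \rfloor \le n, \\
[\,p,\; p + n - 1\,]
  &\subseteq [\,p,\; p + n - 1\,].
\end{align*}
```

All three inclusions hold, so [p, p+n-1] is a valid access region for f(p, n).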
Data Race Freedom
Pointer Analysis → Bounds Analysis → Region Analysis → Data Race Freedom
Data Race Freedom: Check that Parallel Threads Are Independent
Data Race Freedom • Dependence testing of two statements • Do accessed regions intersect? • Based on comparing upper and lower bounds of accessed regions • Absence of data races • Check that all the statements that execute in parallel are independent
Data Race Freedom
f(p, n) accesses [ p, p+n-1 ]
void f(char *p, int n) {
  if (n > CUTOFF) {
    spawn f(p, n/2);       /* accesses [ p, p+n/2-1 ] */
    spawn f(p+n/2, n/2);   /* accesses [ p+n/2, p+n-1 ] */
    sync;
  } else {
    int i = 0;
    while (i < n) { *(p+i) += 1; i++; }
  }
}
No data races!
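The independence test for the two spawned calls amounts to comparing interval bounds. A minimal sketch follows, assuming p is represented as an integer offset from the start of the array and CUTOFF is 4; both are illustrative choices, and the names disjoint and f_race_free are hypothetical. It walks the call tree of f for one concrete n, whereas the analysis in the talk performs the comparison once on symbolic bounds.

```c
#include <stdbool.h>

#define CUTOFF 4   /* assumed threshold; the slides give no value */

/* True if closed intervals [lo1, hi1] and [lo2, hi2] share no element. */
static bool disjoint(long lo1, long hi1, long lo2, long hi2) {
    return hi1 < lo2 || hi2 < lo1;
}

/* Checks every spawn point reached by f(p, n): the regions of the two
   parallel calls must be disjoint and lie inside [p, p+n-1]. */
static bool f_race_free(long p, long n) {
    if (n <= CUTOFF) return true;            /* base case runs in one thread */
    long h = n / 2;
    long lo1 = p,     hi1 = p + h - 1;       /* region of f(p, n/2)     */
    long lo2 = p + h, hi2 = p + h + h - 1;   /* region of f(p+n/2, n/2) */
    if (!disjoint(lo1, hi1, lo2, hi2)) return false;
    if (lo1 < p || hi2 > p + n - 1) return false;
    return f_race_free(lo1, h) && f_race_free(lo2, h);
}
```

For example, f_race_free(0, 1000) reports that no pair of parallel calls in the whole recursion touches a common element.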
Fundamental Property of the Analysis: No Fixed Point Computations • The analysis does not use fixed-point computations: • The problem is reduced to a linear program • The solution to the linear program directly gives the symbolic lower and upper bounds • Fixed-point approaches: • Termination is not guaranteed: analysis domain of symbolic expressions has infinite ascending chains • Use imprecise techniques to ensure termination: • Artificially truncate number of iterations • Use imprecise widening operators
Experience • Set of benchmark programs • Two versions of each benchmark • Sequential version written in C • Multithreaded version written in Cilk • Experiments: 1. Data Race Freedom for the multithreaded versions 2. Array Bounds Violation Detection for both sequential and multithreaded versions 3. Automatic Parallelization for the sequential version
Data Races and Array Bounds Violations
Application   Data Races   Array Bounds Violations   Array Bounds Violations
                           (multithreaded)           (sequential)
QuickSort     NO           NO                        NO
MergeSort     NO           NO                        NO
BlockMul      NO           NO                        NO
NoTempMul     NO           NO                        NO
LU            NO           NO                        NO
Knapsack      YES          NO                        NO
Heat          NO           NO                        NO
Parallel Performance
[Figure: parallel performance plots for Quicksort, Mergesort, BlockMul, NoTempMul, Heat, and LU]
Summary • Sophisticated Memory Disambiguation Analysis • Points-to Information • Accessed Region Information • Automatic • Interprocedural • Handles Multithreaded Programs • Other Uses Besides Data Race Freedom • Bitwidth Analysis • Array-Bounds Check Elimination • Buffer Overrun Detection
Bigger Picture • Analysis has a very specific goal • Developer understands and cares about results • Points-to and region information is (implicitly) part of the interface of each procedure • Developer understands interfaces • Developer has expectations about analysis results • Analysis can identify serious programming errors • Developer expectations are implicit
Idea Enhance procedure interface to make points-to and region information explicit • Points-to language • Points-to graphs at entry and exit • Effect on points-to relationships • Region language • Symbolic specification of accessed regions • Developer provides information • Analysis verifies that it is correct, and that correctness implies data race freedom
Points-to Language
f(p, q, n) {
  context {
    entry: p->_a, q->_b;
    exit: p->_a, _a->_c, q->_b, _b->_d;
  }
  context {
    entry: p->_a, q->_a;
    exit: p->_a, _a->_c, q->_a;
  }
}
[Figure: entry and exit points-to graphs for each context of f(p, q, n)]
Verifying Points-to Information
One (flow sensitive) analysis per context. For each context of f(p, q, n):
1. Start with entry points-to graph
2. Analyze procedure
3. Check result against exit points-to graph
Similarly for other context
Analysis of Call Statements
g(r, n) { ... f(r, s, n); ... }
1. Analysis produces points-to graph before call
2. Retrieve declared contexts from callee
3. Find context with matching entry graph
4. Apply corresponding exit points-to graph
5. Continue analysis after call
Result
• Points-to declarations separate analysis of multiple procedures
• Transformed global, whole-program analysis into local analysis that operates on each procedure independently
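The context lookup at a call site can be sketched in a few lines. This is a heavily simplified illustration, not the implementation from the talk: the entry graph of each declared context is reduced to a single fact (whether the two pointer parameters point to the same block), which is the only thing that distinguishes the two contexts declared for f(p, q, n). The type Context, the table f_contexts, and the function match_context are all hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* One declared context of f(p, q, n), reduced to its aliasing pattern. */
typedef struct {
    bool params_alias;        /* entry graph: do p and q point to the same block? */
    const char *exit_edges;   /* declared exit graph, kept as text in this sketch */
} Context;

static const Context f_contexts[] = {
    { false, "p->_a, _a->_c, q->_b, _b->_d" },   /* entry: p->_a, q->_b */
    { true,  "p->_a, _a->_c, q->_a" },           /* entry: p->_a, q->_a */
};

/* For a call f(r, s, n), returns the declared context whose entry graph
   matches the caller's points-to information, or NULL if none matches. */
static const Context *match_context(bool args_alias) {
    size_t count = sizeof f_contexts / sizeof f_contexts[0];
    for (size_t i = 0; i < count; i++)
        if (f_contexts[i].params_alias == args_alias)
            return &f_contexts[i];
    return NULL;
}
```

Once a context is found, the caller applies its exit graph and continues, without ever looking inside the body of f; that is what makes the analysis local to each procedure.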
Experience
• Implemented points-to and region languages
• Integrated with points-to and region analyses
• Divide and Conquer Benchmarks
  • Sorting Programs: Quicksort (QS), Mergesort (MS)
  • Dense Matrix Computations: Matrix multiply (MM), LU decomposition (LU)
  • Scientific Computation: Heat (H)
• We added points-to and region information
Programming Overhead
[Figure: proportion of C code, region declarations, and points-to declarations for QS, MS, MM, LU, and H, on a scale from 0.00 to 1.00]
Evaluation How difficult is it to provide declarations? Not that difficult. • Have to write comparatively little code • Must know information anyway How much benefit does analysis obtain? Substantial benefit. • Simpler analysis software (no complex interprocedural analysis) • More scalable, precise analysis
Evaluation Software Engineering Benefits of Points-to and Region Declarations • Improved communication between developer and analysis • Analysis reflects developer’s expectations • Enhanced code reliability • Enhanced interface information • Analyze incomplete programs • Programs that use libraries • Programs under development
Evaluation Drawbacks of Points-to and Region Declarations • Have to learn new language • Have to integrate into development process • Legacy software issues (programmer may not know points-to and region information)
Steps to Design Conformance
Verify that Program Correctly Implements Key Design Properties as Expressed by Developer or Designer
• Role Verification
• Design Conformance for Object Models (joint with Daniel Jackson, MIT LCS)
• Context: Air Traffic Control Software
  • MIT LCS (Daniel Jackson, Martin Rinard)
  • MIT Aero-Astro Department (R. John Hansman)
  • NASA Ames Research Center (Michelle Eshow)
  • Kansas State University CS Dept. (David Schmidt)
  • CTAS (Center/TRACON Automation System)
Role Verification • Objects play different roles during their lifetime in computation • Parked Aircraft, Taxiing Aircraft, Cleared for Takeoff Aircraft, In Flight Aircraft • Roles reflect constraints on activities of object • System actions must respect role constraints • Parked Aircraft can’t take off • Action violations indicate system confusion • Goals • Obtain role information from developer • Check that program uses roles correctly
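The idea that actions must respect role constraints can be illustrated with a small state machine. This is a dynamic sketch only; role verification as described in the talk checks such properties statically. The four roles come from the slide, while the set of actions and the allowed transitions (Parked to Taxiing to Cleared to Flying) are assumptions made for the example.

```c
#include <stdbool.h>

typedef enum { PARKED, TAXIING, CLEARED, FLYING } Role;
typedef enum { START_TAXI, CLEAR_FOR_TAKEOFF, TAKE_OFF } Action;

typedef struct { Role role; } Aircraft;

/* Applies the action if the current role permits it. On a role violation
   the aircraft is left unchanged and false is returned. */
static bool apply_action(Aircraft *a, Action act) {
    switch (act) {
    case START_TAXI:
        if (a->role != PARKED) return false;
        a->role = TAXIING;
        return true;
    case CLEAR_FOR_TAKEOFF:
        if (a->role != TAXIING) return false;
        a->role = CLEARED;
        return true;
    case TAKE_OFF:
        if (a->role != CLEARED) return false;   /* a parked aircraft cannot take off */
        a->role = FLYING;
        return true;
    }
    return false;
}
```

A TAKE_OFF request on a parked aircraft is rejected, which is the kind of action violation the slide describes as indicating system confusion.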
Role Classification
[Figure: class Aircraft with roles Parked Aircraft, Taxiing Aircraft, Cleared Aircraft, Flying Aircraft]
• Two General Kinds of Classification
  • Content-based (predicate on object fields determines role)
  • Relative (points-to relationships determine role)
• Role Classification is Application Dependent
Standard View of Object
• Incoming References
• Fields and Outgoing References
  • Flight Plan → List of Meter Fixes
  • Trajectory → Sequence Of Points
  • Flight Name → String
  • Runway → Runway Object
  • Gate → Gate Object
Relative Role Classification Points-to relationships define roles • Specify sources of incoming edges • Field of an object playing a given role • Global or local variable • Specify target of outgoing edges • Specify available fields in each role
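Example Roles: Parked Aircraft
[Figure: fields Flight Plan, Trajectory, Flight Name, Runway, Gate; referenced object: Gate Object; incoming reference from Aircraft]
Example Roles: Cleared for Takeoff Aircraft
[Figure: fields Flight Plan, Trajectory, Flight Name, Runway, Gate; referenced objects: List of Meter Fixes, String, Runway Object; incoming reference from Aircraft]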
Role Verification • Analysis Obtains • Role Definitions • Method Information • Roles of parameters and globals on entry • Role changes that method performs • Role of return value • Intraprocedural Analysis • Simulates potential executions of method • Precise abstraction of heap • Use role information for invoked methods • Verify correctness of role information
Benefits of Roles • Software Engineering Benefits • Safety checks that take application semantics into account • Enhanced implementation transparency • Transformations Enabled By Precise Referencing Behavior • Safe real-time memory management • Parallelization and race detection for Programs with linked data structures • Optimized Atomic Transactions
Key Issue: Obtaining Role Information • Range of Developer and Designer Involvement • Some Involvement Reasonable and Necessary: Roles Reflect Application-Specific Properties • Primary Focus: Role Definitions • Determine analysis distinctions • Relevance of extracted information • Secondary Focus: Method Specifications • Developer specifies roles of parameters • Analysis extracts role changes
Design Conformance • Software Development Activities • Requirements • Design • Implementation • Design is Partial • Focus on Important Aspects • Omit Many Low-Level Details • Design and Implementation are Disconnected • No guarantee that code conforms to design
Goal of Design Conformance
• Establish and mechanically check conformance
• Use specific design formalism (object models)
• Boxes (objects) and Arrows (relations between objects)
[Figure: object model with Aircraft (Parked, Taxiing, Cleared, and Flying Aircraft), Flight Plan, and Meter Fix]
Key Issue • Establishing correspondence between object model and implementation • Object models usually at a higher level of abstraction • Many relations in object model realized as group of objects and references • Object model may entirely omit some objects or references • Enables designer to focus on important aspects • But complicates path to conformance analysis
[Figure: three levels of object model. Abstract Object Model: Aircraft, Flight Plan, Meter Fix. Intermediate Object Model: Parked, Taxiing, Cleared, and Flying Aircraft, Flight Plan, Meter Fix. Concrete Object Model: roles with fields Flight Plan, Trajectory, Flight Name, Runway, Gate and referenced objects Gate Object, List of Meter Fixes, String, Runway Object]
Concretization Specifications • Maps Between Object Models • Enables Designer/Developer to Establish Correspondence Between Object Models • Specify how Object Model is Realized in Code • Foundation for design conformance analysis • Guides implementation of object model • Implementation patterns for object models
Design Conformance Benefits • Higher Confidence in Software • Promote clean implementation of design • Guarantee important design properties • Design becomes useful throughout entire development cycle • Updated as implementation changes • Reliable source of information • Enables more precise, relevant analysis
Related Work
• Pointer Analysis
  • Landi, Ryder, Zhang – PLDI 93
  • Emami, Ghiya, Hendren – PLDI 94
  • Wilson, Lam – PLDI 96
  • Rugina, Rinard – PLDI 99
  • Rountev, Ryder – CC 01
  • Salcianu, Rinard – PPoPP 01
• Region Analysis
  • Triolet, Irigoin, Feautrier – PLDI 86
  • Havlak, Kennedy – IEEE TPDS 91
  • Rugina, Rinard – PLDI 00
• Pointer Specifications
  • Hendren, Hummel, Nicolau – PLDI 92
  • Guyer, Lin – LCPC 00
Related Work • Shape Analysis [CWZ 90, GH 96, FL 97, SRW 99, MS 01] • Extended Type Systems • FX/87 [GJLS 87] • Dependent Types [XF 99] • Program Verification • ESC [DLNS 98] • PVS [ORRSS 96] • Implementations of Object Models [HBR 00]
Conclusion • Developer and Designer Interact with Analysis • Benefits • More precise, relevant analysis • Verify key safety and design properties • Enhance utility of design • Enable powerful transformations • Key Issue: • Determining appropriate abstractions to leverage • Access regions, roles, object models • Abstractions Share Several Features • Identify important properties of data • Relate properties of data to behavior of computation