Hierarchical Pointer Analysis for Distributed Programs Amir Kamil
Amir Kamil and Katherine Yelick
U.C. Berkeley
August 23, 2007
Background
Hierarchical Machines
• Parallel machines often have hierarchical structure
  [Figure: machine hierarchy with four levels: level 1 (thread local), level 2 (node local), level 3 (cluster local), level 4 (grid world)]
Partitioned Global Address Space
• Partitioned global address space (PGAS) languages provide the illusion of shared memory across the machine
• Wide pointers used to represent global addresses
  • Contain identifying information plus the physical address
    Process ID: 1, Address: 0xf9a0cb48
• Narrow pointers can still be used for addresses in the local physical address space
    Address: 0xf9a0cb48
The Problems
Three Problems
• What data is private to a thread?
• What data is local to the physical address space?
• What possible race conditions can occur?
Data Privacy
• Data is private if it cannot leak beyond its source thread
• Useful to know which data is private for global garbage collection, monitor optimization, and other applications
Data Locality
• Recall: global pointers are composed of identifying information and an address
    Process ID: 1, Address: 0xf9a0cb48
• When a global pointer is dereferenced, the runtime system must perform a check to determine if the data is actually in the local physical address space
  • If local, then access directly
  • If not local, then perform communication
• Thus, global pointers are more costly in both space and time, even if the actual data is local
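The locality check described above can be sketched as follows. This is a minimal illustration rather than Titanium's runtime: `WidePointer`, `dereference`, `local_memory`, and `fetch_remote` are hypothetical names, and memory is modeled as a dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WidePointer:
    """Hypothetical wide pointer: identifying information plus an address."""
    process_id: int
    address: int


def dereference(ptr, my_process_id, local_memory, fetch_remote):
    """Return the value ptr refers to, checking locality first."""
    if ptr.process_id == my_process_id:
        # Local: access the physical address directly.
        return local_memory[ptr.address]
    # Not local: fall back to communication (supplied by the caller here).
    return fetch_remote(ptr)


# Toy setup: process 1 owns one word of memory; remote fetches are recorded.
local_memory = {0xF9A0CB48: 42}
remote_requests = []


def fake_fetch(ptr):
    remote_requests.append(ptr)
    return -1


local_value = dereference(WidePointer(1, 0xF9A0CB48), 1, local_memory, fake_fetch)
remote_value = dereference(WidePointer(2, 0xF9A0CB48), 1, local_memory, fake_fetch)
```

The branch on `process_id` runs on every dereference, which is the time cost the slide refers to; the extra `process_id` field is the space cost.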
Race Detection
• Shared memory introduces the possibility of race conditions
  • Two threads access the same memory location
  • The accesses can be simultaneous (no intermediate synchronization)
  • At least one access is a write
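The three conditions can be read as a single predicate. A minimal sketch, assuming a hypothetical `Access` record and a caller-supplied `may_be_simultaneous` oracle that stands in for a concurrency analysis:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """Hypothetical record of one memory access."""
    thread: int
    location: str
    is_write: bool


def may_race(a, b, may_be_simultaneous):
    """True if accesses a and b satisfy all the race conditions."""
    return (
        a.thread != b.thread                 # two different threads
        and a.location == b.location         # same memory location
        and may_be_simultaneous(a, b)        # no intermediate synchronization
        and (a.is_write or b.is_write)       # at least one write
    )


def always(a, b):
    # Assumption for the example: no synchronization separates any accesses.
    return True


write_x = Access(thread=0, location="x", is_write=True)
read_x = Access(thread=1, location="x", is_write=False)
read_x_again = Access(thread=2, location="x", is_write=False)
```

Here `may_race(write_x, read_x, always)` holds, while two reads of the same location never race.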
The Solution
Hierarchical Pointer Analysis
• A pointer analysis that takes into account the machine hierarchy can answer the preceding questions
• For each variable, we want to know not only from which allocation sites the data could have originated, but also from which threads
Related Work
• Thread-aware pointer analysis has been done by others
  • Rugina and Rinard, Zhu and Hendren, Hicks, and others
  • None of them did it for hierarchical, distributed machines
• Data privacy and locality detection previously done by Liblit, Aiken, and Yelick
  • Uses constraint propagation
  • Does not distinguish allocation sites
The Implementation
Titanium
• Titanium is a single program, multiple data (SPMD) dialect of Java
  • All threads execute the same program text
• Designed for distributed machines
• Global address space – all threads can access all memory
• At runtime, threads are grouped into processes
  • A thread shares a physical address space with some, but not all, other threads
Titanium Memory Hierarchy
• Global memory is composed of a hierarchy
  [Figure: a program containing processes, which contain threads 0 to 3, with references labeled tlocal, plocal, and global]
• Locations can be thread-local (tlocal), process-local (plocal), or potentially in another process (global)
The Analysis
Approach
• We define a small SPMD language based on Titanium
• We produce a type system that accounts for the memory hierarchy
  • The analysis can handle an arbitrary number of levels, but we use three levels in this talk
• We give an overview of the pointer analysis inference rules
Language Syntax
• Types τ ::= int | ref_q τ
• Qualifiers q ::= tlocal | plocal | global (tlocal ⊏ plocal ⊏ global)
• Expressions e ::= new_l τ (allocation)
    | transmit e1 from e2 (communication)
    | e1 ← e2 (dereferencing assignment)
    | convert(e, q) (type conversion)
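One way to encode this grammar, as a sketch: the class names below are hypothetical, and the ordering tlocal ⊏ plocal ⊏ global is modeled with integer values so that Python's comparison operators follow the hierarchy.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Union


class Qualifier(IntEnum):
    """Qualifiers, ordered so that TLOCAL < PLOCAL < GLOBAL."""
    TLOCAL = 1
    PLOCAL = 2
    GLOBAL = 3


@dataclass(frozen=True)
class IntType:
    """The type int."""


@dataclass(frozen=True)
class RefType:
    """The type ref_q tau."""
    qualifier: Qualifier
    target: "Type"


Type = Union[IntType, RefType]


@dataclass(frozen=True)
class New:
    """new_l tau: allocation at the site labeled l."""
    label: str
    alloc_type: Type


@dataclass(frozen=True)
class Transmit:
    """transmit e1 from e2: communication."""
    expr: "Expr"
    source: "Expr"


@dataclass(frozen=True)
class Assign:
    """e1 <- e2: dereferencing assignment."""
    target: "Expr"
    value: "Expr"


@dataclass(frozen=True)
class Convert:
    """convert(e, q): type conversion."""
    expr: "Expr"
    qualifier: Qualifier


Expr = Union[New, Transmit, Assign, Convert]

# The expression convert(new_l1 int, plocal)
example = Convert(New("l1", IntType()), Qualifier.PLOCAL)
```

Using an `IntEnum` is a design convenience: the join of two qualifiers becomes `max` and the meet becomes `min`.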
Type Rules – Allocation
• The expression new_l τ allocates space of type τ in local memory and returns a reference to the location
• The label l is unique for each allocation site and will be used by the pointer analysis
• The resulting reference is qualified with tlocal, since it references thread-local memory
    ⊢ new_l τ : ref_tlocal τ
  [Figure: thread 0 evaluates new_l int and obtains a tlocal reference]
Type Rules – Communication
• The expression transmit e1 from e2 evaluates e1 on the thread given by e2 and retrieves the result
• If e1 has reference type, the result type must be widened to global
  • The source thread is not known statically, so we must assume it can be any thread
    ⊢ e1 : τ    ⊢ e2 : int
    -----------------------------------------
    ⊢ transmit e1 from e2 : expand(τ, global)
    expand(ref_q τ, q') ≡ ref_{⊔(q, q')} τ    (⊔ is the least upper bound in the qualifier order)
    expand(τ, q') ≡ τ otherwise
  [Figure: thread 0 evaluates transmit y from 1; the reference y is tlocal on thread 1 and global once received by thread 0]
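The expand function and the transmit rule can be sketched directly. This is an illustration under assumptions: types are modeled with a hypothetical `RefType` class and the string `"int"`, and the least upper bound of two qualifiers is computed with `max` over an integer-ordered enum.

```python
from dataclasses import dataclass
from enum import IntEnum


class Qualifier(IntEnum):
    TLOCAL = 1
    PLOCAL = 2
    GLOBAL = 3


@dataclass(frozen=True)
class RefType:
    """ref_q tau; target is "int" or another RefType."""
    qualifier: Qualifier
    target: object


def expand(ty, q):
    """Widen a reference type to at least q; leave other types unchanged."""
    if isinstance(ty, RefType):
        return RefType(max(ty.qualifier, q), ty.target)
    return ty


def transmit_type(e1_type, e2_type):
    """Type of `transmit e1 from e2`, given the types of e1 and e2."""
    if e2_type != "int":
        raise TypeError("the source thread expression must have type int")
    return expand(e1_type, Qualifier.GLOBAL)


tlocal_ref = RefType(Qualifier.TLOCAL, "int")
received = transmit_type(tlocal_ref, "int")
```

Only the outermost qualifier is widened in this sketch, matching the definition of expand on the slide.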
Type Rules – Dereferencing Assignment
• The expression e1 ← e2 puts the value of e2 into the location referenced by e1 (like *e1 = e2 in C)
• Some assignments are unsound
    ⊢ e1 : ref_q τ    ⊢ e2 : τ    robust(τ, q)
    ------------------------------------------
    ⊢ e1 ← e2 : ref_q τ
    robust(ref_q τ, q') ≡ false if q ⊏ q'
    robust(τ, q') ≡ true otherwise
  [Figure: threads 0 and 1 with variables y and z and tlocal and plocal references, illustrating an unsound assignment]
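The robust predicate is small enough to write out. A minimal sketch, again assuming a hypothetical `RefType` class, the string `"int"` for the integer type, and an integer-ordered qualifier enum so that `<` plays the role of ⊏:

```python
from dataclasses import dataclass
from enum import IntEnum


class Qualifier(IntEnum):
    TLOCAL = 1
    PLOCAL = 2
    GLOBAL = 3


@dataclass(frozen=True)
class RefType:
    """ref_q tau; target is "int" or another RefType."""
    qualifier: Qualifier
    target: object


def robust(value_type, destination_qualifier):
    """False when a reference value is strictly narrower than the destination."""
    if isinstance(value_type, RefType):
        return not (value_type.qualifier < destination_qualifier)
    return True


# Storing a tlocal reference through a plocal pointer is rejected.
rejected = robust(RefType(Qualifier.TLOCAL, "int"), Qualifier.PLOCAL)
# Storing a plocal reference through a plocal pointer is accepted.
accepted = robust(RefType(Qualifier.PLOCAL, "int"), Qualifier.PLOCAL)
```

A plausible reading of the rejected case: the destination may live in another thread's memory, where a reference typed tlocal would no longer be thread-local.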
Type Rules – Type Conversion
• The expression convert(e, q) is an assertion that e refers to data that is no further than q
• Titanium code often checks if data is plocal and then casts to it before operating on it for efficiency
    ⊢ e : ref_q' τ
    -------------------------
    ⊢ convert(e, q) : ref_q τ
  [Figure: thread 0 holds a global reference x]
Pointer Analysis
• Since language is SPMD, analysis is only done for a single thread
  • We use thread 0 in our examples
• Each expression has a points-to set of abstract locations that it can reference
• Abstract locations also have points-to sets
Abstract Locations
• Abstract locations consist of a label and a qualifier
• A-loc (l, q) can refer to any concrete location allocated at label l that is at most distance q from thread 0
  [Figure: threads 0 and 1 each evaluate new_l int in tlocal memory; the a-locs (l, tlocal) and (l, plocal) are marked]
Pointer Analysis – Allocation and Communication
• The inference rules for allocation and communication are similar to the type rules
• An allocation new_l τ produces a new abstract location (l, tlocal)
• The result of the expression transmit e1 from e2 is the set of a-locs resulting from e1 but with global qualifiers
    e1 → {(l1, tlocal), (l2, plocal), (l3, global)}
    transmit e1 from e2 → {(l1, global), (l2, global), (l3, global)}
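These two rules can be sketched over points-to sets represented as Python sets of (label, qualifier) pairs. The function names `alloc` and `transmit` and the short enum name `Q` are assumptions for illustration.

```python
from enum import IntEnum


class Q(IntEnum):
    TLOCAL = 1
    PLOCAL = 2
    GLOBAL = 3


def alloc(label):
    """Points-to set of new_l: the single a-loc (l, tlocal)."""
    return {(label, Q.TLOCAL)}


def transmit(e1_points_to):
    """Points-to set of `transmit e1 from e2`: every qualifier becomes global."""
    return {(label, Q.GLOBAL) for (label, _qualifier) in e1_points_to}


# The example from the slide.
e1 = {("l1", Q.TLOCAL), ("l2", Q.PLOCAL), ("l3", Q.GLOBAL)}
result = transmit(e1)
```

With the slide's input, `result` contains `l1`, `l2`, and `l3`, each paired with `Q.GLOBAL`.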
Pointer Analysis – Dereferencing Assignment
• For assignment, must take into account actions of other threads
    x → {(l1, tlocal)}, y → {(l2, plocal)}
    x ← y : (l1, tlocal) → (l2, plocal), (l1, plocal) → (l2, plocal), (l1, global) → (l2, global)
  [Figure: the same assignment as executed by threads 0, 1, and 2, with the a-locs for l1 and l2 labeled by their distance from thread 0]
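The example suggests a general pattern: because every thread runs the same assignment, each destination a-loc is updated at its own level and at every wider level, and the stored value is widened to at least that level. The sketch below encodes that reading. It is inferred from the slide's single example, so treat the rule in `assign` as an assumption rather than the authors' exact formulation; the names `assign`, `heap`, and `Q` are hypothetical.

```python
from collections import defaultdict
from enum import IntEnum


class Q(IntEnum):
    TLOCAL = 1
    PLOCAL = 2
    GLOBAL = 3


def assign(heap, target_points_to, value_points_to):
    """Update a-loc points-to sets for `x <- y` (assumed general rule)."""
    for (l1, q1) in target_points_to:
        for view in Q:
            if view < q1:
                continue  # the destination is never closer than q1
            for (l2, q2) in value_points_to:
                # Seen at distance `view`, the value is at least that far away.
                heap[(l1, view)].add((l2, max(q2, view)))


# The example from the slide: x -> {(l1, tlocal)}, y -> {(l2, plocal)}.
heap = defaultdict(set)
assign(heap, {("l1", Q.TLOCAL)}, {("l2", Q.PLOCAL)})
```

After the call, `heap` holds the three edges listed on the slide: (l1, tlocal) → (l2, plocal), (l1, plocal) → (l2, plocal), and (l1, global) → (l2, global).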
Pointer Analysis – Type Conversion
• In the type conversion convert(e, q), the program is illegal if e evaluates to a location further than q
• Thus, the result of the expression convert(e, q) is the set of a-locs resulting from e with the qualifiers reduced to at most q
    e → {(l1, tlocal), (l2, plocal), (l3, global)}
    convert(e, plocal) → {(l1, tlocal), (l2, plocal), (l3, plocal)}
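Reducing qualifiers to at most q amounts to taking the minimum of each qualifier and q. A minimal sketch, assuming points-to sets are Python sets of (label, qualifier) pairs and using the hypothetical names `convert` and `Q`:

```python
from enum import IntEnum


class Q(IntEnum):
    TLOCAL = 1
    PLOCAL = 2
    GLOBAL = 3


def convert(points_to, q):
    """Points-to set of convert(e, q): clamp every qualifier to at most q."""
    return {(label, min(qualifier, q)) for (label, qualifier) in points_to}


# The example from the slide.
e = {("l1", Q.TLOCAL), ("l2", Q.PLOCAL), ("l3", Q.GLOBAL)}
converted = convert(e, Q.PLOCAL)
```

Qualifiers that are already within q are left alone, so only `l3` changes in the slide's example.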
Evaluation
Benchmarks
• Five application benchmarks used to evaluate the pointer analysis
    Benchmark  Line Count  Description
    amr        7581        Adaptive mesh refinement suite
    gas        8841        Hyperbolic solver for a gas dynamics problem
    ft         1192        NAS Fourier transform benchmark
    cg         1595        NAS conjugate gradient benchmark
    mg         1952        NAS multigrid benchmark
Running Time
• Determine actual cost of introducing multiple levels into the pointer analysis
  • Tests run on 2.4 GHz Pentium 4 with 512 MB RAM
• Three analysis variants compared
    Name  Description
    PA1   Single-level pointer analysis
    PA2   Two-level pointer analysis (thread-local and global)
    PA3   Three-level pointer analysis
Running Time Results
  [Chart: Pointer Analysis Running Time. Y axis: analysis time in seconds, 0 to 4. X axis: benchmark (amr, gas, ft, cg, mg). Series: PA1, PA2, PA3.]
Data Privacy Detection
• In pointer analysis, an allocation site is private if only thread-local references to it are used
  • Thus, only two levels, thread-local and global, needed in the pointer analysis
• Two types of analysis compared
    Name  Description
    SQI   Constraint-based analysis by Liblit, Aiken, and Yelick; does not distinguish allocation sites
    PA2   Two-level pointer analysis (thread-local and global)
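The privacy criterion can be sketched as a pass over the results of a two-level analysis. This is an illustration under assumptions: `points_to_sets` is a hypothetical collection of all points-to sets the analysis computed, each a set of (label, qualifier) pairs, and `private_sites` is a made-up helper name.

```python
from enum import IntEnum


class Q(IntEnum):
    TLOCAL = 1
    GLOBAL = 2


def private_sites(allocation_sites, points_to_sets):
    """Sites that are only ever referenced through thread-local a-locs."""
    shared = {
        label
        for points_to in points_to_sets
        for (label, qualifier) in points_to
        if qualifier != Q.TLOCAL
    }
    return set(allocation_sites) - shared


# Toy results: site l2 is reachable through a global reference, l1 is not.
points_to_sets = [
    {("l1", Q.TLOCAL)},
    {("l1", Q.TLOCAL), ("l2", Q.TLOCAL)},
    {("l2", Q.GLOBAL)},
]
private = private_sites({"l1", "l2"}, points_to_sets)
```

In this toy input only `l1` is reported private, since `l2` appears with a global qualifier in one points-to set.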
Data Privacy Detection Results
  [Chart: Data Privacy Detection. Y axis: percent determined to be thread-private, 0 to 100. X axis: benchmark (amr, gas, ft, cg, mg). Series: SQI, PA2.]
Data Locality Detection
• Goal: statically determine which pointers must be process-local
• Three analyses compared
    Name  Description
    LQI   Constraint-based analysis by Liblit and Aiken; does not distinguish allocation sites
    PA2   Two-level pointer analysis (thread-local and global)
    PA3   Three-level pointer analysis
Data Locality Detection Results
  [Chart: Data Locality Detection. Y axis: percent determined to be process-local, 0 to 100. X axis: benchmark (amr, gas, ft, cg, mg). Series: LQI, PA2, PA3.]
Race Detection
• Pointer analysis used with an existing concurrency analysis to detect potential races at compile-time
• Three analyses compared
    Name        Description
    concur      Concurrency analysis plus constraint-based data sharing analysis and type-based alias analysis
    concur+PA1  Concurrency analysis plus single-level pointer analysis
    concur+PA3  Concurrency analysis plus three-level pointer analysis
Race Detection Results
  [Chart: Static Races Detected. Y axis: races, logarithmic scale from 10 to 100000. X axis: benchmark (amr, gas, ft, cg, mg). Bar values as labeled on the chart, in no recoverable order: 11493, 4082, 3065, 2029, 1514, 951, 793, 517, 286, 446, 262, 207, 198, 67, 66.]
Conclusion
Conclusion
• We developed a pointer analysis for hierarchical, distributed machines
• The cost of introducing the memory hierarchy into the analysis is small
• On the other hand, the payoff is large
Future Work
• Scientific programs tend to use a lot of array-based data structures
  • Need array index analysis to properly analyze them
• Implement a dynamic race detector
  • Use static results to minimize the program locations that need to be tracked
Questions