Pin ASPLOS Tutorial Kim Hazelwood Vijay Janapa Reddi

About Us
Kim Hazelwood
– Assistant Professor at University of Virginia
– Tortola Research Group: HW/SW Collaboration, Virtualization
Vijay Janapa Reddi
– Ph.D. Student at Harvard University
– VM Optimizations, VM Scalability
Pin ASPLOS Tutorial 2008

Agenda
I. Pin Intro and Overview
II. Using Pin in Your Research
III. Hands-On Workshop

Part One: Introduction and Overview Kim Hazelwood Vijay Janapa Reddi

What is Instrumentation?
A technique that inserts extra code into a program to collect runtime information.
Instrumentation approaches:
• Source instrumentation:
  – Instrument source programs
• Binary instrumentation:
  – Instrument executables directly
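The sketch below contrasts the two approaches by showing source instrumentation in C++. A hypothetical call counter, `g_calls`, is written by hand into the program's source and compiled along with it. A binary instrumenter such as Pin inserts equivalent code into the executable instead, so no source change or recompilation is needed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical counter that exists only for instrumentation.
static std::uint64_t g_calls = 0;

// The original function, plus one line of inserted instrumentation code.
std::uint64_t sum_to(std::uint64_t n) {
    ++g_calls;  // inserted: records each call at run time
    std::uint64_t total = 0;
    for (std::uint64_t i = 1; i <= n; ++i) total += i;
    return total;
}
```

Calling `sum_to(10)` and then `sum_to(4)` returns 55 and 10 and leaves `g_calls` at 2.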

Why use Dynamic Instrumentation?
• No need to recompile or relink
• Discover code at runtime
• Handle dynamically generated code
• Attach to running processes

How is Instrumentation used in Computer Architecture Research?
• Trace Generation
• Branch Predictor and Cache Modeling
• Fault Tolerance Studies
• Emulating Speculation
• Emulating New Instructions

How is Instrumentation used in PL/Compiler Research?
Program analysis:
– Code coverage
– Call-graph generation
– Memory-leak detection
– Instruction profiling
Thread analysis:
– Thread profiling
– Race detection

Advantages of Pin Instrumentation
Easy-to-use instrumentation:
• Uses dynamic instrumentation
  – No need for source code, recompilation, or post-linking
Programmable instrumentation:
• Provides rich APIs for writing your own instrumentation tools (called Pintools) in C/C++
Multiplatform:
• Supports x86, x86-64, Itanium, XScale
• Supports Linux, Windows, MacOS
Robust:
• Instruments real-life applications: databases, web browsers, …
• Instruments multithreaded applications
• Supports signals
Efficient:
• Applies compiler optimizations to instrumentation code

Other Advantages
• Robust and stable
  – Pin can run itself!
  – Several active developers
  – Nightly testing of 25,000 binaries on 15 platforms
• Large user base in academia and industry
  – Active mailing list (Pinheads)
  – 20,000 downloads

Using Pin
Launch and instrument an application:
    $ pin -t pintool -- application
– pin: the instrumentation engine (provided in the kit)
– pintool: the instrumentation tool (write your own, or use one provided in the kit)
Attach to and instrument an application:
    $ pin -pid 1234 -t pintool

Pin Instrumentation APIs
Basic APIs are architecture independent:
• Provide common functionality, such as determining:
  – Control-flow changes
  – Memory accesses
Architecture-specific APIs:
• e.g., information about segmentation registers on IA-32
Call-based APIs:
• Instrumentation routines
• Analysis routines

Instrumentation vs. Analysis
Concepts borrowed from the ATOM tool:
Instrumentation routines define where instrumentation is inserted
• e.g., before an instruction
  – Occurs the first time an instruction is executed
Analysis routines define what to do when instrumentation is activated
• e.g., increment a counter
  – Occurs every time an instruction is executed
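The "first time" versus "every time" split can be modelled outside Pin. The sketch below is a toy model, not Pin's API. `ToyJit`, `instrument`, and `execute` are hypothetical names. A hash map stands in for the code cache. As a result, the instrumentation callback runs once per static address, while the analysis code it returns runs on every dynamic execution.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Toy model of the ATOM/Pin split (hypothetical, not the Pin API).
struct ToyJit {
    using Analysis = std::function<void()>;
    // "Instrumentation routine": called once per address, returns the analysis code.
    std::function<Analysis(std::uint64_t)> instrument;
    // Stands in for the code cache: address -> instrumented code.
    std::unordered_map<std::uint64_t, Analysis> cache;
    std::uint64_t instrumentCalls = 0;

    void execute(std::uint64_t addr) {
        auto it = cache.find(addr);
        if (it == cache.end()) {  // first execution: translate and instrument
            ++instrumentCalls;
            it = cache.emplace(addr, instrument(addr)).first;
        }
        it->second();  // every execution: run the analysis code
    }
};
```

Executing five addresses ten times each leads to 5 instrumentation calls and 50 analysis calls.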

Pintool 1: Instruction Count
    counter++;
    sub  $0xff, %edx
    counter++;
    cmp  %esi, %edx
    counter++;
    jle  <L1>
    counter++;
    mov  $0x1, %edi
    counter++;
    add  $0x10, %eax

Pintool 1: Instruction Count Output
    $ /bin/ls
    Makefile  imageload.out  itrace  proccount  imageload  inscount0  atrace  itrace.out
    $ pin -t inscount0.so -- /bin/ls
    Makefile  imageload.out  itrace  proccount  imageload  inscount0  atrace  itrace.out
    Count 422838

ManualExamples/inscount0.cpp
    #include <iostream>
    #include "pin.H"

    UINT64 icount = 0;

    // analysis routine
    void docount() { icount++; }

    // instrumentation routine
    void Instruction(INS ins, void *v)
    {
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_END);
    }

    void Fini(INT32 code, void *v)
    {
        std::cerr << "Count " << icount << std::endl;
    }

    int main(int argc, char * argv[])
    {
        PIN_Init(argc, argv);
        INS_AddInstrumentFunction(Instruction, 0);
        PIN_AddFiniFunction(Fini, 0);
        PIN_StartProgram();
        return 0;
    }

Pintool 2: Instruction Trace
    Print(ip);
    sub  $0xff, %edx
    Print(ip);
    cmp  %esi, %edx
    Print(ip);
    jle  <L1>
    Print(ip);
    mov  $0x1, %edi
    Print(ip);
    add  $0x10, %eax
Need to pass the ip argument to the analysis routine (printip()).

Pintool 2: Instruction Trace Output
    $ pin -t itrace.so -- /bin/ls
    Makefile  imageload.out  itrace  proccount  imageload  inscount0  atrace  itrace.out
    $ head -4 itrace.out
    0x40001e90
    0x40001e91
    0x40001ee4
    0x40001ee5

ManualExamples/itrace.cpp
    #include <stdio.h>
    #include "pin.H"

    FILE * trace;

    // analysis routine: receives the instruction address as its argument
    void printip(void *ip) { fprintf(trace, "%p\n", ip); }

    // instrumentation routine
    void Instruction(INS ins, void *v)
    {
        // IARG_INST_PTR is the argument passed to the analysis routine
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)printip, IARG_INST_PTR, IARG_END);
    }

    void Fini(INT32 code, void *v) { fclose(trace); }

    int main(int argc, char * argv[])
    {
        trace = fopen("itrace.out", "w");
        PIN_Init(argc, argv);
        INS_AddInstrumentFunction(Instruction, 0);
        PIN_AddFiniFunction(Fini, 0);
        PIN_StartProgram();
        return 0;
    }

Examples of Arguments to Analysis Routines
IARG_INST_PTR
• Instruction pointer (program counter) value
IARG_UINT32 <value>
• An integer value
IARG_REG_VALUE <register name>
• Value of the specified register
IARG_BRANCH_TARGET_ADDR
• Target address of the instrumented branch
IARG_MEMORYREAD_EA
• Effective address of a memory read
And many more … (refer to the Pin manual for details)

Instrumentation Points
Instrument points relative to an instruction:
• Before (IPOINT_BEFORE)
• After:
  – Fall-through edge (IPOINT_AFTER)
  – Taken edge (IPOINT_TAKEN_BRANCH)
Example with count() on both edges of a jle:
    cmp  %esi, %edx
    jle  <L1>
    count()              ; fall-through edge
    mov  $0x1, %edi

    count()              ; taken edge
    <L1>: mov $0x8, %edi

Instrumentation Granularity
Instrumentation can be done at three different granularities:
• Instruction
• Basic block
  – A sequence of instructions terminated at a control-flow changing instruction
  – Single entry, single exit
• Trace
  – A sequence of basic blocks terminated at an unconditional control-flow changing instruction
  – Single entry, multiple exits
Example (1 trace, 2 BBs, 6 insts):
    sub  $0xff, %edx
    cmp  %esi, %edx
    jle  <L1>
    mov  $0x1, %edi
    add  $0x10, %eax
    jmp  <L2>
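The two termination rules can be checked against the six-instruction example with a small C++ sketch. It models each instruction as a hypothetical `Insn` record flagged as control-flow and/or unconditional. Pin's actual trace selection may apply further limits that this sketch ignores.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified instruction record (hypothetical). Fields omitted in a
// brace initializer default to false.
struct Insn {
    std::string text;
    bool controlFlow;    // e.g. jle, jmp, call, ret
    bool unconditional;  // e.g. jmp
};

struct Counts { int traces = 0; int bbls = 0; int insts = 0; };

// A basic block ends at any control-flow instruction;
// a trace ends only at an unconditional one.
Counts Partition(const std::vector<Insn>& code) {
    Counts c;
    bool inBbl = false, inTrace = false;
    for (const Insn& i : code) {
        if (!inTrace) { ++c.traces; inTrace = true; }
        if (!inBbl) { ++c.bbls; inBbl = true; }
        ++c.insts;
        if (i.controlFlow) inBbl = false;
        if (i.controlFlow && i.unconditional) inTrace = false;
    }
    return c;
}
```

For the slide's sequence, `Partition` reports 1 trace, 2 basic blocks, and 6 instructions.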

Recap of Pintool 1: Instruction Count
    counter++;
    sub  $0xff, %edx
    counter++;
    cmp  %esi, %edx
    counter++;
    jle  <L1>
    counter++;
    mov  $0x1, %edi
    counter++;
    add  $0x10, %eax
Straightforward, but the counting can be more efficient.

Pintool 3: Faster Instruction Count
Count once per basic block (bbl) instead of once per instruction:
    counter += 3
    sub  $0xff, %edx
    cmp  %esi, %edx
    jle  <L1>
    counter += 2
    mov  $0x1, %edi
    add  $0x10, %eax

ManualExamples/inscount1.cpp
    #include <stdio.h>
    #include "pin.H"

    UINT64 icount = 0;

    // analysis routine
    void docount(INT32 c) { icount += c; }

    // instrumentation routine
    void Trace(TRACE trace, void *v)
    {
        // Visit every basic block in the trace
        for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl))
        {
            // One call per block, passing the block's instruction count
            BBL_InsertCall(bbl, IPOINT_BEFORE, (AFUNPTR)docount,
                           IARG_UINT32, BBL_NumIns(bbl), IARG_END);
        }
    }

    void Fini(INT32 code, void *v)
    {
        fprintf(stderr, "Count %llu\n", (unsigned long long)icount);
    }

    int main(int argc, char * argv[])
    {
        PIN_Init(argc, argv);
        TRACE_AddInstrumentFunction(Trace, 0);
        PIN_AddFiniFunction(Fini, 0);
        PIN_StartProgram();
        return 0;
    }
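The toy comparison below shows why Pintool 3 is cheaper while producing the same answer. It is plain C++, not the Pin API, and models an execution as the list of basic-block sizes that ran. Both functions return the same instruction count; only the number of analysis calls differs.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Cost { std::uint64_t icount = 0; std::uint64_t analysisCalls = 0; };

// Pintool 1 style: one analysis call per executed instruction.
Cost CountPerInstruction(const std::vector<std::uint32_t>& executedBblSizes) {
    Cost c;
    for (std::uint32_t n : executedBblSizes)
        for (std::uint32_t k = 0; k < n; ++k) { c.icount += 1; ++c.analysisCalls; }
    return c;
}

// Pintool 3 style: the block size is known at instrumentation time
// (like BBL_NumIns) and is added in one analysis call per executed block.
Cost CountPerBbl(const std::vector<std::uint32_t>& executedBblSizes) {
    Cost c;
    for (std::uint32_t n : executedBblSizes) { c.icount += n; ++c.analysisCalls; }
    return c;
}
```

For blocks of sizes 3, 2, 3, 2, 3, both functions report 13 instructions. The per-block version makes 5 analysis calls instead of 13.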

Modifying Program Behavior
Pin allows you not only to observe but also to change program behavior.
Ways to change program behavior:
• Add/delete instructions
• Change register values
• Change memory values
• Change control flow

Instrumentation Library
Instruction counting Pintool, written by hand:
    #include <iostream>
    #include "pin.H"

    UINT64 icount = 0;

    VOID docount() { icount++; }

    VOID Instruction(INS ins, VOID *v)
    {
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_END);
    }

    VOID Fini(INT32 code, VOID *v)
    {
        std::cerr << "Count " << icount << std::endl;
    }

    int main(int argc, char * argv[])
    {
        PIN_Init(argc, argv);
        INS_AddInstrumentFunction(Instruction, 0);
        PIN_AddFiniFunction(Fini, 0);
        PIN_StartProgram();
        return 0;
    }

The same tool using the Instrumentation Library:
    #include <iostream>
    #include "pin.H"
    #include "instlib.H"

    INSTLIB::ICOUNT icount;

    VOID Fini(INT32 code, VOID *v)
    {
        std::cout << "Count " << icount.Count() << std::endl;
    }

    int main(int argc, char * argv[])
    {
        PIN_Init(argc, argv);
        PIN_AddFiniFunction(Fini, 0);
        icount.Activate();   // enable InstLib's instruction counting
        PIN_StartProgram();
        return 0;
    }

Useful InstLib abstractions
• ICOUNT – # of instructions executed
• FILTER – Instrument specific routines or libraries only
• ALARM – Execution count timer for addresses, routines, etc.
• FOLLOW_CHILD – Inject Pin into a new process created by the parent process
• TIME_WARP – Preserves RDTSC behavior across executions
• CONTROL – Limit instrumentation address ranges

Debugging Pintools
1. Invoke gdb with pin (don't "run"):
    $ gdb pin
    (gdb)
2. In another window, start your pintool with the "-pause_tool" flag:
    $ pin -pause_tool 5 -t inscount0.so -- /bin/ls
    Pausing to attach to pid 32017
3. Go back to the gdb window:
   a) Attach to the process
   b) "cont" to continue execution; you can set breakpoints as usual
    (gdb) attach 32017
    (gdb) break main
    (gdb) cont

Pin Internals

Pin Source Code Organization
Pin source is organized into generic, architecture-dependent, and OS-dependent modules:
Architecture          | # source files | # source lines
Generic               | 87 (48%)       | 53,595 (47%)
x86 (32-bit + 64-bit) | 34 (19%)       | 22,794 (20%)
Itanium               | 34 (19%)       | 20,474 (18%)
ARM                   | 27 (14%)       | 17,933 (15%)
TOTAL                 | 182 (100%)     | 114,796 (100%)
• ~50% of the code is shared among architectures

Pin's Software Architecture
(Figure: one address space holds the Pintool, Pin, and the Application. Pin consists of the Instrumentation APIs and a Virtual Machine (VM) containing a JIT Compiler, a Code Cache, and an Emulation Unit. Everything runs on top of the Operating System and Hardware.)

Dynamic Instrumentation
(Figure: original code blocks 1-7; the code cache holds 1', 2', 7'; exits point back to Pin.)
Pin fetches the trace starting at block 1 and starts instrumenting it.

Dynamic Instrumentation
(Figure: original code blocks 1-7; the code cache holds 1', 2', 7'.)
Pin transfers control into the code cache (block 1).

Dynamic Instrumentation
(Figure: the code cache now also holds 3', 5', 6', and cached traces are connected by trace linking.)
Pin fetches and instruments a new trace.

Implementation Challenges
• Linking
  – Straightforward for direct branches
  – Tricky for indirect branches and invalidations
• Re-allocating registers
• Maintaining transparency
• Self-modifying code
• Supporting MT applications…

Pin's Multithreading Support
Thread-safe accesses among Pin, the Pintool, and the application:
– Pin: one thread in the VM at a time
– Pintool: locks, thread ID, event notification
– App: thread-local spill area
Providing pthreads functions to instrumentation tools:
(Figure: the application calls the system's libpthread. The Pintool calls Pin's mini-libpthread, which sets up signal handlers and redirects all other pthreads function calls to the application's libpthread.)

Optimizing Pintools

Reducing Instrumentation Overhead
Total Overhead = Pin Overhead + Pintool Overhead
• Pin Overhead: the Pin team's job is to minimize this
  – ~5% for SPECfp and ~20% for SPECint
• Pintool Overhead: Pintool writers can help minimize this!

Pin Overhead (figure: SPEC Integer 2006)

Adding User Instrumentation (figure)

Reducing the Pintool's Overhead
Pintool Overhead = Instrumentation Routines Overhead + Analysis Routines Overhead
Analysis Routines Overhead = Frequency of calling an Analysis Routine × Work required in the Analysis Routine
Work required in the Analysis Routine = Work required for transiting to the Analysis Routine + Work done inside the Analysis Routine

Analysis Routines: Reduce Call Frequency
Key: instrument at the largest granularity whenever possible
Trace > Basic Block > Instruction

Slower Instruction Counting
    counter++;
    sub  $0xff, %edx
    counter++;
    cmp  %esi, %edx
    counter++;
    jle  <L1>
    counter++;
    mov  $0x1, %edi
    counter++;
    add  $0x10, %eax

Faster Instruction Counting
Counting at BBL level:
    counter += 3
    sub  $0xff, %edx
    cmp  %esi, %edx
    jle  <L1>
    counter += 2
    mov  $0x1, %edi
    add  $0x10, %eax
Counting at Trace level:
    counter += 5
    sub  $0xff, %edx
    cmp  %esi, %edx
    jle  <L1>          ; if taken: counter -= 2, then go to L1
    mov  $0x1, %edi
    add  $0x10, %eax
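Trace-level counting depends on a compensation at each early exit. The toy function below is hypothetical, not Pin code. It adds the whole trace's instruction count on entry and subtracts the blocks that a side exit skips, matching the counter += 5 / counter -= 2 example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// bblSizes: instruction count of each basic block in the trace.
// exitBbl: index of the block whose branch leaves the trace
// (the last index means the whole trace ran).
std::int64_t CountTraceExecution(const std::vector<std::int64_t>& bblSizes,
                                 std::size_t exitBbl) {
    assert(exitBbl < bblSizes.size());
    // On trace entry: add every instruction in the trace (e.g. counter += 5).
    std::int64_t counter =
        std::accumulate(bblSizes.begin(), bblSizes.end(), std::int64_t{0});
    // On a side exit: subtract the skipped blocks (e.g. counter -= 2).
    // The amount per exit is known when the trace is instrumented.
    for (std::size_t b = exitBbl + 1; b < bblSizes.size(); ++b) counter -= bblSizes[b];
    return counter;
}
```

With blocks of 3 and 2 instructions, running the whole trace counts 5. Taking the jle to L1 after the first block counts 5 - 2 = 3.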

Reducing Work in Analysis Routines
Key: shift computation from analysis routines to instrumentation routines whenever possible

Edge Counting: a Slower Version
    ...
    void docount2(ADDRINT src, ADDRINT dst, INT32 taken)
    {
        COUNTER *pedg = Lookup(src, dst);   // table search on every execution
        pedg->count += taken;
    }

    void Instruction(INS ins, void *v)
    {
        if (INS_IsBranchOrCall(ins))
        {
            INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount2,
                           IARG_INST_PTR, IARG_BRANCH_TARGET_ADDR,
                           IARG_BRANCH_TAKEN, IARG_END);
        }
    }
    ...

Edge Counting: a Faster Version
    void docount(COUNTER *pedg, INT32 taken) { pedg->count += taken; }

    void docount2(ADDRINT src, ADDRINT dst, INT32 taken)
    {
        COUNTER *pedg = Lookup(src, dst);
        pedg->count += taken;
    }

    void Instruction(INS ins, void *v)
    {
        if (INS_IsDirectBranchOrCall(ins))
        {
            // The target is known now, so the lookup happens once, at instrumentation time
            COUNTER *pedg = Lookup(INS_Address(ins),
                                   INS_DirectBranchOrCallTargetAddress(ins));
            INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount,
                           IARG_PTR, pedg, IARG_BRANCH_TAKEN, IARG_END);
        }
        else if (INS_IsBranchOrCall(ins))
        {
            // Indirect branch: the target is only known at run time
            INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount2,
                           IARG_INST_PTR, IARG_BRANCH_TARGET_ADDR,
                           IARG_BRANCH_TAKEN, IARG_END);
        }
    }
    …
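The saving comes from moving the table search out of the analysis routine. The toy model below is not Pin code. It provides hypothetical definitions of the slide's `COUNTER` and `Lookup`, using std::map, and counts how often the table is searched under each strategy.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

struct COUNTER { std::uint64_t count = 0; };

// Edge table keyed by (source, target). std::map never moves its nodes,
// so pointers returned by Lookup stay valid.
std::map<std::pair<std::uint64_t, std::uint64_t>, COUNTER> g_edges;
std::uint64_t g_lookups = 0;  // number of table searches performed

COUNTER* Lookup(std::uint64_t src, std::uint64_t dst) {
    ++g_lookups;
    return &g_edges[std::make_pair(src, dst)];
}

// Slower: the analysis routine searches the table on every execution.
void docount2(std::uint64_t src, std::uint64_t dst, int taken) {
    Lookup(src, dst)->count += taken;
}

// Faster: the pointer was obtained once, ahead of time.
void docount(COUNTER* pedg, int taken) { pedg->count += taken; }
```

100 executions through `docount2` cost 100 lookups. The faster path costs one lookup at "instrumentation time" and none per execution.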

Reducing Work for Analysis Transitions
Key: help Pin's optimizations apply to your analysis routines:
– Inlining
– Scheduling

Inlining
Inlinable:
    int docount0(int i) {
        x[i]++;
        return x[i];
    }
Not-inlinable (contains a conditional branch):
    int docount1(int i) {
        if (i == 1000)
            x[i]++;
        return x[i];
    }
Not-inlinable (calls another function):
    int docount2(int i) {
        x[i]++;
        printf("%d", i);
        return x[i];
    }
Not-inlinable (contains a loop):
    void docount3() {
        for (int i = 0; i < 100; i++)
            x[i]++;
    }

Conditional Inlining
Inline a common scenario where the analysis routine has a single "if-then":
• The "if" part is always executed
• The "then" part is rarely executed
The Pintool writer breaks such an analysis routine into two:
• INS_InsertIfCall(ins, …, (AFUNPTR)doif, …)
• INS_InsertThenCall(ins, …, (AFUNPTR)dothen, …)

IP-Sampling (a Slower Version)
    const INT32 N = 10000;
    const INT32 M = 5000;

    INT32 icount = N;

    VOID IpSample(VOID *ip)
    {
        --icount;
        if (icount == 0)
        {
            fprintf(trace, "%p\n", ip);
            icount = N + rand() % M;   // icount is in [N, N+M)
        }
    }

    VOID Instruction(INS ins, VOID *v)
    {
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)IpSample, IARG_INST_PTR, IARG_END);
    }

IP-Sampling (a Faster Version)
    // inlined
    INT32 CountDown()
    {
        --icount;
        return (icount == 0);
    }

    // not inlined
    VOID PrintIp(VOID *ip)
    {
        fprintf(trace, "%p\n", ip);
        icount = N + rand() % M;   // icount is in [N, N+M)
    }

    VOID Instruction(INS ins, VOID *v)
    {
        // CountDown() is always called before an instruction is executed
        INS_InsertIfCall(ins, IPOINT_BEFORE, (AFUNPTR)CountDown, IARG_END);
        // PrintIp() is called only if the last call to CountDown()
        // returned a non-zero value
        INS_InsertThenCall(ins, IPOINT_BEFORE, (AFUNPTR)PrintIp, IARG_INST_PTR, IARG_END);
    }
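The control flow Pin builds from an If/Then pair can be simulated in plain C++. In the sketch below, `OnInstruction` is a hypothetical stand-in for what Pin does before each instruction, and a vector replaces the trace file. Only `CountDown` runs on every instruction.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

const std::int32_t N = 10000;
const std::int32_t M = 5000;
std::int32_t icount = N;
std::vector<std::uint64_t> samples;  // stands in for the trace file

// "If" part: cheap straight-line code, run before every instruction.
std::int32_t CountDown() { --icount; return icount == 0; }

// "Then" part: run only when CountDown() returned non-zero.
void PrintIp(std::uint64_t ip) {
    samples.push_back(ip);
    icount = N + std::rand() % M;  // next interval is in [N, N+M)
}

// Hypothetical model of the If/Then pair Pin inserts for one instruction.
void OnInstruction(std::uint64_t ip) {
    if (CountDown()) PrintIp(ip);
}
```

After 10,000 simulated instructions, exactly one sample is taken, at the 10,000th instruction, and the countdown is reset to a value in [N, N+M).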

Instrumentation Scheduling
If an instrumentation call can be inserted anywhere in a basic block:
• Let Pin know via IPOINT_ANYWHERE
• Pin will find the best point to insert the instrumentation to minimize register spilling

ManualExamples/inscount1.cpp (with IPOINT_ANYWHERE)
    #include <stdio.h>
    #include "pin.H"

    UINT64 icount = 0;

    // analysis routine
    void docount(INT32 c) { icount += c; }

    // instrumentation routine
    void Trace(TRACE trace, void *v)
    {
        for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl))
        {
            // IPOINT_ANYWHERE lets Pin choose where in the block to place the call
            BBL_InsertCall(bbl, IPOINT_ANYWHERE, (AFUNPTR)docount,
                           IARG_UINT32, BBL_NumIns(bbl), IARG_END);
        }
    }

    void Fini(INT32 code, void *v)
    {
        fprintf(stderr, "Count %llu\n", (unsigned long long)icount);
    }

    int main(int argc, char * argv[])
    {
        PIN_Init(argc, argv);
        TRACE_AddInstrumentFunction(Trace, 0);
        PIN_AddFiniFunction(Fini, 0);
        PIN_StartProgram();
        return 0;
    }

Conclusions
A dynamic instrumentation system for building your own program analysis tools
Runs on multiple platforms:
• IA-32, Intel 64, Itanium, and XScale
• Linux, Windows, MacOS
Works on real-life applications
Efficient instrumentation (especially with your help!)

Part Two: Using Pin in Your Research Kim Hazelwood Vijay Janapa Reddi

Pin Applications
Sample tools in the Pin distribution:
• Cache simulators, branch predictors, address tracer, syscall tracer, edge profiler, stride profiler
Some tools developed and used inside Intel:
• Opcodemix (analyzes code generated by compilers)
• PinPoints (finds representative regions in programs to simulate)
• A tool for detecting memory bugs
Companies are writing their own Pintools
Universities use Pin in teaching and research

Tools for Program Analysis
• Debugtrace – debugging/program understanding aid; shows general call traces and instruction traces, including reads and writes of registers and memory
• Malloctrace – traces the execution of specific functions
• Insmix – statistics/characterization
• Statica – static analysis of binaries

Compiler Bug Detection
Opcodemix uncovered a compiler bug for crafty.
Instruction Type | Compiler A Count | Compiler B Count | Delta
*total           | 712M             | 618M             | -94M
(Per-opcode rows: XORL, TESTQ, RET, PUSHQ, POPQ, JE, LEAQ, JNZ. Counts in slide order: 94M, 94M, 94M, 94M, 37M, 94M, 0M, 0M, 0M, 37M, 131M. Deltas in slide order: -94M, 0M, 0M, 0M, -94M, 0M, 94M.)

Thread Checker Basics
Detect common parallel programming bugs:
• Data races, deadlocks, thread stalls, threading API usage violations
Instrumentation used:
• Memory operations
• Synchronization operations (via function replacement)
• Call stack
Pin-based prototype:
• Runs on Linux, x86 and x86_64
• A Pintool of ~2,500 C++ lines

Thread Checker Results
(Figure: potential errors in SPECOMP01 reported by Thread Checker; 4 threads were used.)

A documented data race in the art benchmark is detected.

Instrumentation-Driven Simulation
Fast exploratory studies:
• Instrumentation ~= native execution
• Simulation speeds in the MIPS range (millions of instructions per second)
Characterize complex applications:
• E.g., Oracle, Java, parallel data-mining apps
Simple to build instrumentation tools:
• Tools can feed simulation models in real time
• Tools can gather instruction traces for later use

Performance Models
Branch Predictor Models:
• PC of conditional instructions
• Direction Predictor: taken/not-taken information
• Target Predictor: PC of target instruction if taken
Cache Models:
• Thread ID (if multi-threaded workload)
• Memory address
• Size of memory operation
• Type of memory operation (Read/Write)
Simple Timing Models:
• Latency information

Branch Predictor Model
(Figure: Pin drives the BPSim Pin Tool. Its instrumentation routines and analysis routines pass branch instruction info through API calls to the BP Model.)
BPSim Pin Tool:
• Instruments all branches
• Uses API to set up callbacks to analysis routines
Branch Predictor Model:
• Detailed branch predictor simulator

BP Implementation
    // Predictor state (BranchPredictor and BP_Info are simulator-defined types)
    BranchPredictor myBPU;

    // ANALYSIS
    VOID ProcessBranch(ADDRINT PC, ADDRINT targetPC, bool BrTaken)
    {
        BP_Info pred = myBPU.GetPrediction(PC);
        if (pred.Taken != BrTaken)
        {
            // Direction mispredicted
        }
        if (pred.Target != targetPC)
        {
            // Target mispredicted
        }
        myBPU.Update(PC, BrTaken, targetPC);
    }

    // INSTRUMENT: direct branches that can fall through, i.e. conditional branches
    VOID Instruction(INS ins, VOID *v)
    {
        if (INS_IsDirectBranchOrCall(ins) && INS_HasFallThrough(ins))
            INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)ProcessBranch,
                           IARG_ADDRINT, INS_Address(ins),
                           IARG_ADDRINT, INS_DirectBranchOrCallTargetAddress(ins),
                           IARG_BRANCH_TAKEN, IARG_END);
    }

    // MAIN
    int main(int argc, char *argv[])
    {
        PIN_Init(argc, argv);
        INS_AddInstrumentFunction(Instruction, 0);
        PIN_StartProgram();
        return 0;
    }
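The slides leave `BranchPredictor` and `BP_Info` to the simulator. One possible implementation is sketched below for illustration; it is not the model used in the tutorial. It is a bimodal predictor built from 2-bit saturating counters, plus a direct-mapped branch target buffer. The 4096-entry table size and the field names are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct BP_Info { bool Taken; std::uint64_t Target; };

// Hypothetical bimodal predictor: 2-bit counters (0..3) indexed by PC,
// where values >= 2 predict taken, plus a direct-mapped target buffer.
class BranchPredictor {
public:
    explicit BranchPredictor(std::size_t entries = 4096)
        : counters_(entries, 1), targets_(entries, 0) {}  // start weakly not-taken

    BP_Info GetPrediction(std::uint64_t pc) const {
        std::size_t i = Index(pc);
        return BP_Info{counters_[i] >= 2, targets_[i]};
    }

    void Update(std::uint64_t pc, bool taken, std::uint64_t target) {
        std::size_t i = Index(pc);
        if (taken && counters_[i] < 3) ++counters_[i];
        if (!taken && counters_[i] > 0) --counters_[i];
        if (taken) targets_[i] = target;  // only taken branches train the target buffer
    }

private:
    std::size_t Index(std::uint64_t pc) const { return pc % counters_.size(); }
    std::vector<std::uint8_t> counters_;
    std::vector<std::uint64_t> targets_;
};
```

A branch that is always taken mispredicts once while its counter warms up. A single not-taken outcome afterwards does not flip the prediction, which is the purpose of the 2-bit hysteresis.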

Branch Predictor Performance - GCC
(Figure: Bimodal and McFarling predictors on GCC.)
• Branch prediction accuracies range from 0-100%
• Branches are hard to predict in some phases
  – Can simulate these regions alone by fast-forwarding to them in real time

Cache Simulators
(Figure: Pin drives the Cache Pin Tool. Its instrumentation routines and analysis routines pass memory address info through API calls to the Cache Model.)
Cache Pin Tool:
• Instruments all instructions that reference memory
• Uses API to set up callbacks to analysis routines
Cache Model:
• Detailed cache simulator

Cache Implementation
    // ANALYSIS: per-thread cache hierarchy
    // (CACHE_t is a simulator-defined handle to a cache object)
    CACHE_t CacheHierarchy[MAX_NUM_THREADS][MAX_NUM_LEVELS];

    VOID LookupHierarchy(int tid, int level, ADDRINT addr, int accessType)
    {
        // On a miss, continue to the next level until the last-level cache
        if (CacheHierarchy[tid][level]->Lookup(addr, accessType) == CACHE_MISS &&
            level != LAST_LEVEL_CACHE)
            LookupHierarchy(tid, level + 1, addr, accessType);
    }

    VOID MemRef(int tid, ADDRINT addrStart, int size, int type)
    {
        // Touch every line overlapped by [addrStart, addrStart + size);
        // LINE_SIZE must be a power of two
        for (ADDRINT addr = addrStart & ~(ADDRINT)(LINE_SIZE - 1);
             addr < addrStart + size; addr += LINE_SIZE)
            LookupHierarchy(tid, FIRST_LEVEL_CACHE, addr, type);
    }

    // INSTRUMENT
    VOID Instruction(INS ins, VOID *v)
    {
        if (INS_IsMemoryRead(ins))
            INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)MemRef,
                           IARG_THREAD_ID, IARG_MEMORYREAD_EA, IARG_MEMORYREAD_SIZE,
                           IARG_UINT32, ACCESS_TYPE_LOAD, IARG_END);
        if (INS_IsMemoryWrite(ins))
            INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)MemRef,
                           IARG_THREAD_ID, IARG_MEMORYWRITE_EA, IARG_MEMORYWRITE_SIZE,
                           IARG_UINT32, ACCESS_TYPE_STORE, IARG_END);
    }

    // MAIN
    int main(int argc, char *argv[])
    {
        PIN_Init(argc, argv);
        INS_AddInstrumentFunction(Instruction, 0);
        PIN_StartProgram();
        return 0;
    }
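The hierarchy walk and the line-splitting loop can be exercised without Pin. The sketch below assumes a single thread, a 64-byte line, and direct-mapped caches, which serve as a hypothetical stand-in for `CACHE_t`. It keeps the slide's structure, with `MemRef` calling `LookupHierarchy` level by level.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint64_t LINE_SIZE = 64;  // assumed line size (power of two)

// Hypothetical direct-mapped cache with hit/miss counters.
struct DirectMappedCache {
    explicit DirectMappedCache(std::size_t lines) : tags(lines, ~0ULL) {}
    bool Lookup(std::uint64_t addr) {  // true on hit; fills the line on a miss
        std::uint64_t line = addr / LINE_SIZE;
        std::uint64_t& slot = tags[line % tags.size()];
        if (slot == line) { ++hits; return true; }
        slot = line;
        ++misses;
        return false;
    }
    std::vector<std::uint64_t> tags;
    std::uint64_t hits = 0, misses = 0;
};

// levels[0] is L1; a miss falls through to the next level.
void LookupHierarchy(std::vector<DirectMappedCache>& levels, std::size_t level,
                     std::uint64_t addr) {
    if (!levels[level].Lookup(addr) && level + 1 < levels.size())
        LookupHierarchy(levels, level + 1, addr);
}

// Touch every cache line overlapped by [addrStart, addrStart + size).
void MemRef(std::vector<DirectMappedCache>& levels, std::uint64_t addrStart,
            std::uint32_t size) {
    assert(size > 0);
    for (std::uint64_t addr = addrStart & ~(LINE_SIZE - 1); addr < addrStart + size;
         addr += LINE_SIZE)
        LookupHierarchy(levels, 0, addr);
}
```

An 8-byte access at address 60 straddles two 64-byte lines, so it misses twice in both levels. A later access at address 0 then hits in L1 and never reaches L2.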

Simple Timing Model
Assume a 1-stage pipeline:
• T_i cycles for instruction execution
Assume a branch misprediction penalty:
• T_b cycles penalty for each branch misprediction
Assume cache access & miss penalties:
• T_l cycles for a demand reference to cache level l
• T_m cycles for a demand reference to memory

$$\text{Total cycles} = a\,T_i + b\,T_b + \sum_{l=1}^{LLC} A_l\,T_l + h\,T_m$$

where a = instruction count; b = # branch mispredicts; A_l = # accesses to cache level l; h = # last-level cache (LLC) misses.
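Because the model is a weighted sum, it can be evaluated directly. The helper below is a sketch whose parameter names mirror the formula. Any latencies passed to it are user-chosen inputs, not values from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Total cycles = a*Ti + b*Tb + sum over levels of A_l*T_l + h*Tm
std::uint64_t TotalCycles(std::uint64_t a, std::uint64_t Ti,    // instructions, cycles each
                          std::uint64_t b, std::uint64_t Tb,    // mispredicts, penalty each
                          const std::vector<std::uint64_t>& A,  // accesses per cache level
                          const std::vector<std::uint64_t>& T,  // latency per cache level
                          std::uint64_t h, std::uint64_t Tm) {  // LLC misses, memory latency
    assert(A.size() == T.size());
    std::uint64_t cycles = a * Ti + b * Tb + h * Tm;
    for (std::size_t l = 0; l < A.size(); ++l) cycles += A[l] * T[l];
    return cycles;
}
```

For example, take a = 1000, T_i = 1, b = 10, T_b = 20, two cache levels with (A_l, T_l) = (400, 3) and (50, 12), h = 5, and T_m = 200. The total is 1000 + 200 + 1200 + 600 + 1000 = 4000 cycles.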

Performance - GCC
(Figure: IPC and miss rates for L1 (2-way 32KB), L2 (4-way 256KB), and L3 (8-way 2MB) on GCC, shown cumulatively and per 10-million-instruction phase.)
• Several phases of execution
  – Important to pick the correct phase of execution

Performance – AMMP
(Figure: IPC and miss rates for L1 (2-way 32KB), L2 (4-way 256KB), and L3 (8-way 2MB) on AMMP, shown cumulatively and per 10-million-instruction phase; an init phase is followed by a repetitive phase.)
• One loop (3 billion instructions) is representative
  – High miss rate at the beginning; exploits locality at the end

Moving from 32-bit to 64-bit Applications
How can we identify the reasons for these performance results? Profiling with Pin! (Ye06, IISWC 2006)
Benchmark  | Language | 64-bit vs. 32-bit speedup
perlbench  | C        | 3.42%
bzip2      | C        | 15.77%
gcc        | C        | -18.09%
mcf        | C        | -26.35%
gobmk      | C        | 4.97%
hmmer      | C        | 34.34%
sjeng      | C        | 14.21%
libquantum | C        | 35.38%
h264ref    | C        | 35.35%
omnetpp    | C++      | -7.83%
astar      | C++      | 8.46%
xalancbmk  | C++      | -13.65%
Average    |          | 7.16%
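Assuming the Average row is the plain arithmetic mean of the twelve speedups, the figure can be reproduced as follows. The mean comes to about 7.165, which is consistent with the 7.16% shown.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Arithmetic mean of the per-benchmark 64-bit vs. 32-bit speedups (in %).
double MeanSpeedupPercent() {
    const std::vector<double> speedups = {3.42, 15.77, -18.09, -26.35, 4.97, 34.34,
                                          14.21, 35.38, 35.35, -7.83, 8.46, -13.65};
    double sum = 0.0;
    for (double s : speedups) sum += s;
    return sum / speedups.size();
}
```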

Main Observations
In 64-bit mode:
• Code size increases (10%)
• Dynamic instruction count decreases
• Code density increases
• L1 icache request rate increases
• L1 dcache request rate decreases significantly
• Data cache miss rate increases

Instrumentation-Based Simulation
• Simple compared to detailed models
• Can easily run complex applications
• Provides insight into workload behavior over entire runs in a reasonable amount of time
Illustrated the use of Pin for:
• Program analysis
  – Bug detection, thread analysis
• Computer architecture
  – Branch predictors, cache simulators, timing models, architecture width
• Architecture changes
  – Moving from 32-bit to 64-bit


Pin-based Projects in Academia Kim Hazelwood Vijay Janapa Reddi


Detecting Zero-Day Attacks
Problem
• Freshly authored malicious code can go undetected by even the most up-to-date virus scanners
Approach
• Use Pin to develop information flow tracking systems targeting zero-day attacks
Who
• David Kaeli @ Northeastern University
• Basis for a new start-up company


Dytan: A Taint Analysis Framework
• Problem
Dynamic taint analysis is defined in an ad hoc manner, which limits extensibility, experimentation, and adaptability
• Approach
Define and develop a general framework that is customizable and performs data- and control-flow tainting
• Who
J. Clause, W. Li, A. Orso @ Georgia Institute of Technology
Int'l Symposium on Software Testing and Analysis '07


Security Characterization
Problem
• SPAM costs us money and time
• Anti-virus software is a resource hog
Approach
• Use Pin to characterize SPAM and anti-virus workloads
Who
• David Kaeli @ Northeastern University
• Resulted in joint projects with VMware and Network Engines


Workload Characterization
• Problem
Extracting important trends from programs with large data sets is challenging
• Approach
Collect hardware-independent characteristics across program execution and analyze them with statistical data analysis and machine learning techniques to find trends
• Who
K. Hoste and L. Eeckhout @ Ghent University


Loop-Centric Profiling
• Problem
Identifying parallelism is difficult
• Approach
Provide a hierarchical view of how much time is spent in loops, and in the loops nested within them, using (1) instrumentation and (2) lightweight sampling to automatically identify opportunities for parallelism
• Who
T. Moseley, D. Connors, D. Grunwald, R. Peri @ University of Colorado, Boulder and Intel Corporation
Int'l Conference on Computing Frontiers (CF) '07


Supporting Field Failure Debugging
• Problem
Assuring software quality in-house is challenging, which results in field failures that are difficult to replicate and resolve
• Approach
Improve in-house debugging of field failures by (1) recording and replaying executions and (2) generating minimized executions for faster debugging
• Who
J. Clause and A. Orso @ Georgia Institute of Technology
ACM SIGSOFT Int'l Conference on Software Engineering '07


Pin-Based Fault Tolerance Analysis
Problem
• Simulate the occurrence of transient faults and analyze their impact on applications
• Construct a run-time system capable of providing a software-centric fault tolerance service
Approach
• Easy to model errors, the generation of faults, and their impact
• Relatively fast (5-10 minutes per fault injection)
• Provides full program analysis
Who
• Dan Connors, Alex Shye, Joe Blomstedt, Harshad Sane, Alpesh Vaghasia, Tipp Moseley @ University of Colorado


Exploratory Extensions Kim Hazelwood Vijay Janapa Reddi


Common Use of Pin
[Figure: Pin passes instruction information to a Pin tool, which acts as a trace-driven framework]
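In the trace-driven arrangement, Pin only emits information and a separate consumer models the hardware. The C++ sketch below shows such a consumer under stated assumptions: it parses records in the `ip: W address size` layout produced by the memory-write tracer in this section and replays the addresses through a tiny direct-mapped cache. The record layout, cache geometry, and all class and function names are illustrative, not part of Pin.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Tiny direct-mapped cache model driven by addresses from a trace.
class DirectMappedCache {
public:
    DirectMappedCache(std::size_t numLines, std::uint64_t lineBytes)
        : tags_(numLines, kInvalid), lineBytes_(lineBytes) {
        assert(numLines > 0 && lineBytes > 0);
    }

    // Returns true on a hit; on a miss, installs the block (evicting any old one).
    bool access(std::uint64_t addr) {
        std::uint64_t block = addr / lineBytes_;
        std::size_t index = static_cast<std::size_t>(block % tags_.size());
        if (tags_[index] == block) { ++hits; return true; }
        tags_[index] = block;
        ++misses;
        return false;
    }

    std::uint64_t hits = 0, misses = 0;

private:
    static constexpr std::uint64_t kInvalid = UINT64_MAX;
    std::vector<std::uint64_t> tags_;
    std::uint64_t lineBytes_;
};

// Parses one "ip: W address size" record; returns false for malformed lines.
bool parseWrite(const std::string& line, std::uint64_t& addr, unsigned& size) {
    std::istringstream in(line);
    std::string ip, kind, addrText;
    if (!(in >> ip >> kind >> addrText >> size) || kind != "W") return false;
    try {
        addr = std::stoull(addrText, nullptr, 16);  // accepts an optional 0x prefix
    } catch (const std::exception&) {
        return false;
    }
    return true;
}

// Trace-driven loop: replay every write record through the cache model.
void replay(std::istream& trace, DirectMappedCache& cache) {
    std::string line;
    std::uint64_t addr = 0;
    unsigned size = 0;
    while (std::getline(trace, line))
        if (parseWrite(line, addr, size)) cache.access(addr);
}
```

Because the trace is decoupled from execution, the same recorded run can be replayed against many cache configurations; the cost is that the model cannot influence the program, which is what the execution-driven arrangement adds.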


Driving Execution using Pin
[Figure: Pin passes instruction information to a Pin tool acting as an execution-driven framework, and the tool sends program control back to Pin]


Session Objectives
• Building and running Pin tools
• Understanding program execution using Pin
  – Program, instruction stream, memory, machine state
• Putting it all together: transactional memory


Structure of a Pin Tool (this Pin tool traces virtual addresses)

    #include <stdio.h>
    #include "pin.H"

    FILE * trace;

    // Analysis routine: runs before every memory write
    VOID RecordMemWrite(VOID * ip, VOID * va, UINT32 size) {
        fprintf(trace, "%p: W %p %u\n", ip, va, size);
    }

    // Instrumentation routine: runs once per instruction, when it is first translated
    VOID Instruction(INS ins, VOID *v) {
        if (INS_IsMemoryWrite(ins)) {
            INS_InsertCall(ins, IPOINT_BEFORE, AFUNPTR(RecordMemWrite),
                           IARG_INST_PTR, IARG_MEMORYWRITE_EA,
                           IARG_MEMORYWRITE_SIZE, IARG_END);
        }
    }

    // Close the trace when the application exits (PIN_StartProgram never returns)
    VOID Fini(INT32 code, VOID *v) {
        fclose(trace);
    }

    // Callback registration
    int main(int argc, char *argv[]) {
        PIN_Init(argc, argv);
        trace = fopen("atrace.out", "w");
        INS_AddInstrumentFunction(Instruction, 0);
        PIN_AddFiniFunction(Fini, 0);
        PIN_StartProgram();
        return 0;
    }


Machine: Architectural State Interposition
• Observe instruction operands and their values
  – IARG_BRANCH_TAKEN, IARG_REG_VALUE, IARG_CONTEXT, …
• Modify register values
• Save and restore state
• Instruction emulation


Machine: Modify Architectural State
• Alter register values via instrumentation
  – IARG_REFERENCE <register>
  – PIN_REGISTER *
Use case: deterministic execution of RDTSC-dependent code

    /* ======= Instrumentation routine ======= */
    if (INS_IsRDTSC(ins)) {
        INS_InsertCall(ins, IPOINT_AFTER, (AFUNPTR) DeterministicRDTSC,
                       IARG_REFERENCE, REG_EDX,
                       IARG_REFERENCE, REG_EAX, IARG_END);
    }

    /* ======== Analysis routine ======== */
    VOID DeterministicRDTSC(ADDRINT *pEDX, ADDRINT *pEAX) {
        static UINT64 _edx_eax = 0;
        _edx_eax += 1;
        *pEDX = (_edx_eax >> 32) & 0xffffffffULL;  /* high 32 bits */
        *pEAX = _edx_eax & 0xffffffffULL;          /* low 32 bits */
    }
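The analysis routine's job is to split one 64-bit counter into the EDX:EAX register pair, the way RDTSC reports the time-stamp counter (high half in EDX, low half in EAX). The standalone C++ sketch below isolates that arithmetic so it can be checked without Pin; the struct and function names are illustrative and are not part of the Pin API.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-in for the EDX:EAX register pair written by RDTSC.
struct EdxEax {
    std::uint32_t edx;  // bits 63..32 of the counter
    std::uint32_t eax;  // bits 31..0 of the counter
};

// Split a 64-bit value the way RDTSC does: shift first, then truncate.
EdxEax splitCounter(std::uint64_t value) {
    return EdxEax{static_cast<std::uint32_t>(value >> 32),
                  static_cast<std::uint32_t>(value & 0xffffffffULL)};
}

// Deterministic replacement counter: yields 1, 2, 3, ... on successive calls,
// so RDTSC-dependent code sees the same values on every run.
EdxEax nextDeterministicTsc() {
    static std::uint64_t counter = 0;
    return splitCounter(++counter);
}

// Reassemble the pair, as code consuming RDTSC typically does.
std::uint64_t joinCounter(EdxEax r) {
    return (static_cast<std::uint64_t>(r.edx) << 32) | r.eax;
}
```

The order of operations matters: masking the value with 0xffff0000 and then shifting right by 32 would always leave EDX at 0.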


Machine: Save and Resume Execution
• Capture snapshots of the machine state to resume at a later point
  – IARG_CHECKPOINT
  – PIN_SaveCheckpoint(CHECKPOINT *, CHECKPOINT *)
  – PIN_Resume(CHECKPOINT *)
[Figure: the Pin stream diverges from the original stream; PIN_SaveCheckpoint records a state and PIN_Resume later returns execution to it]


Machine: Save and Resume Execution (2)
• IARG_CHECKPOINT
  – Pin generates a snapshot (includes instrumented state)
• PIN_SaveCheckpoint(CHECKPOINT *src, CHECKPOINT *dst)
  – Extract and copy state from the handle (src) to a local buffer (dst)

    /* ===== Instrumentation routine ===== */
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR) Save, IARG_CHECKPOINT, IARG_END);

    /* ======= Analysis routine ======= */
    CHECKPOINT ckpt;

    VOID Save(CHECKPOINT* _ckpt) {
        PIN_SaveCheckpoint(_ckpt, &ckpt);
    }


Machine: Save and Resume Execution (3)
• PIN_Resume(CHECKPOINT *)
  – Restore processor state to the saved checkpoint
  – Continue execution

    /* ====== Instrumentation routine ====== */
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR) Back, IARG_END);

    /* ======= Analysis routine ======= */
    CHECKPOINT ckpt;

    VOID Back() {
        PIN_Resume(&ckpt);
        assert(false); /* PIN_Resume does not return! */
    }
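The save/resume control flow has a close analogue in standard C and C++: `setjmp` records the calling context, and `longjmp` transfers control back to it and, like PIN_Resume, never returns to its caller. The sketch below swaps in that mechanism purely to illustrate the pattern; it is not how Pin implements checkpoints. `longjmp` restores the saved execution context but not ordinary memory, which is why the counter is declared `volatile` and keeps its updated value across each jump.

```cpp
#include <cassert>
#include <csetjmp>

// setjmp/longjmp stand in for PIN_SaveCheckpoint/PIN_Resume in this sketch.
static std::jmp_buf checkpoint;

// Runs a "region" and rolls back to the checkpoint until the region has been
// retried `retries` times; returns how many times the region started.
int runWithRollback(int retries) {
    volatile int attempts = 0;        // volatile: value must survive longjmp
    setjmp(checkpoint);               // save: returns 0 now, nonzero after longjmp
    attempts = attempts + 1;          // work performed after the checkpoint
    if (attempts <= retries) {
        std::longjmp(checkpoint, 1);  // resume at setjmp; does not return
    }
    return attempts;
}
```

For example, `runWithRollback(3)` starts the region four times: once initially and once after each of three rollbacks.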


Machine: Instruction Emulation
• Emulate the semantics of (new) instructions
  (1) Locate the emulated instruction
  (2) Marshal its semantics
  (3) Substitute an emulation function
  (4) Delete the emulated instruction

    …
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR) Emu,
                   IARG_LIST, arglist,  /* Pass enough information to emulate the ins semantics */
                   IARG_END);
    INS_Delete(ins);                    /* Kill the instruction */
    …


Machine: Emulating a Load Instruction

    #include "pin.H"
    #include "pin_isa.H"

    /* Emulate load type: op0 <- *op1 */
    ADDRINT DoLoad(REG reg, ADDRINT * addr) {
        return *addr;
    }

    VOID EmulateLoad(INS ins, VOID* v) {
        if (INS_Opcode(ins) == XEDICLASS_MOV && INS_IsMemoryRead(ins) &&
            INS_OperandIsReg(ins, 0) && INS_OperandIsMemory(ins, 1)) {
            INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR) DoLoad,
                           IARG_UINT32, REG(INS_OperandReg(ins, 0)),
                           IARG_MEMORYREAD_EA,
                           IARG_RETURN_REGS, INS_OperandReg(ins, 0),
                           IARG_END);
            INS_Delete(ins);
        }
    }

    int main(int argc, char * argv[]) {
        PIN_Init(argc, argv);
        INS_AddInstrumentFunction(EmulateLoad, 0);
        PIN_StartProgram();
        return 0;
    }


Memory Behavior
• Memory access tracing
  – IARG_MEMORYREAD_EA, IARG_MEMORYWRITE_EA, …
• Modify program memory
  – The Pin tool resides in the process's address space ⇒ change memory directly (*addr = 0x123)
[Figure: one address space containing the application, Pin (API, compiler, code cache), and the Pin tool, above the operating system and hardware]


Controlling Program Execution
[Figure: Pin in JIT mode vs. Pin in probe mode; in each, the application, Pin, and the Pin tool share one address space above the operating system and hardware]
• Pin (JIT): only translated code cached in the code cache is executed
  – Pros: complete coverage
  – Cons: slow
• Pin (Probes): original code and translated code are executed intermixed with one another
  – Pros: fast
  – Cons: limited coverage


Program: Executing @ Arbitrary Locations
• JIT mode (execute only translated code)
  – IARG_CONTEXT
  – PIN_ExecuteAt(CONTEXT *)
[Figure: PIN_ExecuteAt transfers the Pin stream to the state held in a context]


Program: Executing @ Arbitrary Locations (2)
• IARG_CONTEXT
  – Pin generates the program's perception of the machine state
• PIN_ExecuteAt(CONTEXT *)
  – Continue executing at the context state
[Figure: the original stream runs Foo; the instrumented stream jumps from Foo to Bar]

    /* ===== Instrumentation routine ===== */
    if (INS_Address(ins) == 0x40000000 /* Foo: */)
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR) Jmp2Bar,
                       IARG_CONTEXT, IARG_END);

    /* ======= Analysis routine ======= */
    VOID Jmp2Bar(CONTEXT *ctxt) {
        PIN_SetContextReg(ctxt, REG_INST_PTR, Bar);  /* Bar: address of the target code */
        PIN_ExecuteAt(ctxt);
        assert(false); /* PIN_ExecuteAt does not return! */
    }


Program: Changing Program Code
• RTN_ReplaceProbed(RTN, AFUNPTR) (probe mode)
  – Redirect control flow to new functions in the Pin tool
• RTN_ReplaceSignatureProbed(RTN, AFUNPTR, …)
  (1) Redirect control flow
  (2) Rewrite function prototypes
  (3) Use Pin arguments (IARGs)
[Figure: in the original stream, calls to foo() (original) are redirected to foo' (replacement)]


Program: Replacing malloc() in an Application

    typedef VOID * (*FUNCPTR_MALLOC)(size_t);

    VOID * MyMalloc(FUNCPTR_MALLOC orgMalloc, size_t size, ADDRINT returnIp) {
        /* LookupMallocPool: tool-defined helper returning a pool allocator, or NULL */
        FUNCPTR_MALLOC poolMalloc = LookupMallocPool(returnIp, size);
        return (poolMalloc) ? poolMalloc(size) : orgMalloc(size);
    }

    VOID ImageLoad(IMG img, VOID *v) {
        RTN mallocRTN = RTN_FindByName(img, "malloc");
        if (RTN_Valid(mallocRTN)) {
            PROTO prototype = PROTO_Allocate(PIN_PARG(void *), CALLINGSTD_CDECL,
                                             "malloc", PIN_PARG(size_t), PIN_PARG_END());
            RTN_ReplaceSignatureProbed(mallocRTN, (AFUNPTR) MyMalloc,
                IARG_PROTOTYPE, prototype,        /* Function prototype */
                IARG_ORIG_FUNCPTR,                /* Handle to application's malloc */
                IARG_FUNCARG_ENTRYPOINT_VALUE, 0, /* First argument to malloc */
                IARG_RETURN_IP,                   /* IP of caller */
                IARG_END);
            PROTO_Free(prototype);
        }
    }
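The core of the replacement wrapper, receiving the application's original allocator as a function pointer and deciding per call site whether to divert the request, can be exercised outside Pin. In the C++ sketch below, the pool table keyed by return IP, the hypothetical `PoolMalloc` and `OrigMalloc` functions, and all addresses are illustrative assumptions; only the dispatch logic mirrors MyMalloc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <unordered_map>

using MallocFn = void* (*)(std::size_t);

// Hypothetical stand-in for the application's own malloc.
void* OrigMalloc(std::size_t size) { return std::malloc(size); }

// Hypothetical pool allocator: counts its calls and forwards to std::malloc.
static int poolCalls = 0;
void* PoolMalloc(std::size_t size) {
    ++poolCalls;
    return std::malloc(size);
}

// Hypothetical table mapping call-site return IPs to pool allocators.
static std::unordered_map<std::uintptr_t, MallocFn> poolTable;

MallocFn LookupMallocPool(std::uintptr_t returnIp, std::size_t /*size*/) {
    auto it = poolTable.find(returnIp);
    return it == poolTable.end() ? nullptr : it->second;
}

// Same shape as MyMalloc: original allocator, request size, caller's IP.
void* MyMalloc(MallocFn orgMalloc, std::size_t size, std::uintptr_t returnIp) {
    MallocFn poolMalloc = LookupMallocPool(returnIp, size);
    return poolMalloc ? poolMalloc(size) : orgMalloc(size);
}
```

Keying on the caller's return IP lets a tool route allocations from specific call sites to specialized pools while every other call falls through to the original allocator.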


Program: Source-level Probing
• Instrument only specific regions of the source

    #include <stdio.h>
    #include "pinapp.h"

    int a[10];

    int main() {
        void * th = PIN_NewThread();
        printf("Thread handle %p\n", th);
        PIN_ExecuteInstrumented(th);
        for (int i = 0; i < 10; i++) {
            a[i] = i;
        }
        PIN_ExecuteUninstrumented();
        return 0;
    }


Putting It All Together: Transactional Memory Model
[Flowchart: Begin Transaction → Access Memory → Conflict? (No: Log; Yes: Abort) → Finish Transaction]
– Checkpoint architectural and memory state
– Log memory values modified by the transaction
– Verify conflicts across parallel transactions
– Commit or abort the active transaction


Transactional Memory Model (1): Begin Transaction

    /* === Instrumentation routine === */
    if (RTN_Address(rtn) == XBEGIN) {
        RTN_InsertCall(rtn, IPOINT_BEFORE, AFUNPTR(BeginTransaction),
                       IARG_THREAD_ID, IARG_CHECKPOINT, IARG_END);
    }

    /* ====== Analysis routine ====== */
    CHECKPOINT chkpt[NTHREADS];

    void BeginTransaction(int tid, CHECKPOINT *_chkpt) {
        PIN_SaveCheckpoint(_chkpt, &chkpt[tid]);
    }


Transactional Memory Model (2): Access Memory

    /* ===== Instrumentation routine ===== */
    void Instruction(INS ins, void *v) {
        if (INS_IsMemoryWrite(ins))
            INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR) LogAndCheck,
                           IARG_BOOL, true, IARG_THREAD_ID,
                           IARG_MEMORYWRITE_EA, IARG_MEMORYWRITE_SIZE, IARG_END);
        if (INS_IsMemoryRead(ins))
            INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR) LogAndCheck,
                           IARG_BOOL, false, IARG_THREAD_ID,
                           IARG_MEMORYREAD_EA, IARG_MEMORYREAD_SIZE, IARG_END);
    }


Transactional Memory Model (3): Conflict?

    /* ==== Analysis routine ==== */
    void LogAndCheck(BOOL iswrite, ADDRINT tid, ADDRINT addr, ADDRINT len) {
        if ( /* in transaction */ ) {
            if ( /* is conflict */ ) {
                /* restore mem with log[tid] */
                PIN_Resume(&chkpt[tid]);
            } else {
                /* record access in log[tid] */
            }
        }
    }


Transactional Memory Model (4): Finish Transaction

    /* === Instrumentation routine === */
    if (RTN_Address(rtn) == XEND) {
        RTN_InsertCall(rtn, IPOINT_BEFORE, AFUNPTR(CommitTransaction),
                       IARG_THREAD_ID, IARG_END);
    }

    /* ====== Analysis routine ====== */
    void CommitTransaction(ADDRINT th) {
        /*
         * free thread's checkpoint
         * and memory access log
         */
    }
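The transaction log and the conflict test in LogAndCheck are left as placeholders. One common way to fill them in, shown here only as a hedged sketch and not as the tutorial's tool, is to keep a per-thread read set and write set and report a conflict when a thread reads a location another active transaction has written, or writes a location another active transaction has read or written. The class name, the 8-byte word granularity, and the single-threaded driver in the test are assumptions; a real Pin tool would also need locking around the shared logs and an undo log to restore memory before PIN_Resume.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical per-thread transaction logs for a LogAndCheck-style routine.
// Tracks accessed locations at 8-byte word granularity.
class TxConflictTracker {
public:
    explicit TxConflictTracker(std::size_t nthreads) : tx_(nthreads) {}

    // BeginTransaction: start an empty read/write log for this thread.
    void begin(std::size_t tid) {
        tx_[tid] = Tx();
        tx_[tid].active = true;
    }

    // CommitTransaction: free the thread's access log.
    void commit(std::size_t tid) { tx_[tid] = Tx(); }

    // Returns true if the access conflicts with another active transaction
    // (the caller would then abort); otherwise records it in tid's log.
    bool logAndCheck(bool isWrite, std::size_t tid, std::uintptr_t addr, std::size_t len) {
        assert(len > 0);
        if (!tx_[tid].active) return false;  // not in a transaction
        const std::uintptr_t first = addr / kWord;
        const std::uintptr_t last = (addr + len - 1) / kWord;
        for (std::size_t other = 0; other < tx_.size(); ++other) {
            if (other == tid || !tx_[other].active) continue;
            for (std::uintptr_t w = first; w <= last; ++w) {
                bool otherWrote = tx_[other].writes.count(w) != 0;
                bool otherRead = tx_[other].reads.count(w) != 0;
                // Read of a word another transaction wrote, or write of a word
                // another transaction read or wrote.
                if (otherWrote || (isWrite && otherRead)) return true;
            }
        }
        std::set<std::uintptr_t>& log = isWrite ? tx_[tid].writes : tx_[tid].reads;
        for (std::uintptr_t w = first; w <= last; ++w) log.insert(w);
        return false;
    }

private:
    static constexpr std::uintptr_t kWord = 8;
    struct Tx {
        bool active = false;
        std::set<std::uintptr_t> reads, writes;
    };
    std::vector<Tx> tx_;
};
```

Tracking whole words rather than bytes keeps the logs small at the cost of occasional false conflicts between neighboring variables that share a word.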


Demo of Transactional Memory
[Figure: a multi-threaded application runs under the transactional memory Pin tool; threads T1 and T2 each pass through Begin Transaction, Access Memory, Conflict? (No: Log; Yes: Abort T2), and Finish Transaction]

    /* T1 */
    XBEGIN();
    for (uint32_t i = 0; i < MAX; i++) {
        myarray[i] = 1;
    }
    XEND();

    /* T2 */
    XBEGIN();
    for (int32_t i = MAX-1; i >= 0; i--) {
        myarray[i] = 2;
    }
    XEND();


Pin (user-level) vs. PinOS (system-level)
[Figure: Pin runs beneath individual applications on top of the operating system; PinOS runs beneath the operating system and all of its applications, directly on the hardware]
Pin the OS!
PinOS: A Programmable Framework for Whole-System Dynamic Instrumentation. Prashanth P. Bungale, C. K. Luk. Proceedings of Virtual Execution Environments (VEE 2007)


Trace Physical and Virtual Addresses (PinOS requires minimal API changes)

    FILE * trace;

    VOID RecordMemWrite(VOID * ip, VOID * va, VOID * pa, UINT32 size) {
        Host_fprintf(trace, "%p: W %p %p %u\n", ip, va, pa, size);
    }

    VOID Instruction(INS ins, VOID *v) {
        if (INS_IsMemoryWrite(ins)) {
            INS_InsertCall(ins, IPOINT_BEFORE, AFUNPTR(RecordMemWrite),
                           IARG_INST_PTR, IARG_MEMORYWRITE_VA,
                           IARG_MEMORYWRITE_PA, IARG_MEMORYWRITE_SIZE, IARG_END);
        }
    }

    int main(int argc, char *argv[]) {
        PIN_Init(argc, argv);
        trace = Host_fopen("atrace.out", "w");
        INS_AddInstrumentFunction(Instruction, 0);
        PIN_StartProgram();
        return 0;
    }


Concluding Remarks
• Dynamic instrumentation framework (free!)
  – Transparent across platforms and environments
    • Platforms: IA-32, Intel 64, Itanium, and XScale
    • Operating systems: Linux, Windows, Mac OS
• Sample tools (use as templates)
  – Cache simulators, branch predictors, memory checkers, instruction and memory tracing, profiling, sampling, …
• Write your own tools!
http://rogue.colorado.edu/pin