Using Sampled and Incomplete Profiles David Kaeli Department
- Slides: 23
Using Sampled and Incomplete Profiles David Kaeli Department of Electrical and Computer Engineering Northeastern University Boston, MA kaeli@ece. neu. edu
Overview • Trace-based simulation is expensive (caches are getting larger, CPUs and networks are getting faster) • Approximate results are many times sufficient • How can we utilize a reduced or sampled profile and still obtain accurate modeling/simulation results?
How does sampling affect our Metrics for Evaluating Trace Collection Methodologies • Speed – sampled profiles reduce speed requirements • Memory – sampled profiles may take less space • Accuracy – sampled profiles are less accurate • Intrusiveness – sampling is less intrusive • Completeness – no change • Granularity – may affect our ability to capture fast events (nyquist) • Flexibility – no change • Portability – clock speed may affect sampling accuracy • Capacity – sampling should reduce this • Cost – potentially less cost and less time
Application Areas: • Memory system performance – cache simulation, working set models, temporal and spatial locality • CPU pipeline modeling – instruction frequencies, instruction sequences (2 -at-atime, 3 -at-a time, pipeline snapshots) • Network simulation – input traffic distribution, queue lengths, throughput, burstiness
Memory Systems • Temporal locality – addresses will be referenced close in time • Spatial locality – addresses close by will be referenced next • Working set models – Belady (1966) – Virtual memory page replacement algorithms (Optimal replacement defined) – Denning (1980) – The pattern of page access over the execution of the program – Thiebaut and Stone (1986) – The number of misses incurred due to task switches can be modeled as a binomial distribution
Memory Systems • Cold start – How do we model behavior when a program starts execution for the first time? • Warm start – How do we model behavior when a program resumes execution? • How does memory organization (e. g. , set associativity)and typical program behavior affect our ability to utilize sampled profiles effectively? • Do we sample in time or can we also sample in space (e. g. , a set)?
Memory Systems • If we only capture a subset of the important addresses, can we reproduce the full trace? – Abstract Execution (Larus 1990) – basic block traces – Trace Reduction (Smith 1977, Puzak 1985) – generate a reduced trace that contains the exact same number of misses and writebacks as the original trace (a type of filtering is performed, similar to Agarwal’s Block Filter) – Trace compaction (Samples 1989) – perform a diff on sequential addresses and only capture important diffs
Can we do something simple? Laha 1988 sample n ignore sampling interval Fu 1994 sample n+1 ignore sample size sampling ratio = sample size/sample interval Two types of errors 1. sampling errors – is the ratio optimal? 2. accurately predicting effects of ignored portions But this only suggests when to sample, not what to sample…
Sampling Rate: What is the Nyquist Frequency for an Program? • We must sample a program at a rate of twice the frequency of the event of interest • Half the sampling frequency is termed the Nyquist frequency • Sampling at lower rates than twice the Nyquist frequency can cause aliasing and distortion • The biggest problem is that not all events of interest in a program exhibit a nice periodic pattern
Example: gprof( ) • Produces 3 things – A listing of the total execution times and call counts for each of the functions in the program, sorted by decreasing time – The functions sorted according to the time they represent, including the time of their call graph descendents – Total execution in a cycle and the members in that cycle (a cycle is a back edge) • gprof samples a program’s execution – – Obtains exact call statistics Does not obtain exact time measurements Accuracy is obtained through statistical sampling Sampling reduces the associated overhead
When to sample: Cold start vs. Warm start
Sampling dimensions • We can sample in both time and space – Time (periodic sampling) – Filtered sampling (e. g. , using addresses ranges) • Time – Periodic – Random – #misses, #instructions, #loads/stores • Space – Address ranges – may not be representative of all ranges – Cache sets – may limit the utility of the sample – Statically tagged events – focuses in on particular instructions and data of interest
How do we account for unknown references? • Assume that this behavior does not affect past/future behavior • Assume that some percentage of past behavior is overwritten by unknown reference behavior – Decay model – Footprints in the cache – MRU model • Assume that all of the past behavior is overwritten by the unknown reference behavior
How do we model the effects of multiprogramming? ? • Flush all tables (caches, branch predictors, TLBs, load buffers, etc. ) • Estimate interference using a model – Invalidate some % of all entries based on the relative time since last execution – Utilize working set models and utilize these to estimate the effect of the interference • Allow aliasing to occur where appropriate (branch predictors, but not caches)
Sampled Instruction Execution Profiles • Instruction frequencies – For SPEC 92 int programs on a IA 32 CPU • 43% ALU, 22% loads, 12% stores, 23% control flow (H&P AQA) • Instruction sequences – Top pairs – Top triples – Continue up until average BB size • Branches in the pipeline – Sampled versus modeled
Analytical Models of Workload Behavior Squillante and Kaeli, 1997 • We an capture distributions of the distance between events • We can then compute the probability of different events occurring in a pipeline of length n (think of n as a window of execution) • We can also compute the conditional probabilities of multiple events occurring in the pipeline of length n • We can then assign weights to each of these multiple events to compute throughput (IPC) of a pipeline • Our model uses random marked point process (time between events) and produces very accurate estimates of pipeline throughput
Analytical Models of Workload Behavior Squillante and Kaeli, 1997 • Some constraints are assumed in the sake of simplicity: – Inter-branch times are independent – Inter-branch times are identically distributed – Delay due to taken branches is constant
Analytical Models of Workload Behavior Squillante and Kaeli, 1997
Analytical Models of Workload Behavior Squillante and Kaeli, 1997 • A analytical formula is used to compute approximate CPI and speedup measures for an n-stage pipelined processor • Traces are captured from benchmark execution • The distribution of the number of instructions between successive taken branches is computed using a “window of execution” filter
Analytical Models of Workload Behavior Squillante and Kaeli, 1997
Analytical Models of Workload Behavior Squillante and Kaeli, 1997 • For a pipeline length of 8, the obtained results are quite precise for three out of the five benchmarks • For bubblesort and prime, the IID assumption is violated, thus introducing some inaccuracies in our model • Future work looks at handling inaccuracies that provide multi-level conditional probabilities into our model
Capturing n-length Instruction Sequences • Sampling over n sequential instructions • Capturing the most frequently executed sequences • Utilizing these sequences to drive pipeline design • Capturing longer profiles may allow us to predict design hardware trace caches • Gonzalez, Tubella and Molina describe a mechanism for both profiled instructions and operand values
- Voice translation rules
- Explain how a sound can be ‘sampled’
- A one hectare pond is sampled in early september
- How aliasing does corrupts the sampled image
- Sampled lru
- Efficient simplification of point-sampled surfaces
- Revised profiles of the gifted and talented
- H.264 profiles and levels
- Welding discontinuities and defects
- Hazard profile examples
- To kill a mockingbird jean louise
- Walker royce software project management
- Alma publishing profiles
- Onf profiles
- Netsh wlan show profiles
- Specific flashing profiles
- Npv profiles
- Npv profiles
- Modified irr
- Npv profiles
- Chapter 9 capital budgeting answer key
- Npv profiles
- Steps of capital budgeting
- Asee profiles