SimpleScalar v3.0 Tutorial
U. of Wisconsin, CS 752, Fall 2004
Andrey Litvin (main source: Austin & Burger; also Dana Vantrease's slides)
Simulator Basics
• What is an architectural simulator?
  – Tool that reproduces the behavior of a computing device
• Why use a simulator?
  – Flexible
    • Rapid design space exploration
    • Tailor abstraction level to need
  – Cheap
• Why not use a simulator?
  – Slow
  – Correctness?
Functional vs. Performance
• Functional simulators implement the architecture
  – Perform the actual execution
  – Implement what programmers see
• Performance (or timing) simulators implement the microarchitecture
  – Model system resources/internals
  – Measure time
  – Implement what programmers do not see
Trace vs. Execution Driven
• Trace
  + Easy to implement
  - Requires large disk files to store instruction stream
  - Limited state stored
  - Speculation? Multiprocessor?
• Execution
  - Hard to implement
  + Allows access to full state information at any point during program execution
  +/- Requires inclusion of an instruction set emulator and an I/O emulation module
SimpleScalar
• Developed by Todd Austin with Doug Burger at UW, '94-'96
• Execution-driven
• Collection of simulators that emulate the microprocessor at different detail levels (functional, functional + cache/bpred, out-of-order cycle-timer, etc.)
• Tools:
  – C/Fortran compiler, assembler, linker (for PISA)
  – DLite: target-machine-level debugger
  – Pipeline trace viewer
Advantages of SimpleScalar
• Fast (given priority over clarity)
  – 4 MIPS functional, 300 KIPS OOO on 1.5 GHz host
• Relatively(!) simple: short learning curve
• Modular design
• Well documented and commented
• Popular: support community, extensions
Limitations will be summarized at the end
Sim-Fast
• Bare functional simulator
• Does not account for the behavior of any part of the microarchitecture
• Optimized for speed
Sim-Safe
• Similar to sim-fast (slower)
• Implements some memory op safeguards
  – Memory alignment
  – Memory access permission
• Good for debugging sim-fast crashes
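
The two safeguards can be pictured as a check made before every load or store. The sketch below is only an illustration of that idea, not SimpleScalar's code: `check_access`, `MemoryFault`, and the `(start, end, writable)` segment layout are all assumptions made for the example.

```python
# Hypothetical sketch of sim-safe style memory checks; not SimpleScalar code.
class MemoryFault(Exception):
    """Raised when an access is misaligned or not permitted."""


def check_access(addr, size, segments, write):
    """Validate one memory access.

    segments is a list of (start, end, writable) tuples with end exclusive;
    this layout is an assumption made for the example.
    """
    if size not in (1, 2, 4, 8):
        raise MemoryFault(f"unsupported access size {size}")
    # Alignment: an N-byte access must start on an N-byte boundary.
    if addr % size != 0:
        raise MemoryFault(f"misaligned {size}-byte access at {addr:#x}")
    # Permission: the whole access must fall inside one mapped segment.
    for start, end, writable in segments:
        if start <= addr and addr + size <= end:
            if write and not writable:
                raise MemoryFault(f"write to read-only address {addr:#x}")
            return
    raise MemoryFault(f"access outside mapped segments at {addr:#x}")
```

A functional simulator without these checks would silently perform the bad access, which is why the slower checked variant is useful when the fast one crashes.
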
Sim-EIO
• External trace/checkpoint generator
• Functional simulator like sim-fast and sim-safe
• Implements checks like sim-safe
Sim-Profile
• Profiles by symbol and address
• Keeps track of and reports (many options)
  – Dynamic instruction counts
  – Instruction class counts
  – Usage of address modes
  – Profiles of the text & data segment
Sim-Cache/Sim-Bpred
• Functional core drives the detailed model of the cache/branch predictor
• Similar to trace-driven cache/bpred simulation
• Fast results for miss/misprediction rates
• No timing simulation/performance impact
Cache Implementation
• Block size, # of sets, associativity: all customizable
• Replacement policies: random, FIFO, LRU
• 2-level cache hierarchy supported (easily extended)
• Unified/separate I/D L1 and L2 caches
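
To make the three geometry parameters concrete, here is a minimal sketch of a set-associative cache with LRU replacement that only counts hits and misses, in the spirit of a functional cache model. The class and its method names are hypothetical and not taken from SimpleScalar's sources; only LRU is shown.

```python
from collections import OrderedDict


class Cache:
    """Toy set-associative cache with LRU replacement (illustrative only)."""

    def __init__(self, nsets, bsize, assoc):
        self.nsets, self.bsize, self.assoc = nsets, bsize, assoc
        # One ordered dict per set: oldest (least recently used) tag first.
        self.sets = [OrderedDict() for _ in range(nsets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Return True on a hit, False on a miss; update LRU state."""
        block = addr // self.bsize      # strip the block offset
        index = block % self.nsets      # which set the block maps to
        tag = block // self.nsets       # identifies the block within the set
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)      # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) >= self.assoc:
            lines.popitem(last=False)   # evict the least recently used tag
        lines[tag] = True
        return False

    def miss_rate(self):
        total = self.hits + self.misses
        return self.misses / total if total else 0.0
```

Swapping `popitem(last=False)` for a random choice, or skipping `move_to_end` on hits, would give the random and FIFO policies respectively.
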
Implemented Predictors
• Specifying branch predictor type: -bpred <type>
• Implemented predictors
  – nottaken: always predicts not taken
  – perfect: always right
  – bimod: BTB with 2-bit counters
  – 2lev: 2-level adaptive branch predictor
    • Specify level 1 size, level 2 size, history size (# bits), XOR PC and history?
    • GAg, GAp, PAg, PAp, gshare
  – combining (meta) predictor
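
The difference between a table of 2-bit counters indexed by PC and a gshare-style predictor (global history XORed with the PC) can be sketched as follows. This is a hedged illustration rather than SimpleScalar's bpred code: the class names, the initial counter value, the table sizes, and dropping the low two PC bits are all assumptions.

```python
class CounterTable:
    """Table of 2-bit saturating counters (0..3); >= 2 predicts taken."""

    def __init__(self, entries):
        assert entries > 0 and entries & (entries - 1) == 0, "power of two"
        self.mask = entries - 1
        self.counters = [1] * entries  # start weakly not-taken (assumption)

    def predict(self, index):
        return self.counters[index & self.mask] >= 2

    def update(self, index, taken):
        i = index & self.mask
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


class Bimodal:
    """Counters indexed by PC only."""

    def __init__(self, entries=2048):
        self.table = CounterTable(entries)

    def _index(self, pc):
        return pc >> 2  # assumes the low two PC bits carry no information

    def predict(self, pc):
        return self.table.predict(self._index(pc))

    def update(self, pc, taken):
        self.table.update(self._index(pc), taken)


class Gshare:
    """Counters indexed by PC XOR a global branch-history register."""

    def __init__(self, entries=1024, hist_bits=8):
        self.table = CounterTable(entries)
        self.hist_mask = (1 << hist_bits) - 1
        self.history = 0

    def _index(self, pc):
        return (pc >> 2) ^ self.history

    def predict(self, pc):
        return self.table.predict(self._index(pc))

    def update(self, pc, taken):
        self.table.update(self._index(pc), taken)
        # Shift the outcome into the history register.
        self.history = ((self.history << 1) | int(taken)) & self.hist_mask


def mispredictions(predictor, pc, outcomes):
    """Count wrong predictions for one branch over a list of outcomes."""
    wrong = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            wrong += 1
        predictor.update(pc, taken)
    return wrong
```

On a strictly alternating taken/not-taken branch, the history-based predictor can learn the pattern after a short warm-up, while the PC-indexed counters keep flipping.
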
Sim-Outorder
• Detailed performance simulator
• Out-of-order execution core
• Register renaming, reorder buffer, speculative execution
• 2-level cache hierarchy
• Branch prediction
Making a Fast Timing Simulator
• Hardware advantage: parallelism
• Software emulator advantage: free space for auxiliary structures
• Functional and detailed performance parts can be decoupled
Simulator RUU vs. Logical RUU
• No consumer->producer links (tags)
• Rather, producer->consumer links for efficient result broadcast at writeback
• Values not tracked (except address)
• "Completed" bit (ROB and IQ combined)
• Same struct used for LSQ
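
A minimal sketch of the producer->consumer idea: each entry keeps a list of the entries waiting on its result, so writeback walks that list instead of broadcasting a tag to every entry. The names `RUUEntry`, `link`, and `writeback` are hypothetical, and no values are tracked, in line with the slide.

```python
class RUUEntry:
    """Toy RUU entry: tracks dependences, not values (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.pending = 0        # number of source operands not yet produced
        self.consumers = []     # producer -> consumer links
        self.completed = False  # the "completed" bit


def link(producer, consumer):
    """Record that consumer waits for producer's result."""
    producer.consumers.append(consumer)
    consumer.pending += 1


def writeback(entry, ready_queue):
    """Mark entry complete and wake consumers whose operands are all ready."""
    entry.completed = True
    for consumer in entry.consumers:
        consumer.pending -= 1
        if consumer.pending == 0:
            ready_queue.append(consumer)
```

The cost of a writeback is proportional to the number of dependents rather than to the window size, which is the efficiency the slide refers to.
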
Other Simulator Structures
• ready_queue
  – Ready-to-issue instructions
  – Used at issue stage
  – Built at writeback
  – Limited issue bandwidth
  – Policy: mul/div, load/store, branch first, then oldest first
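
The issue policy can be read as a two-level sort key: priority class first, then age. The sketch below assumes a sequence number stands for age and uses a heap; the class name, the string op classes, and the `issue(width)` interface are assumptions for illustration, not the simulator's actual data structure.

```python
import heapq

# Operation classes that go ahead of everything else, per the policy above.
PRIORITY_CLASSES = {"mul", "div", "load", "store", "branch"}


class ReadyQueue:
    """Toy ready queue: priority classes first, then oldest first."""

    def __init__(self):
        self._heap = []

    def enqueue(self, seq, op_class):
        """seq is a monotonically increasing sequence number (smaller = older)."""
        rank = 0 if op_class in PRIORITY_CLASSES else 1
        heapq.heappush(self._heap, (rank, seq, op_class))

    def issue(self, width):
        """Remove and return up to `width` instructions in policy order."""
        issued = []
        while self._heap and len(issued) < width:
            _, seq, op_class = heapq.heappop(self._heap)
            issued.append((seq, op_class))
        return issued
```

The `width` argument models the limited issue bandwidth: whatever is not issued this cycle stays queued for the next one.
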
Other Simulator Structures
• fu_pool: functional units
  – Issue latency vs. operational latency
  – Both constant, hard-coded in sim-outorder
  – Read port latency more detailed (cache simulation)
  – Issue latency: busy counter at FU
  – Operational latency: event queue
• event_queue
  – Ordered by cycle
  – Schedules writeback events
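
The split between the two latencies can be sketched like this: the issue latency decides when the unit can accept another operation (a busy-until cycle), and the operational latency decides when the result's writeback event fires. The classes and the latency values used in the example are hypothetical.

```python
import heapq


class FunctionalUnit:
    """Toy functional unit with separate issue and operational latencies."""

    def __init__(self, issue_lat, op_lat):
        self.issue_lat = issue_lat  # cycles before the unit accepts a new op
        self.op_lat = op_lat        # cycles until the result is available
        self.busy_until = 0

    def try_issue(self, now):
        """Return the completion cycle, or None if the unit is still busy."""
        if now < self.busy_until:
            return None
        self.busy_until = now + self.issue_lat
        return now + self.op_lat


class EventQueue:
    """Writeback events ordered by cycle (ties keep insertion order)."""

    def __init__(self):
        self._heap = []
        self._count = 0

    def schedule(self, cycle, tag):
        heapq.heappush(self._heap, (cycle, self._count, tag))
        self._count += 1

    def pop_due(self, now):
        """Return all events whose cycle is <= now, earliest first."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due
```

A fully pipelined unit has an issue latency of 1; an unpipelined one has an issue latency close to its operational latency.
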
Other Simulator Structures
• create_vector
  – Register renaming
  – Maps logical register to RUU entry (architectural register is up to date if entry is null)
  – Updated at dispatch and writeback
  – Similar to Qi in Tomasulo
  – Backed up during dispatch if sim "divines" misspeculation
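
A rough sketch of the renaming table described above: each logical register maps either to the in-flight entry that will produce it or to `None` when the architectural value is current. The class and method names are hypothetical; `snapshot` and `restore` stand in for the backup taken when misspeculation is detected at dispatch.

```python
class CreateVector:
    """Toy rename table: logical register -> producing entry, or None."""

    def __init__(self, nregs):
        self.map = [None] * nregs

    def source(self, reg):
        """Entry a reader must wait on, or None if the register is current."""
        return self.map[reg]

    def dispatch(self, dest_reg, entry):
        """A newly dispatched instruction becomes the latest producer."""
        self.map[dest_reg] = entry

    def writeback(self, dest_reg, entry):
        """Clear the mapping only if no younger producer has replaced it."""
        if self.map[dest_reg] is entry:
            self.map[dest_reg] = None

    def snapshot(self):
        """Copy of the table, taken before speculative dispatch."""
        return list(self.map)

    def restore(self, snap):
        """Roll the table back after a misspeculation."""
        self.map = list(snap)
```

The identity check in `writeback` matters: if a younger instruction has since renamed the same register, the older producer finishing must not clear the newer mapping.
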
Stage Implementation (greater detail in SS hack guide and v2.0 tutorial)
• Fetch (at ruu_fetch())
  – Fetch instructions up to cache line boundary
  – Block on miss
  – Put in Fetch Queue (FQ)
  – Probe branch predictor for next cache line to fetch from
Stage Implementation
• Decode (at ruu_dispatch())
  – Fetch from FQ
  – Decode
  – Rename registers
  – Put into RUU and LSQ
  – Update RUU dependency lists, renaming table
  – Sim (functional core):
    • Execute instruction, update machine state
    • Detect branch misprediction, back up state (checkpoint)
Stage Implementation
• Issue (at ruu_issue() and lsq_refresh())
  – Ready queue -> Event queue
  – Order based on policy (see ready_queue slide)
  – Check and reserve FUs
  – Memory ops check memory dependences
    • No load speculation ("maybe" dependence respected)
Stage Implementation
• Writeback (at ruu_writeback())
  – Get finished instructions from event queue
  – Wake up (put in ready queue) instructions with ready operands (use dependence list)
  – Performance core detects misprediction here and rolls back the state
Stage Implementation
• Commit (at ruu_commit())
  – Service D-TLB misses
  – Update register file (logically) and rename table
  – Retire stores to D-cache
  – Reclaim RUU/LSQ entries of retirees
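
One way a software cycle loop can tie the stages together is to visit them back to front each cycle, so that an instruction advances at most one stage per simulated cycle without double-buffering the latches. The sketch below is a toy under strong assumptions (one instruction per stage, no stalls, hypothetical stage names and `run` function); it is not sim-outorder's main loop.

```python
STAGES = ["fetch", "decode", "issue", "writeback", "commit"]


def run(program, cycles):
    """Simulate a toy 5-stage pipeline; return (instruction, retire cycle) pairs."""
    latch = {stage: None for stage in STAGES}  # one instruction per stage
    pending = list(program)
    retired = []
    for cycle in range(cycles):
        # Commit first: retire whatever reached the last stage.
        if latch["commit"] is not None:
            retired.append((latch["commit"], cycle))
            latch["commit"] = None
        # Walk the remaining stages back to front so each slot is freed
        # before the stage behind it tries to move into it.
        for i in range(len(STAGES) - 1, 0, -1):
            dst, src = STAGES[i], STAGES[i - 1]
            if latch[dst] is None and latch[src] is not None:
                latch[dst], latch[src] = latch[src], None
        # Fetch last: bring in the next instruction if the slot is free.
        if latch["fetch"] is None and pending:
            latch["fetch"] = pending.pop(0)
    return retired
```

Visiting the stages front to back instead would let a single instruction ripple through every stage in one loop iteration, which is the bug this ordering avoids.
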
Limitations
• I/O and other system calls
  – Only some limited functional simulation
• Lacks support for arbitrary speculation
  – Only branches can cause rollback
  – No speculative loads
  – Harder problem: decoupling of functional and timing cores (for performance) complicates data speculation extensions
Limitations of Memory System
• No causality in memory system
  – All events are calculated at time of first access
• Simple TLB and virtual memory model
  – No address translation
  – TLB miss is a (small) fixed latency in parallel with cache access
• Bandwidth, non-blocking
  – Modeled as n FUs (read/write ports with fixed issue delay)
• Accurate if memory system is lightly utilized
  – SMT extensions?
• Overhaul required for multiprocessor simulation
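
The first two points can be illustrated with a latency function that is evaluated once, when the access is made, and never revisited: the cache path latencies add up level by level, and a TLB miss contributes a fixed latency that overlaps with the cache access. The function name and the default latency values are illustrative assumptions, not settings taken from this deck.

```python
def access_latency(l1_hit, l2_hit, tlb_hit,
                   l1_lat=1, l2_lat=6, mem_lat=18, tlb_miss_lat=30):
    """Latency of one access, computed entirely at the time it is issued.

    All latency defaults are illustrative. Nothing here depends on other
    in-flight accesses, which is the "no causality" limitation.
    """
    cache = l1_lat
    if not l1_hit:
        cache += l2_lat
        if not l2_hit:
            cache += mem_lat
    tlb = 0 if tlb_hit else tlb_miss_lat
    # The TLB miss is served in parallel with the cache access.
    return max(cache, tlb)
```

Because the result never accounts for contention from other outstanding misses, such a model is only trustworthy when the memory system is lightly loaded.
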
Extensions (www.simplescalar.com)
• Trace cache
• Value prediction
• SMT
• Multiprocessor
• More target ISA / host OS support
Miscellaneous
• Enter "sim-outorder" with no options to get options list and usage examples
• Options database (options.h)
  – Interface to register an option and check if entered at initialization
• Stats database (stats.h)
  – Register new stats
  – Define secondary stats and track distributions
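
The idea of primary and secondary stats can be sketched as a small registry in which a secondary stat is a formula over registered counters, evaluated when the report is printed. This is a hypothetical Python analogue, not the stats.h interface; the class, its methods, and the stat names are chosen for illustration.

```python
class StatsDB:
    """Toy stats registry: counters plus formulas derived from them."""

    def __init__(self):
        self._counters = {}
        self._formulas = {}
        self._desc = {}

    def reg_counter(self, name, desc):
        """Register a primary stat, starting at zero."""
        self._counters[name] = 0
        self._desc[name] = desc

    def reg_formula(self, name, desc, fn):
        """Register a secondary stat; fn takes the database and returns a number."""
        self._formulas[name] = fn
        self._desc[name] = desc

    def add(self, name, n=1):
        self._counters[name] += n

    def value(self, name):
        if name in self._counters:
            return self._counters[name]
        return self._formulas[name](self)

    def report(self):
        """One formatted line per stat, counters first."""
        names = list(self._counters) + list(self._formulas)
        return [f"{n:<16}{self.value(n):>12.4f} # {self._desc[n]}" for n in names]
```

Keeping formulas lazy means the simulator's hot loop only increments counters; ratios such as instructions per cycle are computed once, at report time.
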