- Slides: 42
Simulation Overview Multifacet Group University of Wisconsin-Madison 11/30/2020
Overview
• Technical introduction to: Simics, Ruby, Opal
• Simics: Full-System Simulator
• Ruby: Memory Timing Simulation
• Opal: Out-of-order Micro-architecture Simulator
Outline
• I. Overview
• II. Simics introduction
• III. Ruby
  – A. Simics interfaces
  – B. Software architecture
• IV. Opal
  – A. Software architecture
Simics
• Full system multi-processor simulator
• Simulated target: SPARC V9 (E15k-like)
• Nice features:
  – documentation (./simics/doc)
  – checkpoints (./simics/checkpoints)
  – disk images
• Scripting in Python
Simics Devices (sun4u)
[Diagram of simulated devices: MMU, UltraSPARC III, Memory Bus, RAM, I/O MMU Controller, System Status Registers, Real Time Clock, Serial Port, DMA Controller, IRQ Controller, Terminal, PCI Bus, Graphics Card, CDROM, Ethernet Controller, SCSI Controller, SCSI Disk … SCSI Disk, Fiber Channel Controller]
Simics Timing Model
• One instruction = one cycle
  – Modulo interrupts, traps
  – Cycle time is determined by clock frequency
• simics> print-event-queue ("peq cpu0")
  – CPU Step Queue
  – Time Queue
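The arithmetic behind "one instruction = one cycle" can be sketched as follows. This is an illustrative calculation only: the function name and parameters are hypothetical and not part of the Simics API, and it assumes that any extra delay is expressed as stall cycles added by an attached timing module.

```python
def simulated_seconds(instructions: int, stall_cycles: int, freq_mhz: float) -> float:
    """Simulated wall time for a run on one CPU.

    Assumption: every instruction costs exactly one cycle, and a timing
    module (such as Ruby) may add stall cycles on top of that baseline.
    """
    cycles = instructions + stall_cycles
    cycle_time = 1.0 / (freq_mhz * 1_000_000)  # seconds per cycle
    return cycles * cycle_time


# 75 million instructions on a 75 MHz target, with and without stalls.
print(simulated_seconds(75_000_000, 0, 75.0))
print(simulated_seconds(75_000_000, 25_000_000, 75.0))
```

Without a timing module the simulated time depends only on the instruction count and the configured clock frequency, which is why attaching Ruby or Opal is needed for meaningful timing results.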
Outline
• I. Overview
• II. Simics introduction
• III. Ruby
  – A. Simics interfaces
  – B. Software architecture
  – C. CMP overview
• IV. Opal
  – A. Software architecture
• V. Bibtex
Ruby Introduction
• Models timing for caches, memory, interconnect and directories
• Implements multiple cache coherence protocols
• Uses event-driven simulation
[Diagram: Ruby comprises Cache, Memory, Interconnect, Directory]
Ruby Timing Model
• Queues act as delay centers
  – 1 cycle between queues
• TBE: Transaction Buffer Entries
  – One per outstanding memory transaction
[Diagram: CPU, L1 Cache Controller, Network (L2 Cache, Directory); labels: tryCacheAccess, hitCallback, RequestMsg.h, ResponseMsg.h]
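The idea of queues acting as delay centers in an event-driven simulator can be sketched as below. This is not Ruby's code: `EventQueue`, `DelayQueue`, and `make_stage` are hypothetical names, and the three-stage chain (L1, network, directory) is only an assumed example path for one request.

```python
import heapq


class EventQueue:
    """Global event queue: callbacks run in order of their scheduled cycle."""

    def __init__(self):
        self.now = 0
        self._heap = []
        self._seq = 0  # tie-breaker so equal-time events keep insertion order

    def schedule(self, delay, callback):
        heapq.heappush(self._heap, (self.now + delay, self._seq, callback))
        self._seq += 1

    def run(self):
        while self._heap:
            self.now, _, callback = heapq.heappop(self._heap)
            callback()


class DelayQueue:
    """Message queue whose contents reach the consumer `latency` cycles later."""

    def __init__(self, events, consumer, latency=1):
        self.events = events
        self.consumer = consumer
        self.latency = latency

    def enqueue(self, msg):
        self.events.schedule(self.latency, lambda: self.consumer(msg))


def make_stage(name, events, trace, downstream=None):
    """Build a consumer that records arrival time and forwards the message."""
    def consume(msg):
        trace.append((events.now, name))
        if downstream is not None:
            downstream.enqueue(msg)
    return consume


events = EventQueue()
trace = []
to_directory = DelayQueue(events, make_stage("directory", events, trace))
to_network = DelayQueue(events, make_stage("network", events, trace, to_directory))
to_l1 = DelayQueue(events, make_stage("L1", events, trace, to_network))

to_l1.enqueue("GETS 0x40")
events.run()
print(trace)
```

Each hop through a queue costs one cycle in this sketch, so total latency falls out of how many queues a message crosses rather than from an explicit per-request formula.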
How to Drive Ruby
• Three ways to run Ruby:
  – Random Tester
  – Simics (only)
  – Simics + Opal
Ruby Random Tester
• Stand-alone executable
• Action-Check pairs
  – Massive false sharing
  – Action: write a set of values in a block
  – Check: validate the values are correct
• Invaluable when developing protocols
• Other testers available
  – Lock contention
  – Deterministic behavior
  – Etc…
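The action-check idea can be illustrated with a small sketch. It is not the Ruby tester: `SimpleMemory` stands in for the memory system under test, and the block size, block count, and value pattern are all assumptions chosen for the example.

```python
import random

BLOCK_SIZE = 64  # assumed block size in bytes


class SimpleMemory:
    """Stand-in for the memory system under test (trivially coherent)."""

    def __init__(self):
        self.data = {}

    def store(self, cpu, addr, value):
        self.data[addr] = value

    def load(self, cpu, addr):
        return self.data.get(addr, 0)


def run_action_check(memory, num_cpus=4, num_pairs=100, seed=0):
    """Run action/check pairs and return the number of failed checks."""
    rng = random.Random(seed)
    failures = 0
    for pair in range(num_pairs):
        block = rng.randrange(8) * BLOCK_SIZE  # few blocks, so heavy reuse
        base = pair % 251                      # per-pair value pattern
        offsets = rng.sample(range(BLOCK_SIZE), num_cpus)

        # Action: each CPU writes a different byte of the same block,
        # which produces false sharing between the CPUs.
        for cpu, offset in enumerate(offsets):
            memory.store(cpu, block + offset, (base + cpu) % 256)

        # Check: one randomly chosen CPU reads every byte back.
        checker = rng.randrange(num_cpus)
        for cpu, offset in enumerate(offsets):
            if memory.load(checker, block + offset) != (base + cpu) % 256:
                failures += 1
    return failures


print("failed checks:", run_action_check(SimpleMemory()))
```

Swapping `SimpleMemory` for a model with a protocol bug (for example, one that silently drops some stores) makes the check phase report failures, which is the property that makes this style of tester useful during protocol development.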
Ruby-Simics Interfaces
• Timing-Model interface
  1. Simics encounters a memory instruction
  2. Simics creates memory_trans structure
  3. timing_model interface is called
  4. Ruby returns stall time
  5. (opt) Ruby changes stall time
  6. Simics commits instruction
  7. "snoop_device" called to read memory_transaction
[Diagram: Simics calls Ruby through "timing_model" and "snoop_device"; Ruby returns the stall time]
Simics Memory Interfaces
• Timing-Model interface
  – Provides: memory reference structure
    • Address, Ld/St, Size, I/O
  – Ruby returns stall time
• Polling interface
  – Ruby is called every N steps
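The shape of the timing-model contract (memory reference in, stall time out) can be sketched as follows. Nothing here is the Simics API: `MemoryReference`, `ToyTimingModel`, and `operate` are hypothetical names, the direct-mapped cache is a stand-in for Ruby, and returning zero stall for I/O references is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class MemoryReference:
    """Fields the slide lists for a memory reference."""
    address: int
    is_store: bool
    size: int
    is_io: bool


class ToyTimingModel:
    """Returns a stall time per reference using a tiny direct-mapped cache."""

    def __init__(self, num_sets=256, block_size=64, miss_latency=100):
        self.num_sets = num_sets
        self.block_size = block_size
        self.miss_latency = miss_latency
        self.tags = [None] * num_sets

    def operate(self, ref: MemoryReference) -> int:
        """Return the number of cycles the CPU should stall for `ref`."""
        if ref.is_io:
            return 0  # assumption: device accesses are not timed here
        block = ref.address // self.block_size
        index = block % self.num_sets
        if self.tags[index] == block:
            return 0  # hit: no extra stall
        self.tags[index] = block  # miss: fill the line and charge the latency
        return self.miss_latency


model = ToyTimingModel()
load = MemoryReference(address=0x1000, is_store=False, size=8, is_io=False)
print(model.operate(load))  # first touch of the block
print(model.operate(load))  # same block again
```

The simulator that owns the CPU only needs the returned integer, so the memory model behind the call can be replaced without touching the functional side.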
SLICC Introduction
• SLICC: Specification Language for Implementing Cache Coherence Protocols
• Models multiple coherence protocols
Ruby SW Architecture: System
• Node: generated/Node.h
  – L1 Cache: generated/L1Cache_*
  – L2 Cache: generated/L2Cache_*
  – Directory/Memory: generated/Directory_*
• Driver: common/Driver.h
  – Tester: tester/Tester.h
  – Simics Interface: simics/SimicsDriver.h
  – Opal Interface: interface/OpalInterface.h
• Network: network/*
• Profiler: profiler/Profiler.h
Ruby SW Architecture: Node
• Sequencer.h
• Caches: system/NewCacheMemory.h
• Directory: system/DirectoryMemory.h
• SLICC
  – Directory Controller State: generated/L1Directory_Entry.h
  – Cache Line: generated/L?Cache_Entry.h
  – L1 Cache Controller, L2 Cache Controller, Directory Controller
Where's Waldo?
• Describes the FSM in cache controller
• Data Structures
  – L1_CacheEntry.h
  – L2_CacheEntry.h
  – Directory_Entry.h
  – Node.h
• Control
  – L1_Transitions.h
  – L2_Transitions.h
  – Directory_Transitions.h
[Diagram: cache entry fields: Tag, Data, Permissions (MSI)]
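A table-driven controller FSM of the kind the generated transition files describe can be sketched as below. This is a generic textbook MSI table with stable states only, not the protocol shipped with Ruby: the event names (`Load`, `Store`, `Invalidate`, `Fwd_GETS`, `Fwd_GETX`) and action strings are assumptions, and real protocols add transient states for in-flight requests.

```python
# (current state, event) -> (next state, action)
TRANSITIONS = {
    ("I", "Load"): ("S", "issue GETS"),
    ("I", "Store"): ("M", "issue GETX"),
    ("S", "Load"): ("S", "hit"),
    ("S", "Store"): ("M", "issue GETX"),
    ("S", "Invalidate"): ("I", "send ack"),
    ("M", "Load"): ("M", "hit"),
    ("M", "Store"): ("M", "hit"),
    ("M", "Fwd_GETS"): ("S", "send data"),
    ("M", "Fwd_GETX"): ("I", "send data"),
}


class CacheLineFSM:
    """Permission state of one cache line: M(odified), S(hared), I(nvalid)."""

    def __init__(self):
        self.state = "I"

    def handle(self, event):
        """Apply one event, update the state, and return the action taken."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"no transition for {event!r} in state {self.state!r}")
        self.state, action = TRANSITIONS[key]
        return action


line = CacheLineFSM()
for event in ["Load", "Store", "Fwd_GETS", "Invalidate"]:
    action = line.handle(event)
    print(f"{event:10s} -> state {line.state}, action: {action}")
```

Keeping the transitions in a table rather than in nested conditionals makes a missing (state, event) pair fail loudly, which is the behaviour wanted when a protocol is still being debugged.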
Day in the life of a Request
• simics "timing_model": simics/src/extensions/ruby.c
• ruby/simics/SimicsDriver.C
  – makeRequest()
• ruby/system/STD_Sequencer.C
  – makeRequest()
  – doRequest()
  – node->L1Cache->tryCacheAccess()
• ruby/system/CacheMemory.h
  – tryCacheAccess()
• issueRequest() … hitCallback()
Ruby Configuration
• ruby/config.include
  – All parameters defined here
• ruby/config/rubyconfig.defaults
  – Defines parameters for the ruby module
  – All parameters can be adjusted at runtime
• ruby/config/tester.defaults
  – Defines parameters for the tester
CMP Overview
• Node contains
  – Exactly one Processor and L1 I+D Cache
    • 1-16 in the system
    • Partitioned across 1-16 chips
  – 0 to N L2 Cache Banks
    • At least one per chip
  – 0 to N Directories
    • At least one per system
• Network
  – One network connects all components in the system
  – Composed of switches and point-to-point links
Outline
• I. Overview
• II. Simics introduction
• III. Ruby SW architecture
• IV. Opal SW architecture
Processor Simulator: Opal
• Models an R10000-like out-of-order processor
• SPARC V9 instruction set
• Timing-First Organization
Timing-First Simulation
• Timing Simulator
  – does functional execution of user and privileged operations
  – does speculative, out-of-order multiprocessor timing simulation
  – does NOT implement functionality of full instruction set or any devices
• Functional Simulator
  – does full-system multiprocessor simulation
  – does NOT model detailed micro-architectural timing
[Diagram: Opal (CPU: Execute, Cache, Network) on the timing side; Simics (CPU, System, RAM) on the functional side; linked by Commit and Verify]
Timing-First Operation
• As instruction retires, step CPU in functional simulator
• Verify instruction's execution
• Reload state if timing deviates from functional
  – Instructions with unidentified side-effects
  – NOT loads/store to I/O devices
[Diagram: Opal (CPU: Execute, Cache, Network) and Simics (CPU, System, RAM); linked by Commit, Verify, and Reload]
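The retire, verify, and reload loop can be sketched as follows. This is a toy model, not Opal or Simics code: the register names, the tuple instruction format, and the `special` opcode (an instruction the timing side does not implement) are all assumptions made for illustration.

```python
def functional_step(regs, inst):
    """Functional simulator: implements every instruction."""
    op, dst, a, b = inst
    if op == "add":
        regs[dst] = regs[a] + regs[b]
    elif op == "special":
        regs[dst] = 42  # side effect only the functional simulator models


def timing_execute(regs, inst):
    """Timing simulator: implements only the common instructions."""
    op, dst, a, b = inst
    if op == "add":
        regs[dst] = regs[a] + regs[b]
    # unknown opcodes are executed as no-ops and caught at retirement


def run_timing_first(program):
    functional = {"r1": 1, "r2": 2, "r3": 0}
    timing = dict(functional)
    reloads = 0
    for inst in program:
        timing_execute(timing, inst)       # timing side runs ahead
        functional_step(functional, inst)  # step functional CPU at retirement
        if timing != functional:           # verify architected state
            timing = dict(functional)      # reload from the functional side
            reloads += 1
    return timing, functional, reloads


program = [
    ("add", "r3", "r1", "r2"),
    ("special", "r1", None, None),
    ("add", "r3", "r1", "r2"),
]
timing, functional, reloads = run_timing_first(program)
print("state matches:", timing == functional, "| reloads:", reloads)
```

The timing side stays correct after the unimplemented instruction because the reload copies the functional state, so later instructions read the right operand values.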
Benefits of Timing-First
• Supports speculative multi-processor timing models
• Leverages existing simulators
• Software development advantages
  – Increases flexibility and reduces code complexity
  – Immediate, precise check on timing simulator
Conclusions
• Simics
  – Functional simulator
  – Attach timing modules to control execution time
• Ruby
  – Uses generated and non-generated code to simulate the memory system
  – Extended to simulate CMPs
• Opal
  – Timing-first out-of-order processor model
  – Drives execution
Backup Slides: More Opal Details
Top Level Interfaces
• Core Simulator
  – Opal commands: hfa.c
  – System: system.c
  – sequencer: pseq.c
  – API abstraction: pstate.c
• Simics API
  – Emulate simics API: simdist12.c
  – Simics module: extensions/opal.c
• Testers
  – Stand alone: tester/simmain.c
  – Unified ruby/opal: tester/simmain.c
  – Stand alone decoder: tester/usd.c
  – Other testers: tester/*
Pipeline Overview
[Diagram: pipeline stages Fetch, Decode, Schedule, Execute, Complete, Retire; other labels: Branch Predictors, squash, Input Wait, LSQ Wait, Cache Miss]
System
• System: system/System.C
• Sequencer: pseq.C
• Statistics: sysstat.C
• Simics Interfaces: pstate.C
• Thread Statistics: memstat.C
• Ruby API: mf_api.h
• Thread Statistics: threadstat.C
Sequencer
• Instruction Window
• Register Files
• Caches / LSQ / MSHR (or ruby cache intf)
• Branch Predictors
• Simics / Checking Routines
• Micro-architectural checkpointing
• Instruction / Memory / Branch Tracing
Static Instruction
• One static instruction per physical address
• Can be cached in instruction pages
• Fields of interest:
  – opcode, type, source / dest registers
[Diagram: Instruction Mapping: ipagemap.c; Instruction Pages: ipage.c]
Dynamic Instructions
• One dynamic instruction per in-flight instruction
  – data: renamed registers, events
  – functional execution
  – predicted & actual program counter
[Diagram: Memory operation (memop.c): Atomic, Prefetch, Load, Store; Control operation: controlop.c; Dynamic instance: dynamic.c; Static instruction: statici.c; Instruction Set Specific: implementation dx.c, opcodes decode.c]
Instruction Window
• All in-flight instructions are tracked
• Markers delimit pipeline progress
• Implemented using rotating buffer
[Diagram: window slots |D|D|F|F|O|O|O|O|C|C|E|D|D| with marker pointers last_fetched, last_decoded, last_scheduled, last_retired]
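A rotating buffer with progress markers can be sketched as below. This is not Opal's implementation: the class and method names are hypothetical, and the markers are kept as monotonically increasing counts (slot index = count modulo size) purely to keep the wrap-around logic short.

```python
class InstructionWindow:
    """Fixed-size rotating buffer; markers record how far each stage has got."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size
        # Invariant: retired <= scheduled <= decoded <= fetched
        self.fetched = 0
        self.decoded = 0
        self.scheduled = 0
        self.retired = 0

    def in_flight(self):
        return self.fetched - self.retired

    def fetch(self, inst):
        """Insert a new instruction; returns False when the window is full."""
        if self.in_flight() == self.size:
            return False
        self.slots[self.fetched % self.size] = inst
        self.fetched += 1
        return True

    def decode(self):
        """Advance the decode marker; returns None if nothing is fetched yet."""
        if self.decoded == self.fetched:
            return None
        inst = self.slots[self.decoded % self.size]
        self.decoded += 1
        return inst

    def schedule(self):
        if self.scheduled == self.decoded:
            return None
        inst = self.slots[self.scheduled % self.size]
        self.scheduled += 1
        return inst

    def retire(self):
        """Retire the oldest scheduled instruction and free its slot."""
        if self.retired == self.scheduled:
            return None
        index = self.retired % self.size
        inst, self.slots[index] = self.slots[index], None
        self.retired += 1
        return inst


window = InstructionWindow(size=4)
for name in ["i0", "i1", "i2", "i3"]:
    window.fetch(name)
window.decode()
window.schedule()
print(window.retire(), window.in_flight())
```

Because each marker can only advance up to the marker ahead of it, the region between any two markers is exactly the set of instructions waiting in that stage.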
Abstract Register File (arf)
• Instructions treat registers uniformly
• Register files: Control RF, Condition Code RF, Double Precision RF, Single Precision RF, Global Integer RF, Integer (Windowed) RF
• Interfaces: allocate, free, rename, isReady, wait, check
• Register Abstraction
  – Abstract Register File: arf.c
  – Register abstraction: regbox.c
  – Logical registers: regmap.c
  – Physical registers: regfile.c
Statistics
• pseq statistics
• observer functions
  – observe instruction
  – observe static instruction
  – observe thread switch
  – observe transaction complete
Branch Predictor Overview
• Simics
  – Opal commands: hfa.c
  – System: system.c
  – sequencer: pseq.c
• Branch Trace
  – bts: branch trace start
  – btt: branch trace take
  – btf: branch trace finish
• Stand alone: tester/bp.c
BP Classes
• Direct Branch Predictor: common/directbp.c
  – Global shared predictor: gshare.c
  – Agree predictor: agree.c
  – Yet another global predictor: yags.c
  – Infinite (per-PC) gshare: igshare.c
• Users
  – Sequencer: pseq.c
  – Dynamic instances: dynamic.c (nextPC)
  – Implementation: dx.c
  – Control instructions: controlop.c (setTarget)
• Pipeline interaction
  – Fetch: Predict, update
  – Execute: May rollback, FixupState()
  – Retire
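The gshare scheme named on this slide can be sketched as below. This follows the commonly described algorithm (program counter XORed with global history to index a table of 2-bit saturating counters) and is not taken from gshare.c: the table size, the initial counter value, and the `pc >> 2` alignment shift are assumptions.

```python
class GShare:
    """Global-history branch predictor with 2-bit saturating counters."""

    def __init__(self, index_bits=12):
        self.mask = (1 << index_bits) - 1
        self.counters = [1] * (1 << index_bits)  # 1 = weakly not-taken
        self.history = 0                         # recent branch outcomes

    def _index(self, pc):
        # Drop the low alignment bits, then mix in the global history.
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc):
        """Return True if the branch at `pc` is predicted taken."""
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        """Train on the resolved outcome, then shift it into the history."""
        index = self._index(pc)
        if taken:
            self.counters[index] = min(3, self.counters[index] + 1)
        else:
            self.counters[index] = max(0, self.counters[index] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask


predictor = GShare(index_bits=4)
before = predictor.predict(0x1000)
for _ in range(20):
    predictor.update(0x1000, taken=True)
print(before, predictor.predict(0x1000))
```

This sketch updates history only with resolved outcomes. A speculative pipeline would update history at fetch and keep a copy to restore on a misprediction, which is the kind of rollback the slide mentions.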
Configuration Files
• Files define all micro-architectural parameters
  – imported as global ALL CAPS variables
  – "name: value" pairs
  – found in opal/config
• Must load file before running opal!
  – load-module opal
  – opal0.conf filename
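Reading "name: value" pairs into named parameters can be sketched as follows. This is not Opal's loader: the parameter names in the sample are made up, and details such as skipping blank lines and converting numeric values to integers are assumptions about a format the slide describes only briefly.

```python
def parse_config(text):
    """Parse 'NAME: value' lines into a dict of parameters."""
    params = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        name, sep, value = line.partition(":")
        if not sep or not name.strip():
            raise ValueError(f"line {lineno}: expected 'name: value', got {raw!r}")
        value = value.strip()
        try:
            params[name.strip()] = int(value)
        except ValueError:
            params[name.strip()] = value  # keep non-numeric values as strings
    return params


# Hypothetical parameter names, for illustration only.
sample = """
FETCH_WIDTH: 4
WINDOW_SIZE: 64
BRANCH_PREDICTOR: gshare
"""
config = parse_config(sample)
print(config)
```

Rejecting malformed lines up front gives a clear error at load time instead of a silently missing parameter later in the run.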
Adding global variables
• config.include
• config.defaults
Template for stand-alone opal
  read-conf ../checkpoints/oltp-warm-2p.check
  cpu0.print-time
  @import mfacet
  @from mfacet import *
  @magic_enable_cmd()
  @mfacet.setup_run_for_n_transactions(100000)
  module-list-refresh
  @SIM_get_attribute(SIM_get_object("sim"), "cpu_switch_time")
  @SIM_set_attribute(SIM_get_object("sim"), "cpu_switch_time", 1)
  @SIM_get_attribute(SIM_get_object("sim"), "cpu_switch_time")
  load-module opal
  load-module ruby
  opal0.init
  opal0.start /scratch.local/warm-2p.log
  opal0.s 10000000
  opal0.stats
  opal0.stop
  ruby0.dump-stats /scratch.local/warm-2p-ruby.log
Makefile Defines
• PIPELINE_VIS: pipeline visualization output
• MODINIT_VERBOSE: startup debugging
• VERIFY_SIMICS: once per new version of simics
• REDECODE_EACH: disables static instruction caching
• USE_MINI_TLB: increases performance
• Most defines should be variables! Not compile time options.