MARSS MicroArchitecture System Simulator for x86 CAPS

- Slides: 20
MARSS – Micro-Architecture & System Simulator for x86
CAPS Group @ SUNY Binghamton
Presenter: Avadh Patel
http://marss86.org
2 Future Computing Systems
• Server/Desktop Space:
  – Integrated systems: many modules in one chip
  – IO-bound software applications
  – Hardware-software co-design
• Mobile Space:
  – SoCs will integrate more modules
  – Operating systems will become more complex
• Researchers and designers will require more powerful tools for innovative designs
MARSSx86 (http://marss86.org)
3 Motivation for New Simulator
PTLsimX – Xen-based Simulator
[Diagram: Customized Xen Hypervisor; PTLsim ROM Image]
Simulation
• CPU Pipeline Model
• Cache/Interconnect Models
• Guest Memory Management
• Guest Page Table Management
• Memory Allocation
Support Code
• Host Memory Management
• Custom libraries for STDIO etc.
• Runs in ring-0, which makes the system unstable and insecure
• Requires customized kernel and hypervisor setup
• Difficult to communicate
4 QEMU + PTLsim
• Fully user-space application
• Every module of the guest is emulated, which allows full control
• Easy to set up and scale on a cluster of machines
• Allows use of existing precompiled libraries
• MARSS switches between emulator and simulator to reuse components
[Diagram: QEMU hosting a CPU Model (Emulator and PTLsim Simulator), Memory Management Unit, Timer and Interrupt Unit, Page Table Unit, Emulated IO Devices, Virtual Disk Module, User Interface]
5 Simulation Framework
[Diagram: Software stack (user-space applications/benchmarks, shared libraries, services, operating system) running on simulated hardware: CPU model (emulator and simulator), cores C0 to C3 with MLCs, interconnect, shared cache / DRAM, MMIO, DRAM, and IO devices (disk, USB, PCI etc.)]
• Device models are shared between CPU models
6 Key Features
• True full-system simulation
• Hardware-software co-design
• Co-simulation methodology
• Various core performance models, such as Out-of-Order, Atom, and Multi-threaded
• Easy configuration mechanism to simulate heterogeneous configurations
7 Heterogeneous Core Modeling
• Highly configurable core and memory models
[Diagram: four example configurations, each connecting cores through an interconnect to a shared cache / DRAM: Multicore Configuration, MT-Multicore Configuration, Hybrid Core Configuration 1, Hybrid Core Configuration 2]
Legend: Oo: Out-of-Order core; MT: Multi-Threaded core; InC: In-Order core; PC: Private Cache
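The idea of mixing core types in one machine can be illustrated with a small data-structure sketch. This is not the MARSS configuration format: the `Core` class and the `build_machine` and `summarize` helpers are hypothetical names invented for illustration, and only the abbreviations (Oo, MT, InC, PC) come from the slide legend.

```python
from collections import Counter
from dataclasses import dataclass

# Core-type abbreviations taken from the slide legend.
CORE_TYPES = {"Oo": "out-of-order", "MT": "multi-threaded", "InC": "in-order"}


@dataclass(frozen=True)
class Core:
    """Hypothetical description of one core in a machine."""
    kind: str                    # one of the keys of CORE_TYPES
    private_cache: bool = True   # "PC" in the slide legend


def build_machine(kinds):
    """Validate a list of core-type abbreviations and return Core objects."""
    unknown = [k for k in kinds if k not in CORE_TYPES]
    if unknown:
        raise ValueError(f"unknown core type(s): {unknown}")
    return [Core(kind=k) for k in kinds]


def summarize(machine):
    """Count how many cores of each type the machine contains."""
    return dict(Counter(core.kind for core in machine))


# An illustrative hybrid mix (not a reproduction of the slide's diagrams).
hybrid = build_machine(["Oo", "InC", "MT", "InC"])
print(summarize(hybrid))
```

A homogeneous multicore is the special case where every entry in the list is the same abbreviation.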
8 Simulator Performance
[Charts: SPECint 2006 benchmarks running with test input (full application run); SPLASH-2 benchmarks (full application runs) on 2-core, 4-core, and 8-core configurations. Y-axis: instruction commits per second, in thousands]
• Native system configuration: quad-core Intel Xeon X5550 @ 2.67 GHz (Nehalem) with 8 GB RAM
• Error bars show maximum and minimum speed in KIPS
9 Case Studies
• On-Chip Network FIFO
  – Dedicated hardware cache for incoming network packets
• Statistics of Regions of Interest
  – Detailed statistics of the ‘compress’ and ‘decompress’ functions
  – Collected in a single simulation run
[Chart labels: IPC, on-chip FIFO, Network Cache]
10 Q&A
• Grab a copy to hack from: http://marss86.org
• Open source under the GPL-2 license
• Growing community of users/developers: 107 registered users
• Send your comments/questions to apatel@cs.binghamton.edu
11 Backup
12 MARSSx86 – An Overview
[Diagram: Simulated software stack (user-space applications, shared libraries, services, operating system) on top of the Marss framework: emulation and simulation CPU models, memory management, DRAM, IO-Mem, IO devices, disk, user interface]
13 MARSSx86 – Key Features
• True Full System
  – Not only kernel-space simulation but also real-time IO simulation
  – Supports running unmodified OSes
  – Simulates real multi-threaded workloads
  – Real-time IO simulation allows new IO devices to be modeled and simulated quickly
14 MARSSx86 – Key Features
• Hardware-Software Co-Design
  – Simulates the full software stack
  – Software can communicate with the simulated hardware
  – No special requirements for building software for MARSS
• Co-Simulation
  – Emulation and simulation models in one framework
  – Seamless switch between the two models
  – Fast-forward to interesting regions of benchmarks
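The co-simulation bullets above can be sketched as a toy model in which both modes advance the same architectural state but only simulation mode accumulates cycles. This is a minimal illustration, not MARSS code: the `CoSim` class, its method names, and the fixed cycles-per-instruction value are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class CoSim:
    """Toy co-simulation loop: one shared instruction counter, two modes."""
    mode: str = "emulate"    # "emulate" (fast, untimed) or "simulate" (timed)
    instructions: int = 0    # architectural progress, shared by both modes
    cycles: int = 0          # advanced only in simulate mode

    def switch(self, mode):
        """Change execution mode without touching architectural state."""
        if mode not in ("emulate", "simulate"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode

    def run(self, n, cpi=2):
        """Execute n instructions; cpi is an assumed fixed cycles-per-instruction."""
        self.instructions += n
        if self.mode == "simulate":
            self.cycles += n * cpi


sim = CoSim()
sim.run(1_000_000)       # fast-forward through start-up in emulation mode
sim.switch("simulate")
sim.run(50_000)          # region of interest under the timing model
print(sim.instructions, sim.cycles)
```

Because the instruction counter is shared, the timed region starts exactly where the fast-forwarded region ended, which is the property the "seamless switch" bullet describes.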
15 MARSSx86 – Key Features
• Heterogeneous Core Modeling
• Performance models for
  – Aggressive out-of-order design with RISC substrate
  – In-order cores like Atom
  – Multi-threaded core design for both out-of-order and in-order models
• Simulate a mix-and-match of different types of core models
16 More Features
• Easily integrate external modules
  – DRAMSim2
  – SystemC
  – Phase-Change RAM (PCRAM)
• New statistics framework
  – Enables separate collection of statistics from different regions
  – Collects separate statistics for kernel and user space
  – Collects separate statistics for user-specified regions of interest (ROI)
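The statistics bullets above can be sketched as counters keyed by both region and privilege mode. This is a hypothetical illustration rather than the MARSS statistics API: `RegionStats` and its methods are invented names, and the counter values are made up. The region names borrow the ‘compress’ and ‘decompress’ functions mentioned in the case study.

```python
from collections import defaultdict


class RegionStats:
    """Accumulate counters separately per (region, privilege mode)."""

    def __init__(self):
        # (region, mode) -> counter name -> value
        self.counters = defaultdict(lambda: defaultdict(int))
        self.region = "default"

    def enter(self, region):
        """Mark the start of a new region of interest."""
        self.region = region

    def record(self, mode, name, value=1):
        """Add value to a counter; mode is "kernel" or "user"."""
        self.counters[(self.region, mode)][name] += value

    def get(self, region, mode, name):
        """Read a counter; counters that were never recorded read as 0."""
        return self.counters[(region, mode)][name]


stats = RegionStats()
stats.enter("compress")
stats.record("user", "commits", 900)
stats.record("kernel", "commits", 100)
stats.enter("decompress")
stats.record("user", "commits", 400)
print(stats.get("compress", "user", "commits"))
```

Keeping one table for all regions is what lets a single run report every region separately.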
17 Simulator Performance
• One of the most important requirements
• Marss runs cycle-accurate simulation in the range of 200 to 400 KIPS
• Fast simulation allows users to test a wide range of benchmark behavior in one simulation run
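To put the 200 to 400 KIPS range in perspective, the wall-clock time for a run follows directly from the instruction count. The sketch below is plain arithmetic; the one-billion-instruction workload is an assumed example size, not a figure from the slides.

```python
def simulation_seconds(instructions, kips):
    """Wall-clock seconds to commit `instructions` at `kips` thousand instructions/s."""
    if kips <= 0:
        raise ValueError("kips must be positive")
    return instructions / (kips * 1000)


one_billion = 1_000_000_000                    # assumed workload size
slow = simulation_seconds(one_billion, 200)    # lower end of the quoted range
fast = simulation_seconds(one_billion, 400)    # upper end of the quoted range
print(slow / 60, fast / 60)                    # minutes at each end of the range
```

At these rates a billion simulated instructions take roughly 42 to 83 minutes, which is why fast-forwarding in emulation mode to the region of interest matters.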
18 Technical Details – Performance Model
• Based on PTLsim (older x86 simulator from CAPS Group)
  – Components used from PTLsim: decoder, core components of the out-of-order datapath, SuperSTL and logic libraries
• Fast models for coherent cache and memory system
• Several added optimizations for performance, correctness and flexibility
19 A Case Study – Benchmark Regions
[Charts: IPC per 100 million cycles against cycles in billions, for astar and bzip2]
20 Technical Details – Benchmark Regions
[Charts: IPC per 100 million cycles against cycles in billions, for mcf; IPC per 10 million cycles against cycles in millions, for gcc]
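The "IPC per 100 million cycles" series plotted in these charts can be derived from periodic samples of a cumulative commit counter. The sketch below assumes one sample per window boundary; the function name and the sample values are hypothetical and are not data from the slides.

```python
def ipc_per_window(cumulative_commits, window_cycles):
    """IPC for each window, given cumulative commit counts sampled every window_cycles."""
    if window_cycles <= 0:
        raise ValueError("window_cycles must be positive")
    # Instructions committed inside each window are the differences between samples.
    deltas = [b - a for a, b in zip(cumulative_commits, cumulative_commits[1:])]
    return [d / window_cycles for d in deltas]


window = 100_000_000                                    # 100 million cycles per window
samples = [0, 50_000_000, 170_000_000, 250_000_000]     # made-up counter samples
series = ipc_per_window(samples, window)
print(series)
```

A smaller window, such as the 10 million cycles used for gcc, gives a finer-grained view of phase changes at the cost of more samples.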