Hardware-Software Co-Simulation
4541.633A SoC Design Automation
School of EECS, Seoul National University
Introduction
• Design validation
  – Simulation (most effective)
  – Formal verification
  – Emulation
• What is HW-SW co-simulation?
  – Simulation of heterogeneous systems whose HW and SW components interact
• Why HW-SW co-simulation?
  – Verification of the system description before system synthesis
  – Verification of synthesis results
  – System performance estimation
• Issues in HW-SW co-simulation
  – Timing accuracy
  – Simulation performance
Abstraction Levels (Open Core Protocol, OCP-IP)
• Message layer: UTF (untimed functional) model, refined to TF (timed functional)
• Partition into software and hardware
  – Software: target code, with or without an OS
  – Hardware: TF (timed functional) at the transaction layer, refined to BCA (bus cycle accurate) at the transfer layer, then to RTL (register transfer level) at the RTL layer
• Co-simulation of the partitioned software and hardware
Abstraction Levels
[Figure: communication structure at each OCP-IP layer]
• Message layer: modules communicate through channels
• Transaction layer: modules communicate through a bus channel, adding timing and resource sharing
• Transfer layer: bus channel with clock and protocols
• RTL layer: modules connected by RTL signals (wires and registers) and a clock
Timed Co-Simulation
• K. Hagen and H. Meyr, “Timed and untimed hardware/software co-simulation: application and efficient implementation,” Int. Workshop on Hardware-Software Codesign, Oct. 1993.
• Untimed co-simulation
  – Hardware and software time scales are not synchronized
  – Good for verifying functional completeness
  – Gives correct results for handshaking protocols
• Timed co-simulation
  – Hardware and software time scales are synchronized at each data exchange
  – Useful for overall performance assessment
  – Can be used for both handshaking and non-handshaking protocols
Timed Co-Simulation
• How to implement a timed co-simulator
  – Nanosecond-accurate co-simulation
    • Needs an expensive processor model
    • Very slow
  – Cycle-based co-simulation
    • Needs a cycle-based processor model
    • Slow
  – Synchronization through software timing estimation
    • No processor simulation model needed
    • Fast
Timed Co-Simulation
• Software timing estimation
  – V. Zivojnovic and H. Meyr, “Compiled HW/SW cosimulation,” Proc. DAC, June 1996.
  – Flow: target binary -> simulation compiler -> C program -> C compiler -> host binary
  – The intermediate C program is modified for profiling, so the host binary also produces an estimate of target execution time
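The profiling step can be illustrated with a minimal Python sketch. It is not the output of the compiled co-simulation tool; it only assumes that each basic block of the translated program carries an estimated cycle cost (the BLOCK_CYCLES values are made up) and that the generated code bumps a counter on entry to each block. CycleCounter and sum_array are hypothetical names.

```python
from typing import Sequence

# Hypothetical per-basic-block cycle costs for the target processor.
# A real flow would derive these from the target binary; these numbers are made up.
BLOCK_CYCLES = {"entry": 2, "loop_body": 5, "exit": 3}


class CycleCounter:
    """Accumulates estimated target cycles while the code runs natively on the host."""

    def __init__(self) -> None:
        self.cycles = 0

    def block(self, name: str) -> None:
        # Call placed at the start of each basic block by the (hypothetical) annotator.
        self.cycles += BLOCK_CYCLES[name]


def sum_array(counter: CycleCounter, data: Sequence[int]) -> int:
    """Host version of a target routine, annotated for profiling."""
    counter.block("entry")
    total = 0
    for x in data:
        counter.block("loop_body")
        total += x
    counter.block("exit")
    return total


if __name__ == "__main__":
    c = CycleCounter()
    print(sum_array(c, [1, 2, 3, 4]), c.cycles)  # prints: 10 25
```

Running the annotated routine natively yields both the functional result and an estimate of target time (2 + 4*5 + 3 = 25 cycles for four elements). A timed co-simulator could attach that kind of timestamp to each HW-SW data exchange without simulating the processor.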
Synchronization
• IPC (inter-process communication)
• Non-IPC (single process)
• Lockstep
  – Synchronize at every step
• Conventional synchronization
  – Lockstep + IPC
  – Large synchronization overhead
• Optimistic
  – Simulate up to an optimistic next synchronization point
  – On inconsistency, roll back
• Pessimistic prediction
  – Simulate up to a pessimistic next synchronization point
Synchronization
• Optimistic co-simulation
  1. SW starts to run six clocks ahead optimistically.
  2. At checkpoint t_C, a snapshot of the SW simulation state is stored.
  3. HW starts to run.
  4. If an HW->SW event occurs before clock 6, SW rolls back to t_C and restarts from the stored state.
[Figure: SW and HW timelines showing the checkpoint, the optimistic SW run over clocks 1 to 6, the HW run (est. 3 to 6 clocks), and the IPC overhead between them]
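The four steps can be mimicked with a toy Python model. This is a sketch under simplifying assumptions, not the simulators used in the course: SwSim is a hypothetical stand-in whose whole state can be deep-copied as a snapshot, and the clock of the HW->SW event is passed in directly instead of being produced by a running HW simulator.

```python
import copy
from typing import Optional


class SwSim:
    """Toy SW simulator: one state update per clock, interrupts logged with their clock."""

    def __init__(self) -> None:
        self.time = 0
        self.state = {"acc": 0, "irqs": []}

    def step(self, irq: Optional[str] = None) -> None:
        if irq is not None:
            self.state["irqs"].append((self.time, irq))
        self.state["acc"] += 1
        self.time += 1

    def checkpoint(self):
        return self.time, copy.deepcopy(self.state)

    def restore(self, snap) -> None:
        self.time, saved = snap
        self.state = copy.deepcopy(saved)


def optimistic_window(sw: SwSim, window: int, hw_event_time: Optional[int]) -> int:
    """Run SW `window` clocks ahead, then check HW; return the number of rollbacks."""
    start = sw.time
    snap = sw.checkpoint()                # snapshot at checkpoint t_C
    for _ in range(window):               # SW runs ahead optimistically
        sw.step()
    # HW now simulates the same interval; an HW->SW event inside it invalidates the SW run.
    if hw_event_time is not None and start <= hw_event_time < start + window:
        sw.restore(snap)                  # roll back to t_C
        while sw.time < start + window:   # re-execute, delivering the event at its clock
            sw.step("hw_irq" if sw.time == hw_event_time else None)
        return 1
    return 0
```

With a six-clock window and an event at clock 3, the optimistic SW run is discarded once and replayed with the interrupt delivered at clock 3. With no event inside the window, the optimistic run is kept and no rollback cost is paid.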
Synchronization – Hybrid Synchronization and Message Grouping
• S. Yoo, K. Choi, and Dong S. Ha, “Performance Improvement of Geographically Distributed Cosimulation by Hierarchically Grouped Messages,” IEEE Trans. on VLSI Systems, Oct. 2000.
• Hybrid synchronization
  – Reduces the number of null messages by running some (NOT all) simulators optimistically
• Hierarchically grouped messages
  – Reduce the number of event-carrying messages by grouping them
Synchronization – Motivation
[Figure: example system with a mobile station (MS), whose SW runs on a microprocessor (call processor, QCELP vocoder) next to an HW ASIC with Tx/Rx, and a base station (BS) with a call processor and Tx/Rx HW, connected by a channel model]
• MS SW encodes a frame of voice data, sends the encoded frame to HW, and then encodes the next frame
• An HW->SW interrupt notifies SW of data arrival; SW then receives demodulated data from HW
• BS HW receives modulated data from the MS and sends modulated data to the MS
Synchronization
• Synchronization overhead from event-carrying messages and null messages between SW and HW
  – A large number of event-carrying messages in both directions
  – A large number of null messages needed to detect the occurrence of an interrupt
Synchronization
• Basic concept of hybrid synchronization
[Figure: with lock-step synchronization, SW and HW alternate step by step and pay synchronization overhead at every step; with hybrid synchronization, the optimistic SW simulator runs ahead (steps 1 to 6) from a checkpoint while the conservative HW simulator follows, so synchronization overhead is paid less often and the simulation speeds up]
Synchronization – Further Improvement by Message Grouping
• Message grouping reduces the number of event-carrying messages
• The simulator defers sending the messages of a group until all messages in the group are ready
[Figure: SW-HW message timelines without grouping (messages sent one by one) and with grouping (messages sent together as a group)]
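A flat version of this deferral can be sketched in Python. GroupingSender and its group_sizes map are hypothetical. The sketch assumes the sender knows in advance how many messages each group contains, and it does not reproduce the hierarchical grouping of the cited paper.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


class GroupingSender:
    """Defers event-carrying messages until their whole group is ready, then sends once."""

    def __init__(self, group_sizes: Dict[str, int],
                 transport: Callable[[List[Any]], None]) -> None:
        self.group_sizes = group_sizes          # group id -> number of messages in the group
        self.pending: Dict[str, List[Any]] = defaultdict(list)
        self.transport = transport              # one call = one IPC/socket message
        self.sends = 0

    def post(self, group: str, msg: Any) -> None:
        self.pending[group].append(msg)
        if len(self.pending[group]) == self.group_sizes[group]:
            # The group is complete: ship all of its messages in a single transport call.
            self.transport(self.pending.pop(group))
            self.sends += 1
```

Five messages posted to two groups cost two transport calls instead of five. The price is latency: a message waits until its group is complete.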
Synchronization – Experiment
• Simulators
  – MS SW part (optimistic or conservative): ARMulator or ARM710A @ 16 MHz
  – MS HW part + BS parts (conservative): Ptolemy
• Uniprocessor case
  – ARMulator + Ptolemy on an Ultra I workstation
• Distributed cases
  – ARMulator and Ptolemy on two Ultra I workstations
  – Ptolemy @ Ultra I <--> ARM710A @ board (PC)
[Figure: processor execution server, with Ptolemy connected to a PC-hosted FPGA prototyping board carrying the µP and memory; nMREQ and address/data bus traces]
Synchronization
• [Figure: snapshot of a Ptolemy run]
Synchronization – Experimental Results
• ISS stands for ARMulator in the results table.
• All runtimes are in seconds.
Synchronization
• Synchronization overhead reduction by pessimistic prediction
  – J. Jung, S. Yoo, and K. Choi, “Performance Improvement of Multi-Processor Systems Cosimulation based on SW Analysis,” Proc. DATE 2001, March 2001.
[Figure: conventional synchronization, where sim 1, sim 2, and sim 3 synchronize frequently, vs. the proposed scheme, where pessimistic prediction lets the simulators run further between synchronizations]
Synchronization – Pessimistic Prediction Based on SW Analysis
[Figure: an ARM assembly fragment divided into basic blocks A (0x0 to 0x8), B (0xc to 0x1c), C (0x20), D (0x24 to 0x28), and E (0x2c to 0x34), and the corresponding control-flow graph annotated with per-block cycle estimates; block C is a sync node]
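One way to compute a pessimistic next synchronization point from such a graph is a shortest-path search. The earliest time the SW could reach a sync node bounds how far the other simulators can safely advance without a synchronization message. The Python sketch below assumes each block's cycle count is a fixed estimate charged when control leaves the block. The CFG and cycle values in the example are hypothetical, not the ones in the figure, and the search is heap-based Dijkstra stopped at the first sync node popped.

```python
import heapq
from typing import Dict, List, Set


def earliest_sync(cfg: Dict[str, List[str]], cycles: Dict[str, int],
                  sync_nodes: Set[str], start: str) -> float:
    """Minimum cycles from the start of `start` until control first enters a sync node.

    Edge u -> v is weighted by cycles[u]; returns inf if no sync node is reachable.
    """
    if start in sync_nodes:
        return 0
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                       # stale heap entry
        if u in sync_nodes:
            return d                       # first sync node popped is the closest one
        for v in cfg[u]:
            nd = d + cycles[u]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")


if __name__ == "__main__":
    # Hypothetical CFG: A branches to B (a self-loop) or to D -> E; both paths end in sync node C.
    cfg = {"A": ["B", "D"], "B": ["B", "C"], "C": [], "D": ["E"], "E": ["C"]}
    cycles = {"A": 5, "B": 11, "C": 3, "D": 2, "E": 3}
    print(earliest_sync(cfg, cycles, {"C"}, "A"))  # prints: 10
```

From A, the cheapest route to C goes through D and E (5 + 2 + 3 = 10 cycles), so under these assumed costs the other simulators could run 10 cycles ahead before they must check for an event from this processor.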
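Shortest (Longest) Path Problems
• Shortest (longest) path problems
  – Assume no negative (positive) cycles; if such cycles are allowed and only simple paths are considered, the problem becomes NP-complete
  – Directed edge-weighted graph G(V, E, W) with source vertex v_0
  – Bellman's equation for the path weights s_i, where $w_{k,i} = \infty$ when there is no edge from $v_k$ to $v_i$ (use max instead of min for longest paths):
    $$s_0 = 0, \qquad s_i = \min_{k \neq i} \left( s_k + w_{k,i} \right), \quad i = 1, 2, \ldots, n$$
• Acyclic graphs
  – Topological sort: O(|V| + |E|)
  – Solve Bellman's equation in topological order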
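For the acyclic case, Bellman's equation can be solved in one pass once the vertices are in topological order. The Python sketch below uses Kahn's algorithm for the sort. The function name and the (k, i, w_ki) edge-list format are assumptions for illustration. Edge weights may be negative because the graph has no cycles.

```python
from collections import deque
from typing import List, Tuple


def dag_shortest_paths(n: int, edges: List[Tuple[int, int, float]],
                       source: int = 0) -> List[float]:
    """Shortest path weights s_i from `source` on a DAG with vertices 0..n-1."""
    INF = float("inf")
    succ = [[] for _ in range(n)]
    indeg = [0] * n
    for k, i, w in edges:
        succ[k].append((i, w))
        indeg[i] += 1
    # Kahn's algorithm: topological order in O(|V| + |E|).
    order = []
    queue = deque(v for v in range(n) if indeg[v] == 0)
    while queue:
        v = queue.popleft()
        order.append(v)
        for i, _ in succ[v]:
            indeg[i] -= 1
            if indeg[i] == 0:
                queue.append(i)
    if len(order) != n:
        raise ValueError("graph has a cycle")
    # Every predecessor of k is processed before k, so s[k] is final when k is reached
    # and each s[i] ends up as min over k of (s[k] + w_ki), i.e. Bellman's equation.
    s = [INF] * n
    s[source] = 0
    for k in order:
        if s[k] == INF:
            continue
        for i, w in succ[k]:
            s[i] = min(s[i], s[k] + w)
    return s
```

Longest paths on a DAG follow by negating every weight and negating the result, since max(a, b) = -min(-a, -b).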
Shortest (Longest) Path Problems
• Cyclic graphs
  – If all weights are non-negative, use Dijkstra's algorithm:

    DIJKSTRA(G(V, E, W)) {
        s_0 = 0;
        for (i = 1 to n)
            s_i = w_{0,i};
        repeat {
            select an unmarked vertex v_q such that s_q is minimal;
            mark v_q;
            foreach (unmarked vertex v_i)
                s_i = min{s_i, (s_q + w_{q,i})};
        } until (all vertices are marked);
    }

  – Implementation of the priority queue
    • Linear list: O(|V|^2 + |E|)
    • Heap: O(|V| log|V| + |E| log|V|)
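The DIJKSTRA pseudocode translates almost line for line into Python. This sketch uses a hypothetical dict-of-pairs weight format `w[(q, i)]` and a linear scan to select v_q, which corresponds to the linear-list priority queue (quadratic in |V|); a heap-based version would replace that scan with `heapq`. Missing edges are treated as infinite weight, and weights are assumed non-negative.

```python
from typing import Dict, List, Tuple


def dijkstra(w: Dict[Tuple[int, int], float], n: int) -> List[float]:
    """Path weights s_0..s_n from source v_0, following the DIJKSTRA pseudocode."""
    INF = float("inf")
    s = [INF] * (n + 1)
    s[0] = 0
    for i in range(1, n + 1):
        s[i] = w.get((0, i), INF)          # s_i = w_{0,i}
    marked = [False] * (n + 1)
    marked[0] = True                       # the source's weight is already final
    while not all(marked):
        # Select an unmarked vertex v_q such that s_q is minimal (linear scan).
        q = min((v for v in range(n + 1) if not marked[v]), key=lambda v: s[v])
        marked[q] = True
        for i in range(n + 1):
            if not marked[i]:
                s[i] = min(s[i], s[q] + w.get((q, i), INF))
    return s
```

For example, with edges 0->1 (4), 0->2 (1), 2->1 (2), 1->3 (1), and 2->3 (5), the direct edge to v_1 is beaten by the route through v_2, and v_3 is reached through v_1.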
Synchronization – Results
[Figure: speedup on a single workstation and on two workstations]
Co-Simulation Systems
• Untimed
  – D. Thomas, J. Adams, and H. Schmit, “A model and methodology for hardware-software codesign,” IEEE Des. & Test of Comput., Sep. 1993.
  – Co-simulation using the PLI of the Verilog-XL simulator and Unix sockets
  – Application-specific hardware module with no processor model
  – Synchronized handshake
[Figure: software processes communicate over Unix sockets with a bus interface module inside the Verilog simulator (reached through the Verilog PLI), which connects to the hardware processes]
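The synchronized handshake can be sketched with a connected socket pair from `socket.socketpair()`. In this Python sketch a thread stands in for the Verilog-XL side (the real setup reaches the bus interface module through the PLI). The doubling in hw_side is a placeholder computation, and all function names are hypothetical. Because the software blocks on every reply, the two sides stay ordered without sharing a clock, which is what makes this an untimed co-simulation.

```python
import socket
import struct
import threading
from typing import List


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes (stream sockets may deliver data in pieces)."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the socket")
        buf += chunk
    return buf


def hw_side(sock: socket.socket, count: int) -> None:
    """Stand-in for the bus interface module behind the Verilog PLI (hypothetical behavior)."""
    for _ in range(count):
        (value,) = struct.unpack("!i", recv_exact(sock, 4))
        sock.sendall(struct.pack("!i", value * 2))       # reply doubles as the handshake ack


def sw_side(sock: socket.socket, inputs: List[int]) -> List[int]:
    """Software process: each request blocks until HW acknowledges it."""
    results = []
    for x in inputs:
        sock.sendall(struct.pack("!i", x))               # request
        (y,) = struct.unpack("!i", recv_exact(sock, 4))  # wait for the ack/result
        results.append(y)
    return results


def cosimulate(inputs: List[int]) -> List[int]:
    sw_sock, hw_sock = socket.socketpair()
    hw = threading.Thread(target=hw_side, args=(hw_sock, len(inputs)))
    hw.start()
    try:
        return sw_side(sw_sock, inputs)
    finally:
        hw.join()
        sw_sock.close()
        hw_sock.close()
```

The fixed 4-byte framing matters: a stream socket does not preserve message boundaries, so each side reads exactly one word per handshake step.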
Co-Simulation Systems
• Simulation with physical implementation
  – S. Lee and J. Rabaey, “A hardware-software cosimulation environment,” Proc. Int. Workshop on Hardware-Software Codesign, Oct. 1993.
[Figure: on the workstation, the simulation (Ptolemy) uses a remote process wrapper and a server process; it connects over Ethernet (UNIX and VxWorks socket ports) to a remote process on a single-board computer with a custom board; both sides use an interface library with msgSend() and msgReceive()]
Co-Simulation Systems
• Co-simulation of HW, SW, and electromechanical parts
  – N. Petrellis, A. Birbas, M. Birbas, E. Mariatos, and G. Papadopoulos, “Simulating hardware, software and electromechanical parts using communicating simulators,” Proc. Int. Workshop on Rapid System Prototyping, June 1996.
  – Generic processor description (GPD) + front-end specific processor adaptor (FESPA)
[Figure: the application program runs on the GPD (or a VHDL processor model), which drives the data and address buses and read/write strobes into the FESPA (RAM, I/O latches, interrupt controller); the FESPA exchanges sensor values, actuator values, time, and error signals with the PROUESSE electromechanical systems simulator]
Co-Simulation Systems
• Virtual emulation
  – Borgatti, R. Rambaldi, G. Gori, and R. Guerrieri, “A smoothly upgradable approach to virtual emulation of HW/SW systems,” Proc. Int. Workshop on Rapid System Prototyping, June 1996.
  – Virtual emulation = virtual system (software simulation) + prototype system (hardware emulation)
[Figure: a scheduler coordinates the virtual system (simulation) and the prototype system (emulation) through an interface and an FPGA pod]
Co-Simulation Systems
[Figure: simulators on a network are coordinated by a scheduler through a software interface; a SCSI bus connects to Xilinx-based pods that act as hardware interfaces to an Aptix FPCB prototyping board]
Co-Simulation Systems
• Seamless CVE, Mentor Graphics
  – R. Klein, “Miami: A hardware software co-simulation environment,” Proc. Int. Workshop on Rapid System Prototyping, June 1996.
  – M. Shobaki, “Verification of embedded real-time systems using hardware/software co-simulation,” Proc. 24th EUROMICRO Conference, Aug. 1998.
[Figure: application code runs on an instruction set simulator with a debugger interface; the co-simulation kernel, with a configuration manager and a memory image server holding the data-space state, exchanges data accesses, instruction timing, interrupts, and bus cycle requests with the hardware simulation interface; bus interface models (BIM) drive the hardware design and memory models in the hardware simulation kernel]
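The role of the memory image server can be sketched as an address router. Accesses to ordinary memory are served from a memory image held by the co-simulation kernel, and only accesses to addresses mapped to the hardware design become bus cycles in the HW simulator. This Python sketch is an assumption-based illustration, not the Seamless CVE interface; CoSimKernel, the hw_ranges format, and the hw_bus callback are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical bus callback: (kind, address, write value) -> read value.
BusFn = Callable[[str, int, int], int]


class CoSimKernel:
    """Serve memory accesses from a memory image; forward only HW-mapped ones as bus cycles."""

    def __init__(self, hw_ranges: List[Tuple[int, int]], hw_bus: BusFn) -> None:
        self.memory_image: Dict[int, int] = {}  # stands in for the memory image server
        self.hw_ranges = hw_ranges              # inclusive (low, high) ranges mapped to HW
        self.hw_bus = hw_bus                    # stands in for a bus interface model
        self.bus_cycles = 0                     # accesses simulated as bus cycles

    def _is_hw(self, addr: int) -> bool:
        return any(lo <= addr <= hi for lo, hi in self.hw_ranges)

    def read(self, addr: int) -> int:
        if self._is_hw(addr):
            self.bus_cycles += 1
            return self.hw_bus("read", addr, 0)
        return self.memory_image.get(addr, 0)   # unwritten words read as 0

    def write(self, addr: int, value: int) -> None:
        if self._is_hw(addr):
            self.bus_cycles += 1
            self.hw_bus("write", addr, value)
        else:
            self.memory_image[addr] = value
```

Counting bus_cycles shows the intended benefit: accesses that never touch HW-mapped addresses cost no logic-simulator time.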
Co-Simulation Systems – RTL Modeling
[Figure: an ISS of the processor (ARM9) connects through a bus interface to an AMBA AHB bus simulated in HDL at RTL, together with a decoder, memory, and a DCT hardware block]
Case Study
• Simulation of networked Bluetooth devices
  – Y. Ahn, D. Kim, S. Lee, S. Park, S. Yoo, K. Choi, and S.-I. Chae, “An Efficient Simulation Environment for the Design of Networked Bluetooth Devices,” Proc. Designers' Forum, DATE 2002, March 2002.
[Figure: the GDM1101 Bluetooth chip (GCT proprietary 32-bit RISC MCU B@Pico at 32 MIPS, I-cache, on-chip memories, MMU, packet processor, modulator/demodulator with FEC/HEC/CRC, timer, host and debug interfaces, GPIO, UART, SPI, GCT proprietary CMOS RF) with external SRAM and 2 MB flash, alongside the protocol stack (application, RFComm, SDP, L2CAP, HCI, LM, link control, baseband); each part is mapped to a simulation model (C model, ISS, or HW simulator), and devices connect through a backplane (BP)]
Case Study – Bluetooth Network
• Piconet
• Scatternet
[Figure: a master and its slaves form a piconet; piconets linked together form a scatternet]
Case Study – Case 1: Per-Device Modular Simulation Model
[Figure: Host 0 runs the backplane (air channel); each device (DEV 1 on Host 1, DEV 2 on Host 2) runs its own ISS and HW simulator and connects to the backplane through BP 1 and BP 2]
• Modular and scalable, but high simulator synchronization overhead
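Case Study – Case 2: Air-Channel Model in HW Simulation
[Figure: Host 0 runs a single HW simulator containing the air channel model and the HW of all devices; the device ISSs run on Host 1 (DEV 1) and Host 2 (DEV 2) and connect through BP 1 and BP 2]
• Low socket overhead and fast simulation, but not scalable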
Case Study – Scalability
[Figure: simulation runtime (s, 0 to 3500) vs. number of devices (2 to 4) for 1M, 2M, and 5M simulated cycles, comparing the air-channel model in HW simulation (Case 2) with per-device modular simulation with GM (Case 1)]