Hardware-Software Co-Simulation
4541.633A SoC Design Automation
School of EECS, Seoul National University

Introduction
  • Design validation
    – Simulation (most effective)
    – Formal verification
    – Emulation
  • What is HW-SW co-simulation?
    – Simulation of heterogeneous systems whose HW and SW components interact
  • Why HW-SW co-simulation?
    – Verification of the system description before system synthesis
    – Verification of synthesis results
    – System performance estimation
  • Issues in HW-SW co-simulation
    – Timing accuracy
    – Performance

Abstraction Levels
  • Refinement flow (figure)
    – UTF (Untimed Functional)
    – Refine to TF (Timed Functional)
    – Partition into software (target code with or without OS) and hardware (TF)
    – Refine hardware to BCA (Bus Cycle Accurate)
    – Refine to RTL (Register Transfer Level)
  • Open Core Protocol (OCP-IP) layers: Message layer, Transaction layer, Transfer layer, RTL layer
  • Co-simulation of software and hardware

Abstraction Levels
  • Figure: modules and channels at the Message, Transaction, Transfer, and RTL layers
    – Figure labels: module, channel, module (bus), clock, bus, RTL signals
    – Details introduced along the way: timing and resource sharing; clock and protocols; wires and registers

Timed Co-Simulation
  • K. Hagen and H. Meyr, “Timed and untimed hardware/software co-simulation: application and efficient implementation,” Int. Workshop on Hardware-Software Codesign, Oct. 1993.
  • Untimed co-simulation
    – Hardware and software time scales are not synchronized
    – Good for verifying functional completeness
    – Gives correct results for handshaking protocols
  • Timed co-simulation
    – Hardware and software time scales are synchronized at each data exchange
    – Useful for overall performance assessment
    – Can be used for both handshaking and non-handshaking protocols

Timed Co-Simulation
  • How to implement a timed co-simulator
    – Nanosecond-accurate co-simulation
      • Needs an expensive processor model
      • Very slow
    – Cycle-based co-simulation
      • Needs a cycle-based processor model
      • Slow
    – Synchronization through software timing estimation
      • No processor simulation model needed
      • Fast

Timed Co-Simulation
  • Software timing estimation
    – V. Zivojnovic and H. Meyr, “Compiled HW/SW cosimulation,” Proc. DAC, June 1996.
    – Figure: C program --> C compiler --> target binary --> simulation compiler --> host binary
    – Figure, second path: modify the intermediate C program for profiling --> host binary
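
The idea of estimating software time without a processor model can be sketched as follows. This is a minimal Python illustration, not the method of the cited paper: the routine runs natively on the host, and an annotation at the start of each basic block adds that block's estimated target cycles to a software clock. The names `BLOCK_CYCLES`, `SoftwareClock`, and `sum_array`, as well as the cycle costs, are assumptions made for this example.

```python
"""Sketch: software timing estimation by annotating basic blocks."""

# Hypothetical target-cycle cost of each basic block (assumed values).
BLOCK_CYCLES = {"entry": 4, "loop_body": 7, "exit": 2}


class SoftwareClock:
    """Accumulates estimated target cycles while the code runs on the host."""

    def __init__(self):
        self.cycles = 0

    def tick(self, block):
        # Called at the start of every annotated basic block.
        self.cycles += BLOCK_CYCLES[block]


def sum_array(data, clock):
    """Host version of a target routine, with timing annotations inserted."""
    clock.tick("entry")
    total = 0
    for x in data:
        clock.tick("loop_body")
        total += x
    clock.tick("exit")
    return total


clock = SoftwareClock()
result = sum_array([1, 2, 3], clock)
estimated_cycles = clock.cycles  # 4 + 3 * 7 + 2 = 27
```

A co-simulator could read `clock.cycles` at each data exchange to decide how far the hardware simulator must advance before the exchange takes place.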

Synchronization
  • IPC (Inter-Process Communication)
  • Non-IPC (single process)
  • Lockstep
    – Synchronize at every step
  • Conventional synchronization
    – Lockstep + IPC
    – Large synchronization overhead
  • Optimistic
    – Simulate to an optimistic next synchronization point
    – In case of inconsistency, roll back
  • Pessimistic prediction
    – Simulate to a pessimistic next synchronization point

Synchronization
  • Optimistic co-simulation
    1. SW starts to run six clocks optimistically.
    2. At checkpoint t_C, a snapshot of the SW simulation is stored.
    3. HW starts to run.
    4. If a HW-->SW event occurs before time six, SW rolls back to t_C and restarts from the stored state.
  • Figure: SW timeline (clocks 1 to 6 with a checkpoint, then clocks 4 and 5 again) and HW timeline (job estimated at 3~6 clocks, clocks 1 to 4, job done), with IPC overhead
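
The four steps above can be sketched in Python. This is a toy model under stated assumptions: the simulator state is just a clock and a log, the HW-->SW event time is passed in directly instead of coming from a real HW simulator, and the event is assumed not to occur before the checkpoint. `SwSim` and `optimistic_step` are hypothetical names introduced for this example.

```python
import copy


class SwSim:
    """Toy software simulator: state is a clock and a log of what ran."""

    def __init__(self):
        self.time = 0
        self.log = []

    def run_until(self, t_end):
        while self.time < t_end:
            self.time += 1
            self.log.append(f"clk{self.time}")

    def handle_interrupt(self):
        self.log.append(f"irq@{self.time}")


def optimistic_step(sw, window, checkpoint_at, hw_event_time):
    """Run SW optimistically; roll back if HW raises an event in the window.

    Assumes hw_event_time (absolute time, or None) is not earlier than
    the checkpoint. Returns True when a rollback happened.
    """
    start = sw.time
    sw.run_until(start + checkpoint_at)
    snapshot = copy.deepcopy(sw.__dict__)    # state stored at t_C
    sw.run_until(start + window)             # optimistic run past t_C
    if hw_event_time is None or hw_event_time >= start + window:
        return False                         # optimism paid off
    sw.__dict__.update(snapshot)             # roll back to t_C
    sw.run_until(hw_event_time)              # re-run up to the event
    sw.handle_interrupt()
    return True


sw = SwSim()
rolled_back = optimistic_step(sw, window=6, checkpoint_at=3, hw_event_time=4)
```

With a window of six clocks, a checkpoint at clock 3, and a HW event at clock 4, the work done for clocks 4 to 6 is discarded and clock 4 is re-executed before the interrupt is handled. When no event arrives, the whole window is kept and no per-clock synchronization was needed.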

Synchronization
  – Hybrid synchronization and message grouping
    • S. Yoo, K. Choi, and Dong S. Ha, “Performance Improvement of Geographically Distributed Cosimulation by Hierarchically Grouped Messages,” IEEE Trans. on VLSI Systems, Oct. 2000.
    • Hybrid synchronization
      – Reduces the number of null messages by running some (not all) simulators optimistically.
    • Hierarchically grouped messages
      – Reduce the number of event-carrying messages by grouping the messages.

Synchronization
  – Motivation
    • Figure: Mobile Station (MS) and Base Station (BS) connected through a channel model
      – Mobile Station: SW (µP) with call processor and QCELP vocoder; HW (ASIC) with Tx and Rx
      – Base Station: call processor, Rx, Tx
    • MS SW: encodes another frame of voice data, sends one frame of encoded voice data to HW, receives demodulated data from HW
    • HW-->SW interrupt to notify data arrival
    • BS HW: receives modulated data from the MS, sends modulated data to the MS

Synchronization
  • Synchronization overhead caused by event-carrying messages and null messages
    – Figure (SW and HW timelines): a large number of event-carrying messages
    – Figure (SW and HW timelines): a large number of null messages to detect the occurrence of an interrupt

Synchronization
  • Basic concept of hybrid synchronization
    – Figure, lock-step synchronization: SW and HW timelines advance clock by clock, with synchronization overhead at every step
    – Figure, hybrid synchronization: optimistic SW simulator (clocks 1 to 6 with a checkpoint, then optimistic simulation of clocks 4 to 6) and conservative HW simulator (clocks 1 to 4, job done); less synchronization overhead gives a simulation speedup

Synchronization
  – Further improvement by message grouping
    • Message grouping reduces the number of event-carrying messages
    • A simulator defers sending the messages of a group until all the messages in the group are ready
    • Figure: numbered messages (1 to 256) exchanged among SW 1, SW 2, and HW over time
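
The deferral rule can be sketched as a small buffering channel. This is a minimal Python illustration that assumes fixed-size groups, which is a simplification of the hierarchical grouping in the cited paper. `GroupingChannel` and its `send` callback are hypothetical names, and `send` stands in for one IPC transfer.

```python
class GroupingChannel:
    """Buffers outgoing messages and sends each complete group in one transfer."""

    def __init__(self, group_size, send):
        self.group_size = group_size
        self.send = send          # callable performing one (costly) IPC transfer
        self.pending = []
        self.transfers = 0

    def post(self, msg):
        # Defer sending until every message of the group is ready.
        self.pending.append(msg)
        if len(self.pending) == self.group_size:
            self.flush()

    def flush(self):
        # Send whatever is buffered as a single transfer.
        if self.pending:
            self.send(list(self.pending))
            self.transfers += 1
            self.pending.clear()


received = []
channel = GroupingChannel(group_size=4, send=received.append)
for n in range(1, 257):
    channel.post(n)
channel.flush()  # send any incomplete final group
```

In this run 256 messages cross the channel in 64 transfers instead of 256, so the per-transfer IPC overhead is paid a quarter as often.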

Synchronization
  – Experiment
    • Simulators
      – MS SW part (optimistic or conservative): ARMulator or ARM710A @ 16 MHz
      – MS HW part + BS parts (conservative): Ptolemy
    • Uni-processor case
      – ARMulator + Ptolemy on an Ultra I workstation
    • Distributed cases
      – ARMulator and Ptolemy on two Ultra I workstations
      – Ptolemy @ Ultra I <--> ARM710A @ board (PC)
    • Figure labels: Ptolemy, processor execution server, FPGA prototyping board, µP, MEM, nMREQ, address bus (0x1400, 0x8004), data bus (0x2, 0xef44, 0xef80)

Synchronization
  • Snapshot of a Ptolemy run (figure)

Synchronization
  – Experimental results (table)
    * ISS stands for ARMulator in this table.
    * All runtimes are in seconds.

Synchronization
  • Synchronization overhead reduction by pessimistic prediction
    – J. Jung, S. Yoo, and K. Choi, “Performance Improvement of Multi-Processor Systems Cosimulation based on SW Analysis,” Proc. DATE 2001, March 2001.
    – Figure: timelines of sim 1, sim 2, and sim 3 under conventional synchronization and under the proposed pessimistic prediction

Synchronization
  – Pessimistic prediction based on SW analysis
    • Figure: ARM assembly listing split into basic blocks A (0x0 to 0x8), B (0xc to 0x1c), C (0x20), D (0x24 to 0x28), and E (0x2c to 0x34)
    • Figure: control-flow graph over blocks A to E with edge weights (3, 5, 6, 11) and a marked sync node

Shortest (Longest) Path Problems
  – Shortest (longest) path problems
    • Assume no negative (positive) cycles; with such cycles, finding a simple shortest (longest) path is NP-complete
    • Directed edge-weighted graph G(V, E, W), source vertex v_0
    • Bellman’s equation, where s_i is the path weight from v_0 to v_i
        $s_0 = 0$
        $s_i = \min_{k \ne i} (s_k + w_{k,i}), \quad i = 1, 2, \ldots, n$
    • Acyclic
      – Topological sort, O(|V| + |E|)
      – Solve Bellman’s equation in topological order
    • Figure: vertex with weight s_k connected to vertex with weight s_i by an edge of weight w_{k,i}
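
For the acyclic case, solving Bellman's equation in topological order can be sketched as below. This is a minimal Python illustration. The graph is a hypothetical example (vertex names and weights are assumptions, not taken from the slides), and the topological sort uses Kahn's algorithm, which is one of several ways to obtain the order.

```python
from collections import deque
import math


def dag_shortest_paths(edges, source):
    """Shortest path weights from source in a DAG.

    edges maps each vertex to a list of (successor, weight) pairs.
    """
    vertices = set(edges)
    for succs in edges.values():
        vertices.update(v for v, _ in succs)

    # Topological sort (Kahn's algorithm), O(|V| + |E|).
    indegree = {v: 0 for v in vertices}
    for succs in edges.values():
        for v, _ in succs:
            indegree[v] += 1
    ready = deque(v for v in vertices if indegree[v] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v, _ in edges.get(u, []):
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if len(order) != len(vertices):
        raise ValueError("graph has a cycle")

    # Bellman's equation, applied once per edge in topological order.
    s = {v: math.inf for v in vertices}
    s[source] = 0
    for u in order:
        if s[u] == math.inf:
            continue  # not reachable from the source
        for v, w in edges.get(u, []):
            s[v] = min(s[v], s[u] + w)
    return s


# Hypothetical weighted DAG (weights could be cycle counts of basic blocks).
edges = {"A": [("B", 5), ("D", 3)], "B": [("E", 11)], "D": [("E", 6)], "E": []}
dist = dag_shortest_paths(edges, "A")
```

For the longest path variant, the same loop applies with `max` in place of `min` and `-math.inf` as the initial value.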

Shortest (Longest) Path Problems
  • Cyclic
    – If all weights are positive, use Dijkstra’s algorithm
    – DIJKSTRA (G(V, E, W)) {
          s_0 = 0;
          for (i = 1 to n)
              s_i = w_{0,i};
          repeat {
              select an unmarked vertex v_q such that s_q is minimal;
              mark v_q;
              foreach (unmarked vertex v_i)
                  s_i = min{s_i, (s_q + w_{q,i})};
          } until (all vertices are marked);
      }
  • Implementation of the priority queue
    – Linear list: O(|V|^2 + |E|)
    – Heap: O(|V| log |V| + |E| log |V|)
  • Figure: vertex v_q connected to vertex v_i by an edge of weight w_{q,i}
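
A runnable counterpart of the heap-based variant can be sketched in Python with the standard `heapq` module. The graph below is a hypothetical example with positive weights and a cycle. The adjacency-list format, with every vertex present as a key, is an assumption of this sketch.

```python
import heapq
import math


def dijkstra(adj, source):
    """Shortest path weights from source; all edge weights must be positive.

    adj maps every vertex to a list of (successor, weight) pairs.
    """
    s = {v: math.inf for v in adj}
    s[source] = 0
    marked = set()
    heap = [(0, source)]
    while heap:
        s_q, q = heapq.heappop(heap)      # unmarked vertex with minimal s_q
        if q in marked:
            continue                      # stale heap entry, skip it
        marked.add(q)
        for i, w in adj[q]:
            if s_q + w < s[i]:
                s[i] = s_q + w            # s_i = min(s_i, s_q + w_{q,i})
                heapq.heappush(heap, (s[i], i))
    return s


# Hypothetical cyclic graph with positive weights (v3 -> v0 closes a cycle).
adj = {
    "v0": [("v1", 4), ("v2", 1)],
    "v1": [("v3", 1)],
    "v2": [("v1", 2), ("v3", 5)],
    "v3": [("v0", 7)],
}
dist = dijkstra(adj, "v0")
```

Instead of decreasing a key in place, this version pushes a new heap entry and skips stale ones when they are popped, which keeps the code short while staying within O(|E| log |V|) heap operations.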

Synchronization
  – Results
    • Figures: speedup (single workstation) and speedup (two workstations)

Co-Simulation Systems
  • Untimed
    – D. Thomas, J. Adams, and H. Schmit, “A model and methodology for hardware-software codesign,” IEEE Des. & Test of Comput., Sep. 1993.
    – Co-simulation using the PLI of the Verilog-XL simulator and Unix sockets
    – Application-specific hardware module with no processor model
    – Synchronized handshake
    – Figure labels: software process 1, software process 2, Unix sockets, Verilog PLI, bus interface module, hardware process, Verilog simulator

Co-Simulation Systems
  • Simulation with physical implementation
    – S. Lee and J. Rabaey, “A hardware-software cosimulation environment,” Proc. Int. Workshop on Hardware-Software Codesign, Oct. 1993.
    – Figure labels, workstation (UNIX): simulation (Ptolemy), remote process wrapper, interface library (msgSend(), msgReceive()), sockPort
    – Figure labels, single-board computer (VxWorks): server process, remote process, interface library (msgSend(), msgReceive()), custom board
    – The workstation and the single-board computer are connected by Ethernet

Co-Simulation Systems
  • Co-simulation of HW, SW, and electromechanical parts
    – N. Petrellis, A. Birbas, M. Birbas, E. Mariatos, and G. Papadopoulos, “Simulating hardware, software and electromechanical parts using communicating simulators,” Proc. Int. Workshop on Rapid System Prototyping, June 1996.
    – Generic processor description (GPD) + front-end specific processor adaptor (FESPA)
    – Figure labels: application program, GPD or VHDL processor model, FESPA, data bus, addr, read, write, strb, RAM, I/O, LATCH, FL2BITV, BITV2FL, INTER_CTR, INTER, sen_act, sen_val, act_val, time, error, PROUESSE electromechanical systems simulator

Co-Simulation Systems
  • Virtual emulation
    – Borgatti, R. Rambaldi, G. Gori, and R. Guerrieri, “A smoothly upgradable approach to virtual emulation of HW/SW systems,” Proc. Int. Workshop on Rapid System Prototyping, June 1996.
    – Virtual emulation = virtual system (software simulation) + prototype system (hardware emulation)
    – Figure labels: scheduler, virtual system (simulation), prototype system (emulation), interface, FPGA pod

Co-Simulation Systems
  • Figure labels: simulator, network, scheduler, software interface, SCSI bus, two pods (Xilinx) each with a hardware interface, Aptix FPCB prototyping board

Co-Simulation Systems
  • Seamless CVE, Mentor Graphics
    – R. Klein, “Miami: A hardware software co-simulation environment,” Proc. Int. Workshop on Rapid System Prototyping, June 1996.
    – M. Shobaki, “Verification of embedded real-time systems using hardware/software co-simulation,” Proc. 24th EUROMICRO Conference, Aug. 1998.
    – Figure labels: application code, debugger interface, instruction set simulator, memory image server, data-space state, interrupts, configuration manager, co-simulation kernel, data accesses, instruction timing, BIM state, bus cycle requests, bus interface models, hardware simulation interface, hardware simulation kernel, hardware design, memory models

Co-Simulation Systems
  – RTL modeling
    • Figure labels: ISS, processor (ARM9), bus interface, decoder, bus (AMBA AHB), memory, HW (DCT), HDL simulation at the RTL

Case Study
  • Simulation of networked Bluetooth devices
    – Y. Ahn, D. Kim, S. Lee, S. Park, S. Yoo, K. Choi, and S.-I. Chae, “An Efficient Simulation Environment for the Design of Networked Bluetooth Devices,” Proc. Designers' Forum, DATE 2002, March 2002.
    – Figure labels, GDM1101 hardware: GCT proprietary 32-bit RISC MCU (B@Pico, 32 MIPS), I-cache, MMU, XMEM0 (4 KB x 32), XMEM1 (1 KB x 32), MEM IF, 8 KB, FSK/clock controller, packet system controller, registers, modulator/demodulator, bit processor (FEC, HEC, CRC, ...), timer, V6 PB bridge, host I/F, debug I/F, GPIO, UART, SPI, USB, audio codec I/F, power management, SRAM, flash (2 MB), 20-bit address, 16-bit data, host I/O, GCT proprietary CMOS RF
    – Figure labels, protocol stack: application, RFComm, SDP, L2CAP, HCI, audio, LM, link control, baseband, interrupt handler, 8-bit host, HCI serial, PC
    – Figure labels, simulation model: backplane (BP), ISS, HW simulator, protocol stack C model

Case Study
  – Bluetooth network
    • Piconet
    • Scatternet
    • Figure labels: master, slave, piconet, scatternet

Case Study
  – Case 1: Per-device modular simulation model
    • Figure labels: Host 0 with backplane (air channel); Host 1 (DEV 1) with BP 1, ISS, HW; Host 2 (DEV 2) with BP 2, ISS, HW
    • Modular and scalable, but with high simulator synchronization overhead

Case Study
  – Case 2: Air-channel model in HW simulation
    • Figure labels: Host 0 with a HW simulator containing the air channel model and two HW models; Host 1 (DEV 1) and Host 2 (DEV 2) with BP 1, BP 2, and ISS
    • Low socket overhead and fast simulation, but not scalable

Case Study
  – Scalability
    • Figure: simulation runtimes (sec, axis 0 to 3500) versus number of devices (2, 3, 4), for 1 M, 2 M, and 5 M cycles
      – Panel: air-channel model in HW simulation (Case 2)
      – Panel: per-device modular simulation with GM (Case 1)