Logic Simulation Languages Algorithms Simulators Wolfgang Roesner Verification

  • Slides: 46
Download presentation
Logic Simulation : Languages, Algorithms, Simulators Wolfgang Roesner Verification Tools Development IBM Corp Austin,

Logic Simulation : Languages, Algorithms, Simulators Wolfgang Roesner Verification Tools Development IBM Corp Austin, TX

Outline n Hardware Design Languages n Modeling Levels - A Taxonomy of HDL Constructs

Outline n Hardware Design Languages n Modeling Levels - A Taxonomy of HDL Constructs n General Purpose HDL Simulators - Event-Driven Sim n Improving Simulator Performance n Synchronous Design Methodology and Cycle-Based Sim n Let's Design a Cycle-Based Simulator

Simulation of Hardware Design Languages n n Logic or "functional" simulation today is done

Simulation of Hardware Design Languages n n Logic or "functional" simulation today is done mostly with HDLs (Hardware Design Languages) Most popular languages today (both are IEEE standards) è è n Verilog: è è è n Verilog VHDL logic modeling and simulation language started in EDA industry (start-up) in the 80's was acquired by Cadence donated to IEEE as a general industry standard approx. 60% market share in U. S. EDA market VHDL: è è è committee-designed language contracted by U. S. (Do. D) (ADA-derived) functional/logic modeling and simulation language approx. 40% market share in U. S. EDA market

Modeling Levels - Highest Level : Interface n Let's look at the common constructs

Modeling Levels - Highest Level : Interface n Let's look at the common constructs we use to specify the functionality of a piece of hardware: Model inputs outputs input behavior over time output behavior over time t

Modeling Levels - Major Dimensions (I) n Temporal Dimension: è è è n continous

Modeling Levels - Major Dimensions (I) n Temporal Dimension: è è è n continous (analog) gate delay (psec? ) clock cycle instruction cycle events discrete time Data Abstraction: è è è continuous (analog) bit : multiple values bit : binary abstract value composite value ("struct") discrete value

Modeling Levels - Major Dimensions (II) n Functional Dimension: è è è n Structural

Modeling Levels - Major Dimensions (II) n Functional Dimension: è è è n Structural Dimension: è è è n continuous functions (e. g. differential equations) Switch-level (transistors as switches) Boolean Logic Algorithmic (eg. sort procedure) Abstract mathematical formula (e. g. matrix multiplication) Single black box Functional blocks Detailed hierarchy with primitive library elements A good VHDL-centric taxonomy can be found at: http: //rassp. scra. org

Modeling Levels - Major Dimensions (III) Temporal Continuous Gate Delay Clock Cycle Instruction Cycle

Modeling Levels - Major Dimensions (III) Temporal Continuous Gate Delay Clock Cycle Instruction Cycle Events Multivalue Bit abstract value "struct" Switch Level Boolean Logic Algorithmic Abstract Mathematical Functional Blocks Detailed Component Hierarchy Data Continuous Functional Continous Structural Single Black Box

Coverage of Modeling Levels - Verilog Temporal Continuous Gate Delay Clock Cycle Instruction Cycle

Coverage of Modeling Levels - Verilog Temporal Continuous Gate Delay Clock Cycle Instruction Cycle Events Multivalue Bit abstract value "struct" Switch Level Boolean Logic Algorithmic Abstract Mathematical Functional Blocks Detailed Component Hierarchy Data Continuous Functional Continous Structural Single Black Box

Coverage of Modeling Levels - VHDL Temporal Continuous Gate Delay Clock Cycle Instruction Cycle

Coverage of Modeling Levels - VHDL Temporal Continuous Gate Delay Clock Cycle Instruction Cycle Events Multivalue Bit abstract value "struct" Switch Level Boolean Logic Algorithmic Abstract Mathematical Functional Blocks Detailed Component Hierarchy Data Continuous * Functional Continous Structural Single Black Box * extremely inefficient compared to Verilog

In HDL Terms n n VHDL, Verilog were both defined as simulation languages Big

In HDL Terms n n VHDL, Verilog were both defined as simulation languages Big emphasis on the structural refinement l l n VHDL: entity/architecture/component/port/signal Verilog: module/instance/port/signal/reg Specification of function è General: programming language constructs l l è Parallelism: l l è VHDL : user-defined data type, package, procedure, function, sequential code Verilog: function, task, sequential code VHDL : process, signal update, wait Verilog: "always" block, fork/join, wait, event construct Special purpose H/W constructs l l VHDL : concurrent assignment, delayed assignment, signal, Boolean logic Verilog : continuous assignment, 4 -value Boolean logic, switch-level support

General HDL Simulators n Before we can look at architectures of simulators we need

General HDL Simulators n Before we can look at architectures of simulators we need to understand the execution model that the HDLs imply: Event-Driven Execution n n VHDL's execution model is defined in detail in the IEEE LRM (Language Reference Manual) Verilog's execution model is defined by Cadence's Verilog -XL simulator ("reference implementation")

An event-driven VHDL example process (trigger) begin if (count<=15) then count <= count +

An event-driven VHDL example process (trigger) begin if (count<=15) then count <= count + 1 after 1 ns; else count <= 0 after 1 ns; end if; process (count) begin my_count <= count; trigger <= not trigger; end process; Each process: - loops forever - waits for change in signal from other process Block 1 Block 2

A more hardware-oriented example s(0) <= a(0) xor b(0) after 2 ns; c(0) <=

A more hardware-oriented example s(0) <= a(0) xor b(0) after 2 ns; c(0) <= a(0) and b(0) after 1 ns; s(1) <= a(1) xor b(1) xor c(0) after 2 ns; c(1) <= (a(1) and b(1)) or (b(1) and c(0)) or (c(0) and a(1)) after 1 ns; sum_out(1 to 0) <= s(1 to 0); carry_out <= c(1); xor a b s(0) sum_out(1) s(1) => xor sum_out(0) c(0) and carry_out or and =>

A more hardware-oriented example Let's simulate: a=11 b=01 Time = 0 ns (step 1)

A more hardware-oriented example Let's simulate: a=11 b=01 Time = 0 ns (step 1) s(0) <= a(0) xor b(0) after 2 ns; c(0) <= a(0) and b(0) after 1 ns; s(1) <= a(1) xor b(1) xor c(0) after 2 ns; c(1) <= (a(1) and b(1)) or (b(1) and c(0)) or (c(0) and a(1)) after 1 ns; Red Boxes : evaluate in current step sum_out(1 to 0) <= s(1 to 0); carry_out <= c(1); 0 1 a b xor s(0) sum_out(1) s(1) => xor 1 sum_out(0) c(0) and carry_out or 1 and or and =>

A more hardware-oriented example s(0) <= a(0) xor b(0) after 2 ns; c(0) <=

A more hardware-oriented example s(0) <= a(0) xor b(0) after 2 ns; c(0) <= a(0) and b(0) after 1 ns; Let's simulate: a=11 b=01 Time = 0 ns (step 2) s(1) <= a(1) xor b(1) xor c(0) after 2 ns; c(1) <= (a(1) and b(1)) or (b(1) and c(0)) or (c(0) and a(1)) after 1 ns; sum_out(1 to 0) <= s(1 to 0); carry_out <= c(1); 0 1 a b xor s(0) 1 sum_out(1) s(1) => xor 1 sum_out(0) c(0) and carry_out or 1 and or and =>

A more hardware-oriented example s(0) <= a(0) xor b(0) after 2 ns; c(0) <=

A more hardware-oriented example s(0) <= a(0) xor b(0) after 2 ns; c(0) <= a(0) and b(0) after 1 ns; Let's simulate: a=11 b=01 Time = 0 ns (step 3) s(1) <= a(1) xor b(1) xor c(0) after 2 ns; c(1) <= (a(1) and b(1)) or (b(1) and c(0)) or (c(0) and a(1)) after 1 ns; sum_out(1 to 0) <= s(1 to 0); carry_out <= c(1); 0 1 a b xor s(0) 1 s(1) sum_out(1) 1 => xor 1 sum_out(0) c(0) and carry_out or 1 and or and =>

A more hardware-oriented example s(0) <= a(0) xor b(0) after 2 ns; c(0) <=

A more hardware-oriented example s(0) <= a(0) xor b(0) after 2 ns; c(0) <= a(0) and b(0) after 1 ns; Let's simulate: a=11 b=01 Time = 0 ns (step 4) s(1) <= a(1) xor b(1) xor c(0) after 2 ns; c(1) <= (a(1) and b(1)) or (b(1) and c(0)) or (c(0) and a(1)) after 1 ns; sum_out(1 to 0) <= s(1 to 0); carry_out <= c(1); 0 1 a b xor s(0) 1 s(1) sum_out(1) 1 1 => xor 1 sum_out(0) c(0) and carry_out or 1 and or and =>

A more hardware-oriented example s(0) <= a(0) xor b(0) after 2 ns; c(0) <=

A more hardware-oriented example s(0) <= a(0) xor b(0) after 2 ns; c(0) <= a(0) and b(0) after 1 ns; Let's simulate: a=11 b=01 Time = 1 ns (step 1) s(1) <= a(1) xor b(1) xor c(0) after 2 ns; c(1) <= (a(1) and b(1)) or (b(1) and c(0)) or (c(0) and a(1)) after 1 ns; sum_out(1 to 0) <= s(1 to 0); carry_out <= c(1); 0 xor 1 a 1 s(0) s(1) 1 => xor 1 c(0) and b 1 sum_out(1) 1 1 1 sum_out(0) and carry_out or 1 and 1 or and =>

A more hardware-oriented example s(0) <= a(0) xor b(0) after 2 ns; c(0) <=

A more hardware-oriented example s(0) <= a(0) xor b(0) after 2 ns; c(0) <= a(0) and b(0) after 1 ns; Let's simulate: a=11 b=01 Time = 1 ns (step 2) s(1) <= a(1) xor b(1) xor c(0) after 2 ns; c(1) <= (a(1) and b(1)) or (b(1) and c(0)) or (c(0) and a(1)) after 1 ns; sum_out(1 to 0) <= s(1 to 0); carry_out <= c(1); 0 xor 1 a 1 s(0) s(1) 0 => xor 1 c(0) and b 1 sum_out(1) 1 1 1 and sum_out(0) 1 carry_out or 1 and 1 or and =>

A more hardware-oriented example s(0) <= a(0) xor b(0) after 2 ns; c(0) <=

A more hardware-oriented example s(0) <= a(0) xor b(0) after 2 ns; c(0) <= a(0) and b(0) after 1 ns; Let's simulate: a=11 b=01 Time = 1 ns (step 3) s(1) <= a(1) xor b(1) xor c(0) after 2 ns; c(1) <= (a(1) and b(1)) or (b(1) and c(0)) or (c(0) and a(1)) after 1 ns; sum_out(1 to 0) <= s(1 to 0); carry_out <= c(1); 0 xor 1 a 1 s(0) s(1) 0 => xor 1 c(0) and b 1 sum_out(1) 0 1 1 and sum_out(0) 1 or 1 and 1 1 carry_out or and =>

A more hardware-oriented example s(0) <= a(0) xor b(0) after 2 ns; c(0) <=

A more hardware-oriented example s(0) <= a(0) xor b(0) after 2 ns; c(0) <= a(0) and b(0) after 1 ns; Let's simulate: a=11 b=01 Time = 1 ns (step 4) s(1) <= a(1) xor b(1) xor c(0) after 2 ns; c(1) <= (a(1) and b(1)) or (b(1) and c(0)) or (c(0) and a(1)) after 1 ns; sum_out(1 to 0) <= s(1 to 0); carry_out <= c(1); 0 xor 1 a 1 s(0) s(1) 0 => xor 1 c(0) and b 1 sum_out(1) 0 1 1 and sum_out(0) 1 or 1 and 1 1 1 or and carry_out =>

A more hardware-oriented example s(0) <= a(0) xor b(0) after 2 ns; c(0) <=

A more hardware-oriented example s(0) <= a(0) xor b(0) after 2 ns; c(0) <= a(0) and b(0) after 1 ns; Let's simulate: a=11 b=01 Time = 1 ns (step 5) s(1) <= a(1) xor b(1) xor c(0) after 2 ns; c(1) <= (a(1) and b(1)) or (b(1) and c(0)) or (c(0) and a(1)) after 1 ns; sum_out(1 to 0) <= s(1 to 0); carry_out <= c(1); 0 xor 1 a 1 s(0) s(1) 0 => xor 1 c(0) and b 1 sum_out(1) 0 1 1 and sum_out(0) 1 or 1 and 1 1 1 or and carry_out 1 =>

A more hardware-oriented example Let's simulate: a=11 b=01 Time = 1 ns (step 5)

A more hardware-oriented example Let's simulate: a=11 b=01 Time = 1 ns (step 5) s(0) <= a(0) xor b(0) after 2 ns; c(0) <= a(0) and b(0) after 1 ns; s(1) <= a(1) xor b(1) xor c(0) after 2 ns; c(1) <= (a(1) and b(1)) or (b(1) and c(0)) or (c(0) and a(1)) after 1 ns; That's enough - you get the point sum_out(1 to 0) <= s(1 to 0); carry_out <= c(1); The statement-(process-) dependencies define a network. carry_out Changes are dynamically propagated the network 0 0 1 0 c(1) through xor 1 a => xor 1 c(0) and b 1 s(0) => xor 1 1 and sum_out(0) 1 or 1 and 1 1 1 or and carry_out 1 =>

Event-Driven Simulation n The simulator maintains è è è n a list of all

Event-Driven Simulation n The simulator maintains è è è n a list of all atomic executable blocks a data structure that represents the interconnect of the block via signals a value table that holds all current signal values At start time the simulator schedules all executable blocks of the models

Core-Architecture of an Event-Driven Simulator Block Code Select next scheduled block Schedule Signal Updates

Core-Architecture of an Event-Driven Simulator Block Code Select next scheduled block Schedule Signal Updates More Blocks? Signal - Updates Scheduler Data - Signal Sink Lists - Event Queue More Blocks? Incr Time Done

Improving Simulation Speed n n The most obvious bottle-neck for functional verification is simulation

Improving Simulation Speed n n The most obvious bottle-neck for functional verification is simulation throughput There are two basic ways to improve throughput è è n Simulator performance Running many simulations in parallel Parallelization è Hard : parallel simulation algorithms l l l è much parallel event-driven simulation research has not yielded a breakthrough hard to compete against "trivial parallelization" Simple: run independent testcases on separate machines l l l Workstation "Sim. Farms" 100 s - 1000 s of engineer's workstations run simulation in the background ideal parallelization factor

Improving Simulator Performance (I) n Full-HDL support è If full cover of VHDL/Verilog is

Improving Simulator Performance (I) n Full-HDL support è If full cover of VHDL/Verilog is important è Optimizing compiler techniques l l treat sequential code constructs like general programming language all optimizations for language compilers apply: – – – l l l data/control-flow analysis global optimizations local optimizations (loop unrolling, constant propagation) register allocation pipeline optimizations etc. Global optimizations are limited because of model-build turn-around time requirements Example: modern microprocessor is designed w/ ~1 Million lines of HDL Imagine the compile time for a C-program w/ 1 M lines!

Improving Simulator Performance (II) n Full-HDL support è Better scheduling algorithms l è scheduling

Improving Simulator Performance (II) n Full-HDL support è Better scheduling algorithms l è scheduling is clearly the bottle-neck in all event-simulators Use hybrid techniques: l l l some of the simplifications discussed in the following can be applied to localized "islands" of HDL. requires an HDL compile process that automatically analyzes the structure of the model and uses "speed-up" modes for sub-partitions Problem: – – assume 50% of the model has such islands even if we could speed up simulation of those parts to take 0 time, we would only gain a speedup factor of 2 x

Improving Simulator Performance (III) n Simplifications è Use higher-level HDL specification s(0 to 2)

Improving Simulator Performance (III) n Simplifications è Use higher-level HDL specification s(0 to 2) <= ('0' & a (0 to 1)) + ('0' & b(0 to 1) ); sum_out(0 to 1) <= s(1 to 2); carry_out <= s(0); vs. s(0) <= a(0) xor b(0) after 2 ns; c(0) <= a(0) and b(0) after 1 ns; s(1) <= a(1) xor b(1) xor c(0) after 2 ns; c(1) <= (a(1) and b(1)) or (b(1) and c(0)) or (c(0) and a(1)) after 1 ns;

Improving Simulator Performance (IV) n Principles of using higher-level HDL specification è Common theme:

Improving Simulator Performance (IV) n Principles of using higher-level HDL specification è Common theme: cut down of number of scheduled events l l create larger sections of un-interrupted sequential code use less fine-grain granularity for model structure – l l use higher-level operators use zero-delay wherever possible – è methodology implications: timing verification is not done together with functional simulation Data abstractions l use binary over multi-value bit values – l n -> smaller number of schedulable blocks multi-value : use only for bus contention situations to resolve several drivers with different strengths (strong, resistive, high-impedance) use word-level operations over bit-level operations NEXT : Most Powerful - methodology-based subset of HDLs

Synchronous Design Methodology n n Clock the design only so fast the longest possible

Synchronous Design Methodology n n Clock the design only so fast the longest possible combinational de path settles before cycle is over Cycle time depends on the longest topological path è n Hazards/Races do not disturb function Longest topological path can be analytically calculated w/o using simulation -> stronger result w/o sim patterns clock : dependent on critical path xor and

Logic Design Groundrules n n n Synchronous design LSSD : design for testability Critical

Logic Design Groundrules n n n Synchronous design LSSD : design for testability Critical delay path defines the clock frequency Behavioral Function and Timing Correctness can be verified independently Design can be verified independently of its implementation -> Logic Synthesis -> Custom Design -> Synthesizable, high-level HDL as main vehicle for functional verification IF: Boolean Equivalence Checking proofs closure between functional an implementation view Functional Verification can use zero-delay functional simulation --> Cycle-Based Simulation, FSM-based Formal Verification

Functional View of Synchronous Designs n n n Logic mapped to a non-cyclic network

Functional View of Synchronous Designs n n n Logic mapped to a non-cyclic network of Boolean functions Network also constains state-holding primitives : latch/reg/array HDL does not contain any timing information Function can be evaluated by zero-delay signal evaluation --> Speed of Simulation + Simplicity of Tools Latches Arrays Boolean Logic Network

Cycle Simulation Algorithm Latches Logic is ordered into levels so that order of evaluation

Cycle Simulation Algorithm Latches Logic is ordered into levels so that order of evaluation is correct. E. g. , A and B are computed before C. Levelized Combinational Logic A C B

Why Is This Faster? As an example, let's assume we compile our circuit into

Why Is This Faster? As an example, let's assume we compile our circuit into a stream of pseudo instructions. . xor a b s(0) sum_out(1) s(1) => xor sum_out(0) c(0) and carry_out or and or => and Load temp 1, a(0) Load temp 2, b(0) Xor temp 1, temp 2, temp 3 Store temp 3, s(0) And temp 1, temp 2, temp 3 Store temp 3, c(0) Load temp 1, a(1) Load temp 2, b(1) Xor temp 1, temp 2, temp 3 Load temp 4, c(0) Xor temp 3, temp 4, temp 5 Store temp 5, s(1) We can cover every Boolean function into a minimal set (~4 or better) instructions

Methodology Flow RTL (VHDL, Verilog) Physical VLSI Design Tools / Custom Design Language Compile

Methodology Flow RTL (VHDL, Verilog) Physical VLSI Design Tools / Custom Design Language Compile Model Build Cycle-Based Simulation Model Formal Verification : Boolean Equivalence Check VERITY Cycle-Based Simulator

Network of Boolean Equations/Operators

Network of Boolean Equations/Operators

Storage Elements n Latch Inference for sequential HDL è è if there is a

Storage Elements n Latch Inference for sequential HDL è è if there is a possible set of signal evaluations that leave the value of a signal undefined -> that signal is assumed to be a storage element Verification & Logic Synthesis must interpret HDL the same way (check with Boolean Equivalence Checking)

Implementation Options For Cycle Sim n HDL Compilation n Evaluation Sequence n Function Evaluation

Implementation Options For Cycle Sim n HDL Compilation n Evaluation Sequence n Function Evaluation

HDL Compilation Options For Cycle Sim n Preserve the HDL structures è è è

HDL Compilation Options For Cycle Sim n Preserve the HDL structures è è è n Compile HDL like a programming language Preserve design hierarchy, processes, modules, functions, procedures Implementation process very similar to programming language compile Incremental processing is trivial Model optimization is hard (cross-functional boundary) and limited Map HDL to library of primitive functions (e. g. IBM "Texsim") è è Crush design hierarchy to increase optimization potential Synthesis-like process, but simpler because of missing physical constraints Incremental model build is very hard Designer view of hierarchy must be preserved for model debug proces

Evaluation Sequence Options For Cycle Sim n Oblivious Evaluation è è è n Event-Driven

Evaluation Sequence Options For Cycle Sim n Oblivious Evaluation è è è n Event-Driven Evaluation è è è n For every cycle, evaluate all Boolean logic Minimal amount of book-keeping and runtime control data structures Large amount of redundant evaluation for large models with low chang activity Evaluate changes only Minimize redundant work for low-activity models Book-keeping overhead (But: use synchronicity constraint) Hybrid Techniques (example: Texsim) è è Use design partitions as base granularity for event-driven evaluation Use “key controling signals” as guards for event-driven evaluation

Function Evaluation Options For Cycle Sim n Interpreted è è è n Map design

Function Evaluation Options For Cycle Sim n Interpreted è è è n Map design network into an efficient data structure which a simulator “walks” at run-time “Model” is data, highly portable Few successfull examples where interpretation ofn a datastructure didn’t add significant performance loss due to indirection Compiled (example: Texsim) è è Map design network into a sequence of instructions Generating “C-source” is not an industrial-strength option “Model” consists of machine-instructions, platform dependent Many programming language compiler optimizations apply, but the problem is simpler (more constrained)

IBM's Texsim simulator n n Flattening of the hierarchy Network optimizations - using levelized

IBM's Texsim simulator n n Flattening of the hierarchy Network optimizations - using levelized network è è n n Constant propagation De. Morgan’s law and other Boolean optimizations Equivalent function removal Merging of functions with only one fanout Final levelization Structural analysis and logic partitioning (e. g. latch-partitioning, below Create model symbol table and allocate value table Compile logic by partition and level è è Generate reference history for use by register allocation Generate machine independent code & perform peephole optimization Perform instruction scheduling Generate target machine object code

Orders of Magnitude in Speed n n These numbers are supposed to give you

Orders of Magnitude in Speed n n These numbers are supposed to give you intuitive feel: Event Simulator 1 Cycle Simulator 20 Event driven cycle Simulator 50 Acceleration 1000 Emulation 100000 The actual numbers depend on many other factors: è è è Model activity rate Test bench activity & implementation language Multiple clock domains

We could write a simulator now. . . n But we have not talked

We could write a simulator now. . . n But we have not talked about many areas: è è è Model-Build software principles (how to make gigantic model in minutes) Simulator user interface (how to talk to user and to programs) How a cycle simulator deals with multi-value signals l è do we need those ? Yes, mainly for bus logic and power-on-reset sim How do we take the cycle-sim algorithm and implement in special purpose hardware : simulation acceleration, emulation è Where is there still a need for pure event-simulation? è Good research: only few papers in the field during the last 10 years l Good place to start is Peter Maurer at Florida State. Extremely interesting for folks who want to explore different algorithms

And. . . n This field is just one segment of verification tools development

And. . . n This field is just one segment of verification tools development which is just one segment of Electronic Design Automation