Automatic Verification of Floating Point Units Udo Krautz

Automatic Verification of Floating Point Units
Udo Krautz, Viresh Paruthi, Anand Arunagiri, Sujeet Kumar
IBM Corporation

Authors
1. Udo Krautz, IBM Deutschland, Boeblingen, Germany, krautz@de.ibm.com, +49-7031-16-2347
2. Viresh Paruthi, IBM Corporation, Austin, TX, USA, vparuthi@us.ibm.com, +1-512-286-7922
3. Anand B Arunagiri, IBM Corporation, Bangalore, India, aarunagi@in.ibm.com, +91-80-41777187
4. Sujeet Kumar, IBM Corporation, Bangalore, India, sujkumak@in.ibm.com, +91-80-41777283

Abstract
Verification of floating point units (FPUs) is one of the most successful applications of formal verification methods. The large and complex data paths and intricate control structures of FPUs make verification with coverage-driven simulation incomplete and error-prone. Formal verification (FV) has been successfully leveraged to achieve the high level of quality desired of these critical logic blocks. Typically, FV-based approaches to verifying FPUs rely on introducing higher-level abstractions to enable reasoning. This, however, has to be done manually and quickly becomes tedious for the highly optimized bit-level implementations found in high-performance microprocessors. Automated formal methods that work directly on the bit level and provide a full end-to-end check for FPUs exist, but they are limited to single instructions (issued into an empty pipeline); hence they do not check control aspects of the logic such as inter-instruction interactions or pipeline control. In this talk we present an approach based on equivalence checking that overcomes the single-instruction limitation for automated bit-level proofs in the formal verification of FPUs. The sequential execution of instructions is modeled by two instances of the design under test, one of which acts as a reference model for the other. This allows a large number of internal equivalences to be leveraged by equivalence checking techniques. We show that this method is capable of proving instruction sequences for highly optimized industrial FPU designs. Together with a proof of correctness of individual instructions by model checking, it guarantees correctness of the FPU design as a whole. In our experience, no other approach provides the level of automation and ease of the proposed method.

Motivation
• Floating-point units (FPUs) are inherently difficult to verify:
  • Data path challenges
    – Complex floating-point algorithms and hardware, e.g. alignment shifter, leading zero anticipator (LZA), rounding, …
    – Intricate corner cases, e.g. denormal inputs/outputs, cancellation, sticky bits, …
  • Control complexity
    – Pipelined, out-of-order, speculative execution, microcode ops, …
• Various verification techniques are deployed to verify FPUs
  • Incomplete methods to find bugs
    – Random/manual/targeted testcase generation, coverage analysis, …
    – Bugs may slip into silicon (e.g. the Pentium FP bug!)
  • Complete (formal) methods to establish correctness
    – Model checking (automatic) techniques: restricted to a single instruction issued into an empty pipeline (data path verification)
    – Higher-level reasoning: manual, requiring the creation of dedicated models (end-to-end verification)

Contribution
• We propose to enhance automated methods to enable verification of control aspects in addition to the data path
• Automated end-to-end verification of bit-level FPUs
  • Inclusive of control and data path
    – Data path verified with model checking (existing state of the art)
      • Submit a single instruction into an empty pipeline
      • Check the “numerical correctness” of the different ops
    – Control-related aspects verified with sequential equivalence checking
      • The design serves as its own reference
      • Instruction sequences are submitted to exercise inter-instruction interactions
      • Internal equivalence points can be leveraged to alleviate capacity issues
• Results bear out the effectiveness of the approach

Data path Verification
• Checks numerical correctness of the FPU data path
  • IEEE 754 standard
  • Implementation constraints (timing, area, power, performance)
  • Fused multiply-add (FMA) instruction: A*B + C
• Example bugs:
  – If two nearly equal numbers are subtracted (causing cancellation), the wrong exponent is returned
  – If the result is near underflow, the wrong guard bit is chosen
• Restricted to a single instruction issued into an empty FPU
  • Influence of other instructions not considered
  • Provides complete data path coverage; remaining verification resources may focus on other aspects (e.g. inter-instruction interactions)
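To make the FMA requirement concrete, here is a minimal Python sketch of what a reference model for A*B + C has to compute; it is an illustration, not the authors' reference model. It assumes finite binary64 inputs and round-to-nearest-even, and ignores signed zeros, NaN/infinity and exception flags. Exact rational arithmetic with `fractions.Fraction` followed by one conversion back to `float` yields a single rounding, whereas the naive `a * b + c` rounds twice. The operands are hypothetical, chosen so that the two results differ.

```python
from fractions import Fraction


def fma_reference(a: float, b: float, c: float) -> float:
    """Reference A*B + C with a single rounding (finite inputs only).

    Fraction arithmetic is exact; the final float() conversion is the only
    rounding step. Signed zeros, NaN/infinity and exception flags are ignored.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


def unfused(a: float, b: float, c: float) -> float:
    """What a non-fused data path computes: round after *, round again after +."""
    return a * b + c


# Hypothetical operands: the exact product is 1 - 2**-60, which rounds to 1.0.
a, b, c = 1.0 + 2.0**-30, 1.0 - 2.0**-30, -1.0
print(fma_reference(a, b, c) == -2.0**-60)   # True: exact result survives
print(unfused(a, b, c))                      # 0.0: product was rounded first
```

A data path that rounded the product before the addition would return 0.0 for these operands and be flagged by a comparison against such a reference.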

Datapath Verification Testbench
• A “driver” issues an instruction into the real and the reference FPU
• A “checker” compares the results of the two FPUs for equality
[Figure: the operands feed both the reference model and the real FPU; their outputs are compared (=)]
• FP operations may be bounded by the longest-latency operation
• The verification problem is thus a bounded model check
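The driver/checker arrangement can be sketched as a toy harness. This is illustrative only, not the authors' testbench: `ToyPipelinedAdder`, `WIDTH` and `LATENCY` are hypothetical, and because the toy is just 4 bits wide, exhaustive enumeration of operands stands in for the SAT-based bounded model check a real FPU would require. The driver issues one operation into an empty pipeline, the harness drains it for the longest latency, and the checker compares the output with a combinational reference.

```python
from itertools import product

WIDTH = 4
MASK = (1 << WIDTH) - 1
LATENCY = 3  # hypothetical longest-latency operation, in cycles


class ToyPipelinedAdder:
    """Hypothetical stand-in for the 'real FPU': a LATENCY-stage adder."""

    def __init__(self, buggy=False):
        self.stages = [None] * LATENCY  # None marks an empty stage
        self.buggy = buggy

    def step(self, op=None):
        """Advance one clock; op is (a, b) or None. Returns the value leaving the pipe."""
        out = self.stages[-1]
        computed = None
        if op is not None:
            a, b = op
            s = a + b
            if self.buggy and a == MASK and b == 1:
                s = a  # injected corner-case bug: carry-out case mishandled
            computed = s & MASK
        self.stages = [computed] + self.stages[:-1]
        return out


def reference(a, b):
    """Combinational reference model."""
    return (a + b) & MASK


def check_single_instruction(dut_factory):
    """Issue one op into an empty pipeline, drain LATENCY cycles, compare."""
    for a, b in product(range(1 << WIDTH), repeat=2):
        dut = dut_factory()
        dut.step((a, b))                                # driver issues the op
        results = [dut.step() for _ in range(LATENCY)]  # drain the pipeline
        got = results[-1]
        if got != reference(a, b):                      # checker
            return (a, b, got)                          # counterexample
    return None


print(check_single_instruction(ToyPipelinedAdder))                      # None
print(check_single_instruction(lambda: ToyPipelinedAdder(buggy=True)))  # (15, 1, 15)
```

Bounding the run by `LATENCY` cycles is what keeps the check finite, mirroring the bounded model check described on the slide.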

Control Verification
• Verifies pipeline control and complex micro-architectural features
  • Speculative execution, functional clock gating, blocking, …
• Example bugs:
  – If a speculatively executed instruction stream should not be executed (e.g. due to a branch not taken), does a ‘kill’ generate any side effects?
  – Does the issue of overlapping instructions cause resource conflicts?
  – Does forwarding of data to a subsequent instruction yield a wrong result?
• Requires submission of a continuous stream of instructions
  • To activate inter-instruction interactions/dependencies
  • Irrespective of previously executed instructions or the initial state

Control Verification Testbench
• The design serves as its own “reference”
• A “driver” issues a single instruction into the “reference” FPU and an additional sequence of instructions into the real FPU
• A “checker” compares the results of the “followed” instruction in the two FPUs
[Figure: an instruction sequence drives the (real) FPU, a single instruction drives the (reference) FPU, and the outputs are compared (=)]
• The verification problem is a sequential equivalence check
• Internal equivalences can be effectively leveraged
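A minimal sketch of the “design as its own reference” idea, under stated assumptions: `ToyFPU`, its opcodes and its shared `scratch` register are hypothetical, and simulating every short back-to-back (B2B) irritator sequence replaces the formal sequential equivalence check. Two instances of the same toy design are run: the reference instance sees only the followed instruction (issued in the same cycle, with bubbles before it), the real instance sees the irritators first, and only the followed instruction's result is compared.

```python
from itertools import product

LATENCY = 2                      # hypothetical pipeline depth
OPS = ("add", "neg", "nop")      # hypothetical opcodes


class ToyFPU:
    """Hypothetical 2-stage unit with a 'scratch' register shared by all ops."""

    def __init__(self, buggy=False):
        self.pipe = [None] * LATENCY   # each entry: (tag, op, x) or None
        self.scratch = 0               # residual state that survives between ops
        self.buggy = buggy

    def step(self, instr=None):
        """Advance one cycle; return (tag, result) for a completing op, else None."""
        done, out = self.pipe[-1], None
        if done is not None:
            tag, op, x = done
            if op == "add":
                res = x + 1
                if self.buggy:
                    res += self.scratch    # BUG: consumes stale residual state
            elif op == "neg":
                res = -x
                self.scratch = res         # neg leaves its result in scratch
            else:
                res = 0
            out = (tag, res)
        self.pipe = [instr] + self.pipe[:-1]
        return out


def run(dut, program):
    """Feed instructions (None = bubble), drain the pipe, collect results by tag."""
    results = {}
    for instr in list(program) + [None] * LATENCY:
        out = dut.step(instr)
        if out is not None:
            results[out[0]] = out[1]
    return results


def check_sequences(make_dut, max_len=2, values=(0, 3)):
    """Compare the followed op after every B2B irritator sequence up to max_len."""
    followed = ("F", "add", 5)
    for n in range(max_len + 1):
        for ops in product(product(OPS, values), repeat=n):
            irritators = [(f"I{i}", op, x) for i, (op, x) in enumerate(ops)]
            real = run(make_dut(), irritators + [followed])
            ref = run(make_dut(), [None] * n + [followed])   # same issue cycle
            if real["F"] != ref["F"]:
                return irritators, real["F"], ref["F"]       # counterexample
    return None


print(check_sequences(ToyFPU))                       # None
print(check_sequences(lambda: ToyFPU(buggy=True)))   # ([('I0', 'neg', 3)], 3, 6)
```

The buggy variant lets `add` read residual state left behind by an earlier `neg`, a stand-in for the kind of inter-instruction bug described above; a single-instruction check on an empty pipeline would never expose it.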

Conditional Equivalence
• A single instruction of the sequence is executed in both FPUs
• Restricted to conditional equivalence (not general sequential equivalence checking, SEC)
• Pipeline stages in which the “followed” instruction is active should be equivalent in a specific cycle
[Figure legend: other instruction; inactive stage; active pipeline stage of the followed instruction; followed instruction; equivalence (=)]
• Final check only on the result of the “followed” instruction
• Bounded checking allows the pipeline to be unfolded; only equivalent pipeline stages should be in the result property’s cone of influence (COI)
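The per-stage checks can be illustrated with a small simulation sketch; it is a toy under explicit assumptions, not the checker generation used in the tool. `ToyPipe`, `STAGES` and the residual `hold` register are hypothetical. In every cycle, each stage that holds the followed instruction in both instances is compared, which is the conditional equivalence described above; a mismatch in an early stage localizes the problem before it reaches the result.

```python
STAGES = 3   # hypothetical pipeline depth


class ToyPipe:
    """Hypothetical 3-stage pipe; regs[s] holds (tag, value) or None."""

    def __init__(self, buggy=False):
        self.regs = [None] * STAGES
        self.hold = 0              # residual value written by stage 1
        self.buggy = buggy

    def step(self, instr=None):
        """Advance one cycle; instr is (tag, value) or None."""
        new = [instr] + [None] * (STAGES - 1)
        for s in range(1, STAGES):
            prev = self.regs[s - 1]
            if prev is None:
                continue
            tag, v = prev
            if s == 1:
                v = 2 * v + (self.hold if self.buggy else 0)  # BUG path uses hold
                self.hold = v
            else:
                v = v + 1
            new[s] = (tag, v)
        self.regs = new


def lighthouse_checks(make_dut, irritators, followed):
    """Return (cycle, stage) pairs where the followed op differs between DUTs.

    A stage is only checked in a cycle where it holds the followed op in
    both the real and the reference instance (conditional equivalence).
    """
    real, ref = make_dut(), make_dut()
    tail = [followed] + [None] * STAGES
    prog_real = list(irritators) + tail
    prog_ref = [None] * len(irritators) + tail
    failures = []
    for cycle, (a, b) in enumerate(zip(prog_real, prog_ref)):
        real.step(a)
        ref.step(b)
        for s in range(STAGES):
            ra, rb = real.regs[s], ref.regs[s]
            both_hold_followed = (
                ra is not None and rb is not None
                and ra[0] == rb[0] == followed[0]
            )
            if both_hold_followed and ra[1] != rb[1]:
                failures.append((cycle, s))
    return failures


print(lighthouse_checks(ToyPipe, [("I0", 3)], ("F", 5)))                      # []
print(lighthouse_checks(lambda: ToyPipe(buggy=True), [("I0", 3)], ("F", 5)))  # [(2, 1), (3, 2)]
```

With the bug enabled, the first failing check is stage 1 in cycle 2, one cycle before the divergence reaches the last stage; in the formal setting, such intermediate obligations are what allow internal equivalences to be leveraged.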

Sequential Equivalence Tenets
• Several degrees of equivalence/correctness:
  • Identical result of the “followed” instruction regardless of the initial state
    ‒ Possible with model checking if the legal initial states are known
    ‒ Manual computation of initial states is tedious for complex pipelines
  • “Followed” instruction not influenced by “residual states”
    ‒ Both FPUs should be equivalent for the “followed” instruction irrespective of a previously executed instruction
  • All timing windows between instructions need to be considered
    ‒ Requires an infinite sequence of instructions
    ‒ The infinite sequence is made finite to allow bounded checking
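One way to make the infinite sequence finite, sketched under an explicit assumption: if every instruction drains within `max_latency` cycles, then only instructions issued in the `max_latency` cycles before the followed instruction can overlap it in the pipeline, so enumerating all issue patterns of that window covers every timing window. State left behind by instructions that have already drained is not covered by this argument; that is the separate “residual states” tenet. `timing_windows` and its parameters are hypothetical names.

```python
from itertools import product


def timing_windows(max_latency, irritators=("irr",)):
    """Yield every issue pattern of the max_latency cycles before the followed op.

    Each slot is a bubble (None) or one irritator opcode. ASSUMPTION: any
    instruction issued earlier than this window has fully drained, so it can
    no longer overlap the followed op inside the pipeline.
    """
    slot_choices = (None,) + tuple(irritators)
    for window in product(slot_choices, repeat=max_latency):
        yield list(window) + ["followed"]


windows = list(timing_windows(3))
print(len(windows))                                        # 8
print(windows[1])                                          # [None, None, 'irr', 'followed']
print(sum(1 for _ in timing_windows(3, ("add", "div"))))   # 27
```

The number of patterns grows as (number of irritator opcodes + 1) raised to `max_latency`, which suggests why B2B-only models are cheaper to check than unrestricted sequences.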

Verification Technology
• SAT-based bounded model check
  • Performs a satisfiability check on a k-step unfolded netlist
  • Hybrid SAT engine
    – Integrates structural netlist transformations, BDDs, simulation, CNF clauses and a SAT procedure in one framework
• Conditional equivalence checking
  • Automatic checkers for pipeline stages getting activated
    ‒ Added for every stage; each is either proven or disproven
  • Leveraged as “lighthouses” to enable the end-to-end SAT check
• Encapsulated as engines in IBM’s semi-formal tool SixthSense
  • Uses a Transformation-Based Verification (TBV) paradigm that maximally exploits synergy between algorithms
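The bounded model check can be illustrated with a brute-force sketch. It unrolls a transition function for up to k steps and searches for an input trace that reaches a bad state. A real SAT-based BMC instead encodes the k unfolded frames as a formula and hands it to a SAT solver; nothing here models the hybrid engine or SixthSense. The counter netlist, `bmc` and `counter_next` are hypothetical.

```python
from itertools import product


def bmc(init, transition, bad, input_values, k):
    """Search for an input trace of length <= k that reaches a bad state.

    Brute-force stand-in for a SAT solver: it enumerates assignments to the
    unrolled inputs i_0..i_{n-1} instead of encoding the frames as CNF.
    Returns the shortest counterexample trace, or None if none exists up to k.
    """
    for n in range(k + 1):
        for inputs in product(input_values, repeat=n):
            state = init
            for i in inputs:          # unfold the transition relation n times
                state = transition(state, i)
            if bad(state):
                return list(inputs)
    return None


def counter_next(state, enable):
    """Hypothetical netlist: 2-bit counter that increments when enable=1."""
    return (state + enable) & 0b11


print(bmc(0, counter_next, lambda s: s == 3, (0, 1), k=2))  # None
print(bmc(0, counter_next, lambda s: s == 3, (0, 1), k=3))  # [1, 1, 1]
```

Because every length up to k is tried in increasing order, the first trace found is also a shortest one, which is the kind of counterexample a BMC run typically reports.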

Verification Results – Setup
• Single-instruction checks
  • FPU vs. high-level reference model
  • 45 instructions require case splits
  • 24 instructions covered by semi-formal verification
  • 410 instructions fully covered
  • Model: 10k variables / 100k latches / 3352k ANDs
• Instruction sequence checks
  • FPU (sequence) vs. FPU (with single followed op)
  • Different types of instruction:
    – Pipelined
    – Fixed-latency multicycle
    – Variable-latency multicycle
  • 9 scenarios of sequence types defined
  • Two models:
    – B2B (back-to-back) issue only
    – Infinite sequences
  • Model: 7.6k variables / 254k latches / 1398k ANDs

Results – Single Instruction (runtime, memory)
• 64-bit Binary-FP ADD, overlap case (369): 3 min 50 s, 1.5 GB
• 64-bit Binary-FP ADD, cancellation case (168): 7 min 51 s, 1.5 GB
• 128-bit Decimal-FP ADD, overlap case (26388): 4 min 28 s, 1.5 GB
• 128-bit Decimal-FP shift, single test: 17 min 04 s, 1.5 GB
• 128-bit Hex-FP convert to 64-bit integer, single test: 18 min 15 s, 1.3 GB
• 64-bit Binary-FP divide, semi-formal only: >24 h
Running on Linux™ 2.6 64-bit, Xeon™ E5-2680 2.7 GHz

Results – Sequences (followed instruction; irritator instruction: runtime, memory)
• Followed: Pipelined (extract exponent)
  – Irritator pipelined (convert decimal integer to decimal FP): 1 min 07 s, 1 GB
  – Irritator fixed latency (128-bit decimal FP add): 1 min 14 s, 0.94 GB
  – Irritator variable latency (convert binary FP to decimal FP): 21 min 17 s, 1.1 GB
• Followed: …
  – Irritator pipelined (convert decimal integer to decimal FP): 1 min 52 s, 1.1 GB
  – Irritator fixed latency (128-bit decimal FP add): 1 min 22 s, 1 GB
  – Irritator variable latency (convert binary FP to decimal FP): 1 h 13 min 22 s, 3.6 GB
• Followed: …
  – Irritator pipelined (convert decimal integer to decimal FP): 13 min 29 s, 1.3 GB
  – Irritator fixed latency (128-bit decimal FP add): 24 min 37 s, 1.8 GB
  – Irritator variable latency (convert binary FP to decimal FP): 6 h 06 min 17 s, 7 GB
• Also listed as a followed instruction: Fixed latency (compare decimal FP)

Conclusions and Future Work
• Presented an end-to-end automated approach to verify FPUs
  • Inclusive of dataflow and control
  • Dataflow verified instruction by instruction against a reference
  • Control verified via a sequential equivalence check
• Future work
  – Extend B2B sequences to random sequences to cover all possible sequences
    • Random sequences with pipelined instructions are solvable
    • Random sequences with multicycle instructions were not solved within 24 h
  – Include forwarding of operands
    • Internal equivalences do not hold due to latency differences

Related Work
• Intel™ uses a combination of automatic methods and STE (symbolic trajectory evaluation)
  • Published in CAV 2009 and FMCAD 2012
  • Reported results attribute most defects to STE
  • Likely requires manual, implementation-specific effort
  • Full details for reproducibility not disclosed
• Most other works focus on data path verification
  • Focus on specific instructions and design artifacts
    – E.g. the FMA instruction together with the multiplier
  • Largely manual, as they rely on methods such as theorem proving
    – Tedious proofs that are implementation specific
  • If automatic, they use special-purpose data structures
    – E.g. Chen ’98 uses PHDDs rather than SAT/BDDs