Fault Diagnosis Overview
David Lavo, UC Santa Cruz
January 13, 2005
© 2005 David Lavo

Outline
• Introduction: What is Fault Diagnosis?
• Components: What’s involved?
• Algorithm details: How does it work?
• Diagnosis in practice: How does it really work?
• Research: Why does (or doesn’t) it work? How should it work?

What is Fault Diagnosis?
• A guess as to what’s wrong with a malfunctioning circuit
• Narrows the search for physical root cause
• Makes inferences based on observed behavior
• Usually based on the logical operation of the circuit

VLSI Fault Diagnosis (in One Slide)
[Figure: block diagram. Labels: Defective Circuit, Tests, Observed Behavior, Diagnosis Algorithm, Diagnosis, Location or Fault, Physical Analysis]

Two Types of Diagnosis
• Circuit Partitioning (“Effect-Cause” Diagnosis)
  – Identify fault-free or possibly-faulty portions
  – Identify suspect components, logic blocks, interconnects
• Model-Based Diagnosis (“Cause-Effect” Diagnosis)
  – Assume one or more specific fault models
  – Compare behavior to fault simulations

Circuit Partitioning
• Separate known-good portions of circuit from likely areas of failure
• Simplest method: identify failing flip-flops
  – Tester can identify failing flops or outputs
  – Input cone of logic is suspect
  – Intersection of multiple cones is highly suspect
  – Single clock pulse with scan can be used for sequential/functional fails
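The cone-intersection idea can be sketched in a few lines of Python. This is a minimal illustration, not any tool's implementation: the `FANIN` netlist, the node names, and the helper functions are all hypothetical.

```python
from functools import reduce

# Hypothetical netlist: each node maps to the nodes that drive it.
FANIN = {
    "ff1": ["g1"],
    "ff2": ["g2"],
    "g1": ["a", "n1"],
    "g2": ["n1", "c"],
    "n1": ["b"],
}

def input_cone(node, fanin=FANIN):
    """Return every node that can influence `node` (the node itself included)."""
    cone, stack = set(), [node]
    while stack:
        current = stack.pop()
        if current not in cone:
            cone.add(current)
            stack.extend(fanin.get(current, []))
    return cone

def suspects(failing_flops, fanin=FANIN):
    """Intersect the input cones of all failing flip-flops."""
    cones = [input_cone(flop, fanin) for flop in failing_flops]
    return reduce(set.intersection, cones) if cones else set()

print(sorted(suspects(["ff1", "ff2"])))
```

With both flip-flops failing, only the logic shared by the two cones (here `n1` and its driver `b`) remains highly suspect.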

Back-Tracing Failures
[Figure not preserved]

Circuit Partitioning, aka Effect-Cause Diagnosis
• Reasoning based on observed behavior and expected (good-circuit) functions
• Commonly used at system and board levels
• Tries to separate good and suspect areas
• Advantage: Simple and general
• Disadvantage: Not very precise, often gives no indication of defect mechanism

Cause-Effect Diagnosis
• Start from possible causes (fault models), compare to observed effects
• A simulator is used to predict behavior of the circuit in the presence of various faults
• Match prediction(s) against observed behavior
• Advantage: Implicates a mechanism as well as a location
• Disadvantage: Can be fooled by unmodeled defects
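A minimal sketch of the matching step, assuming each fault's predicted behavior has already been reduced to a pass/fail bit string. The fault names and signatures below are made up for illustration.

```python
# Hypothetical pass/fail signatures, one bit per test vector (1 = fail).
# In practice these would come from a fault simulator.
PREDICTED = {
    "A stuck-at-0": "0110",
    "A stuck-at-1": "1001",
    "B stuck-at-1": "0100",
}

def matching_candidates(observed, predicted=PREDICTED):
    """Return the faults whose predicted signature equals the observed one."""
    return [fault for fault, signature in predicted.items() if signature == observed]

print(matching_candidates("1001"))  # behavior explained by a modeled fault
print(matching_candidates("1101"))  # behavior that no modeled fault predicts
```

The second call returns an empty list, which is the "fooled by unmodeled defects" case: exact matching has nothing to report when the defect does not behave like any simulated fault.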

Cause-Effect Diagnosis
[Figure: block diagram with example bit-string signatures. Labels: Defective Circuit, Tests, Behavior Signature, Fault Simulator, Candidate Signatures, Diagnosis Algorithm, Comparison & Conclusion]

Outline
• Introduction: What is Fault Diagnosis?
• Components: What’s involved?
• Algorithm details: How does it work?
• Diagnosis in practice: How does it really work?
• Research: Why does (or doesn’t) it work? How should it work?

Components of Fault Diagnosis
• Fault models
• Fault simulators
• Fault dictionaries
• Diagnosis algorithms

Fault Models
• A fault model is an abstraction of a type of defect behavior
• A fault instance is the application of a model to a circuit wire, node, gate, etc.
• Used to create and evaluate test sets
• For diagnosis, they can be used to simulate and predict faulty behaviors

Stuck-at Fault Model
• The most-used fault model (by far)
• Simple to simulate and enumerate
• Effective for testing, fault grading, and diagnosis of some defects
• Many defects are not well represented by the stuck-at model
[Figure: gate with inputs A and B, node A stuck-at 1; signals annotated with fault-free/faulty logic values, e.g. 0/1]
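To illustrate why stuck-at faults are simple to enumerate and simulate, here is a small sketch. The gate type is an assumption (the slide's figure does not survive in this transcript, so a 2-input AND is used), and the function names are hypothetical.

```python
def stuck_at_faults(nodes):
    """Enumerate the single stuck-at fault list: two instances per node."""
    return [(node, value) for node in nodes for value in (0, 1)]

def and_gate(a, b, fault=None):
    """2-input AND gate; `fault` optionally forces input 'A' or 'B' to a value."""
    if fault is not None:
        node, value = fault
        if node == "A":
            a = value
        elif node == "B":
            b = value
    return a & b

print(stuck_at_faults(["A", "B"]))
good = and_gate(0, 1)
faulty = and_gate(0, 1, fault=("A", 1))
print(f"{good}/{faulty}")  # fault-free/faulty value at the gate output
```

The fault list grows linearly with the number of nodes, which is what makes the model cheap to enumerate.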

Bridging Fault Model
• Shorts are a common defect type in CMOS
• Different bridging fault models have varying accuracy and precision, from simplistic to very sophisticated
• Difficult or impractical to enumerate
[Figure: nodes X and Y bridged; node X forces Y to a value of 0, so Y reads 1/0 (fault-free/faulty)]
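A rough sketch of why enumeration is the hard part. It assumes the simplest possible bridge model, in which one node always dominates the other, and counts every unordered node pair as a candidate. Both functions are hypothetical illustrations; realistic flows usually prune pairs using layout information.

```python
from math import comb

def two_line_bridge_count(num_nodes):
    """Number of unordered node pairs that could in principle be bridged."""
    return comb(num_nodes, 2)

def dominance_bridge(x, y):
    """Simplistic dominance model: node X forces node Y to X's value."""
    return x, x

print(dominance_bridge(0, 1))          # Y is pulled from 1 to 0
print(2 * 100_000)                     # stuck-at instances for 100,000 nodes
print(two_line_bridge_count(100_000))  # candidate 2-line bridges for the same circuit
```

For 100,000 nodes there are 200,000 stuck-at instances but roughly five billion node pairs, which is why candidate selection dominates bridging-fault diagnosis.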

Some Diagnostic Fault Models
[Figure: four illustrated models, labeled Gate Fault, Net Fault, Bridging Fault, and Path Fault]

Fault Simulators
• A fault simulator can simulate instances of a particular fault model
• Inputs:
  – Circuit (netlist)
  – Test set
  – Faultlist (list of fault instances)
• Output: circuit response
• Usually, simulates the presence of a single fault instance (“single-fault assumption”)
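The inputs and output listed above can be made concrete with a toy single-stuck-at simulator. Everything here is an assumption for illustration: the two-gate `NETLIST`, the exhaustive test set, and the choice to report each response as a pass/fail string relative to the good circuit.

```python
from itertools import product

# Hypothetical netlist in topological order: (output, gate type, inputs).
NETLIST = [
    ("n1", "AND", ("a", "b")),
    ("out", "OR", ("n1", "c")),
]
GATES = {"AND": lambda x, y: x & y, "OR": lambda x, y: x | y}

def simulate(vector, fault=None, netlist=NETLIST):
    """Evaluate the circuit for one test vector with at most one stuck-at fault.

    `vector` maps primary inputs to 0/1; `fault` is (node, stuck_value) or None.
    """
    values = dict(vector)

    def force(node):
        # Single-fault assumption: only the one faulty node is overridden.
        if fault is not None and fault[0] == node:
            values[node] = fault[1]

    for node in vector:
        force(node)
    for out, kind, ins in netlist:
        values[out] = GATES[kind](*(values[i] for i in ins))
        force(out)
    return values["out"]

def fault_simulate(tests, faultlist):
    """Per fault, a string with '1' wherever the faulty response differs from the good one."""
    good = [simulate(t) for t in tests]
    return {
        fault: "".join(str(int(simulate(t, fault) != g)) for t, g in zip(tests, good))
        for fault in faultlist
    }

TESTS = [dict(zip("abc", bits)) for bits in product((0, 1), repeat=3)]
FAULTS = [("n1", 0), ("n1", 1), ("c", 1)]
print(fault_simulate(TESTS, FAULTS))
```

In this toy circuit `n1` stuck-at 1 and `c` stuck-at 1 produce identical responses, since either one forces the OR gate's output high.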

Fault Dictionaries
• A fault dictionary is a database of the simulated responses for all faults in faultlist
• Used by some diagnosis algorithms for convenience:
  – Fast: no simulation at time of diagnosis
  – Self-contained: netlist, simulator, and test set not needed after dictionary creation
• Can be very large, however!

The Full-Response Dictionary
• For each fault (f), store the response to each test vector (v)
• One bit per vector, pass (0) or fail (1)
• For each vector, store the expected output response (o)
• Total storage requirement: f × v × o bits

The Pass-Fail Dictionary
• For each fault, store only the test vector responses
• One bit per vector, pass (0) or fail (1)
• Total storage requirement: f × v bits
• Much smaller than full-response, and often practical for even very large circuits
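A quick worked comparison of the two storage formulas. The circuit sizes below are invented for illustration and are not taken from the presentation.

```python
def full_response_bits(f, v, o):
    """Full-response dictionary: one bit per fault, per vector, per output."""
    return f * v * o

def pass_fail_bits(f, v):
    """Pass-fail dictionary: one bit per fault, per vector."""
    return f * v

# Hypothetical circuit: 100,000 faults, 1,000 vectors, 500 observed outputs.
f, v, o = 100_000, 1_000, 500
print(full_response_bits(f, v, o) / 8 / 1e9, "GB")
print(pass_fail_bits(f, v) / 8 / 1e6, "MB")
```

Under these assumptions the full-response dictionary needs 6.25 GB while the pass-fail dictionary needs 12.5 MB, a reduction by exactly the factor o.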

Dynamic Diagnosis
• Alternative to dictionary-based diagnosis
• Fault simulation is only done for certain faults, based on test results
  – Only simulate faults in input cones of failing flip-flops/outputs
• Dictionary is eliminated, but requires complete netlist and test pattern file
• Used by most commercial ATPG tools: Mentor Fastscan, Synopsys, Cadence, etc.

Outline
• Introduction: What is Fault Diagnosis?
• Components: What’s involved?
• Algorithm details: How does it work?
• Diagnosis in practice: How does it really work?
• Research: Why does (or doesn’t) it work? How should it work?

Algorithm Details
• Role of a diagnosis algorithm
• Scoring methods
• Types of diagnosis algorithms

Diagnosis Algorithms
• Algorithms compare observed behavior to predicted behaviors
• An algorithm attempts to “explain” the observed failures with fault candidates
• The job of a diagnosis algorithm is to report the best fault candidate(s)
• “Best” is determined by scoring method

Fault Candidate Scoring
• Two common scoring methods
  – Match/mismatch points
  – Fault candidate probability
• Other common scorings:
  – Hamming distance
  – Set intersection/overlap
  – Nearest neighbor
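As one example of these scorings, Hamming distance ranks candidates by how many test vectors their predicted pass/fail signature disagrees with. The signatures below are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length signatures differ."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    return sum(x != y for x, y in zip(a, b))

def rank_by_hamming(observed, predicted):
    """Order candidate faults from closest to farthest predicted signature."""
    return sorted(predicted, key=lambda fault: hamming(observed, predicted[fault]))

# Hypothetical pass/fail signatures (1 = fail), one bit per test vector.
PREDICTED = {
    "A stuck-at-0": "0110",
    "A stuck-at-1": "1001",
    "B stuck-at-1": "0100",
}
print(rank_by_hamming("1101", PREDICTED))
```

Unlike exact matching, this still produces an ordering when no candidate predicts the observed behavior perfectly.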

Match/mismatch Point Scoring
• Award points for matching observed failures
• Optionally deduct points for not predicting fails
• Nonprediction: A behavior not predicted by candidate
• Misprediction: A prediction not fulfilled by behavior
• Commercial tools (e.g. Fastscan) are usually biased to lowest nonprediction
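The definitions of nonprediction and misprediction map directly onto set differences. The sketch below is an illustration, not any commercial tool's scoring: candidates are ranked by fewest nonpredictions first, with mispredictions as a tie-breaker, and the fault names and vector indices are hypothetical.

```python
def score(observed, predicted):
    """Return (matches, nonpredictions, mispredictions) for one candidate.

    Both arguments are sets of failing test-vector indices.
    """
    matches = len(observed & predicted)
    nonpredictions = len(observed - predicted)  # observed fails the candidate did not predict
    mispredictions = len(predicted - observed)  # predicted fails that were not observed
    return matches, nonpredictions, mispredictions

def rank(observed, candidates):
    """Order candidates by fewest nonpredictions, then fewest mispredictions."""
    def key(name):
        _, non, mis = score(observed, candidates[name])
        return (non, mis)
    return sorted(candidates, key=key)

OBSERVED = {1, 4, 7}
CANDIDATES = {
    "n5 stuck-at-0": {1, 4, 7, 9},
    "n8 stuck-at-1": {1, 4},
    "n2 stuck-at-0": {1, 4, 7},
}
print(rank(OBSERVED, CANDIDATES))
```

Sorting on nonpredictions first reflects the bias described above: a candidate that leaves an observed failure unexplained ranks below one that merely predicts an extra failure.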

Probabilistic Scoring
• Probability score based on matches and mismatches and error assumptions
  – Weights for non- and mis-prediction
  – Different prediction probabilities for different fault candidates (bridges vs. stuck-at)
• Usually normalized so that total of all candidates equals 1.0
• UCSC method uses probabilities to compare stuck-at candidates to bridges in same diagnosis
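One way to picture such a score is a per-vector likelihood with separate nonprediction and misprediction probabilities for each candidate, normalized over all candidates. This is a generic naive sketch, not the UCSC method; the probabilities, signatures, and candidate names are assumptions chosen for illustration.

```python
def likelihood(observed, predicted, p_non, p_mis):
    """Probability of the observed signature given one candidate.

    p_non: chance of an observed fail where the candidate predicts a pass.
    p_mis: chance of an observed pass where the candidate predicts a fail.
    Vectors are treated as independent, which is a simplifying assumption.
    """
    result = 1.0
    for obs, pred in zip(observed, predicted):
        if pred == "1":
            result *= (1 - p_mis) if obs == "1" else p_mis
        else:
            result *= p_non if obs == "1" else (1 - p_non)
    return result

def normalized_scores(observed, candidates):
    """candidates maps name -> (predicted signature, p_non, p_mis); scores sum to 1.0."""
    raw = {name: likelihood(observed, *spec) for name, spec in candidates.items()}
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}

# Hypothetical candidates: bridge predictions are trusted less (higher p_mis).
CANDIDATES = {
    "A stuck-at-1": ("1001", 0.05, 0.05),
    "X-Y bridge": ("1101", 0.05, 0.30),
}
scores = normalized_scores("1101", CANDIDATES)
print(scores)
```

Because every candidate ends up on the same probability scale, a bridge and a stuck-at fault can be compared within a single ranked list.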

Types of Diagnosis Algorithms
• Stuck-at
  – Most common, best supported by tools
  – Surprisingly effective (~60% exact matches)
  – Very fast
• IDDQ
  – Orthogonal set of failing data
  – Requires interpretation of tester results
  – Not well supported by tools

IDDQ Threshold Setting
[Figure not preserved]

Types of Diagnosis Algorithms (cont)
• Bridging-fault
  – May better represent common CMOS faults
  – More complicated fault model
  – Biggest problem: candidate selection
• Other possible (future) directions:
  – Functional fails
  – Delay fails
  – Parametric failures

Outline
• Introduction: What is Fault Diagnosis?
• Components: What’s involved?
• Algorithm details: How does it work?
• Diagnosis in practice: How does it really work?
• Research: Why does (or doesn’t) it work? How should it work?

Diagnosis in Practice
• Using a diagnosis
• Translating the results: circuit navigation
• Evaluating diagnosis quality
• Commercial diagnosis tools

Using a Diagnosis
• Fault diagnosis is used to aid physical inspection and root-cause identification
• Diagnosis output is logical, not physical:
  – Abstract faults (such as stuck-at)
  – Gates, ports (nodes), and nets
  – No information about location or size
• Translation to physical location requires navigation of circuit

Types of Circuit Navigation
• Netlist
  – Examine RTL (Verilog/VHDL, etc.) for gates and data paths
• Schematic
  – Symbolic view of gates and wires
• Layout/artwork
  – Graphical view of metal lines, poly, vias, cell boundaries, etc.

Circuit Netlist

    module TOP (CLK, Reset, StartOut, SiReady, Rst_CntN, Up_DnN, Wr, SDin,
                Wr_RAM, Wr_Rreg, RAM_Addr, ATG_TESTMODE, BIST_TESTMODE, SDout,
                TwoOnes, One, NoOnes, TwoZeros, OneZero, NoZeros);
      input CLK;
      inout Reset, StartOut, SiReady, Rst_CntN, Up_DnN, Wr, SDin, Wr_RAM;
      inout [2:0] RAM_Addr;
      inout ATG_TESTMODE;
      inout BIST_TESTMODE;
      inout SDout, OneZero, NoZeros;
      inout TwoOnes, One, NoOnes, TwoZeros, Wr_Rreg;

      // Tie off cells
      TLOW tielow1 (.Q(tielow));
      THIGH tiehigh1 (.Q(tiehigh));

      // Inverted CLK
      wire CLK_N;
      INVFF clkinv (.Q(CLK_N), .A(CLK));

      // PADS
      PADNMIOSCM0H08N05B50 PAD001_StartOut (.PUEN(tiehigh), .PDE(tielow),
        .IEN(tielow), .I(StartOut_I), .SIGNAME(StartOut),
        .INMODE(in_mode_avail), .TESTI(jumper001), .TESTIEN(tiehigh),
        .SCANIN(jumper001), .OUTMODE(out_mode_avail), .TESTO(tiehigh),
        .TESTOEN(tiehigh), .O(tielow), .OEN(tiehigh));

Netlist Navigation
• Either use text editor on netlist, or use browser function in simulator
• Browsers allow you to trace forward and backward and see logic values
• Can be used to view hierarchy and functional blocks
• Can be tedious

Circuit Schematic
[Figure not preserved]

Schematic Navigation
• Either hand-drawn (from netlist navigation) or tool-generated gate symbols and wires
• Schematic tools in simulators also allow forward and backward traversal and display of logic values
• Used to verify fault propagation
• Does not reflect physical distances

Circuit Artwork
[Figure not preserved]

Layout (Artwork) Navigation
• Use routing/floorplanning tools to view artwork
• Can usually input cell or wire name and tool will highlight the object
• Useful for determining (x, y) values
• Also good for evaluating physical implications of a set of fault candidates
  – Faults clustered in a small area are good
  – Faults/nets spread around large die areas are bad

Fault Proximity
[Figure: two layout views]
• Net runs across die: physical examination is almost impossible
• Faults contained in small area: physical examination is possible

Evaluating a Diagnosis
• A diagnosis without one or a few strong (high-scoring) candidates is usually poor
• Can indicate:
  – Multiple defects
  – Unmodeled (complex) behavior
  – Inappropriate algorithm
• If the diagnosis is poor, either try another algorithm or look for more data (failures)

Evaluating a Diagnosis (cont)
• Many diagnoses (~60%) implicate a single stuck-at fault
• Usually a good sign, but you must consider equivalent faults
• Many defects can mimic a stuck-at fault, without being a short to Vdd or Gnd
• Consider nearby nodes also, if practical
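One practical way to account for equivalent faults is to group candidates whose predicted signatures are identical, since the test set cannot tell them apart. This is a hypothetical sketch: identical signatures show only that faults are indistinguishable under the given tests, which is a weaker condition than structural equivalence, and the fault names are invented.

```python
from collections import defaultdict

def indistinguishable_groups(signatures):
    """Group faults that share the same predicted pass/fail signature."""
    groups = defaultdict(list)
    for fault, signature in signatures.items():
        groups[signature].append(fault)
    return [sorted(group) for group in groups.values()]

# Hypothetical signatures (1 = fail), one bit per test vector.
SIGNATURES = {
    "n1 stuck-at-1": "10101000",
    "c stuck-at-1": "10101000",
    "n1 stuck-at-0": "00000010",
}
print(indistinguishable_groups(SIGNATURES))
```

When the top candidate belongs to a group with more than one member, every node in that group is an equally plausible site for physical inspection.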

Dominance Bridging Fault
[Figure: a FIB short between the outputs of a strong inverter and a weak inverter. Caption: Top candidate is stuck-at fault on this node.]

Candidate #2 is Best
[Figure: layout view. Labels: Candidate #1, Candidate #2, Candidate #3, FIB short]

Commercial Tool: Mentor Graphics
• ATPG tool: Fastscan
• Stuck-at diagnosis only
• No IDDQ capability
• Orders candidates by number of matched failures (biased to lowest non-prediction)
• Also has netlist & schematic browser
• Based on Waicukauski & Lindbloom (D&T ’89)

Commercial Tool: Synopsys
• ATPG tool: TetraMAX
• J. Waicukauski moved to Synopsys after writing Fastscan
• Diagnosis capability unknown: assumed to be similar to Fastscan

Commercial Tool: Cadence
• ATPG tool: Encounter Test
• Test and diagnosis tools purchased from IBM
• IBM has had good diagnosis research, but Encounter’s capabilities are unknown
• Also of interest: Silicon Ensemble, a routing tool
  – Graphical artwork viewer
  – Good for highlighting nets and cells based on diagnosis results
  – Good for determining (x, y) and producing screen shots

Outline
• Introduction: What is Fault Diagnosis?
• Components: What’s involved?
• Algorithm details: How does it work?
• Diagnosis in practice: How does it really work?
• Research: Why does (or doesn’t) it work? How should it work?

Prior Art
• Waicukauski & Lindbloom, IEEE Design & Test, Aug. ’89
  – Most widely-used algorithm for commercial tools
  – Finds candidates to match individual tests, attempts to “explain” all failing tests
• Abramovici & Breuer, IEEE Trans. Computers, June ’80
  – Effect-cause diagnosis
  – Permanent stuck-at fault assumption
• Aitken & Maxwell, HP Journal, Feb. ’95
  – Analysis of relative importance of models vs. algorithms
• Lavo, Larrabee, et al., Proceedings of ITC ’98
  – Probabilistic scoring
  – Mixed-model diagnosis
• Bartenstein et al., Proceedings of ITC ’01
  – SLAT: Single Location At-a-Time diagnosis
  – Focus on matching per-vector results

Prior Art (cont)
• Jee & Ferguson, Proceedings of ISTFA ’93
  – Carafe
  – Inductive Fault Analysis (IFA)
  – Examine circuit to determine likely failure locations
• Aitken, Proceedings of ITC ’95
  – Using FIBs to insert defects
  – Calibrate/evaluate diagnosis methods
• Henderson & Soden, Proceedings of ITC ’97
  – Probabilistic physical failure analysis
• Nigh, Vallett, et al., Proceedings of ITC ’98
  – Large-scale, multi-company SEMATECH experiment
  – Failure analysis of timing and IDDQ fails

Research Directions
• Complex defect behaviors
  – Beyond stuck-at and 2-line bridges
  – Intermittent faults
  – Delay and timing-related defects
  – Parametric & process-related defects
  – Multiple simultaneous defects
  – Is there a simple, inductive way to infer complex defects?

Research Directions (cont)
• Diagnosability
  – What makes a particular circuit easy or hard to diagnose?
  – What can we do to make diagnosis easier?
• Evaluation of diagnoses
  – What makes a good diagnosis?
  – Can we quantify our confidence in a diagnosis?

Research Directions (cont)
• Integration with physical FA & yield improvement
  – Can we incorporate process information?
  – Can we produce a “physical diagnosis”?
  – On-line (or even on-chip) diagnosis
• Commercial toolflow integration
  – Can diagnosis tools use industry-standard data formats?
  – Can commercial tools be scripted or programmed to do better diagnosis?