CSE 246 Computer Arithmetic Algorithms and Hardware Design

CSE 246: Computer Arithmetic Algorithms and Hardware Design
Lecture 4: Adders
Instructor: Prof. Chung-Kuan Cheng

Topics:
• Adders
  - AND/OR gate vs. circuit
  - Logic design
  - Graph design (prefix adder)

Chapter 2: ADDERS
• Half Adders
  - A half adder adds two 1-bit binary numbers when there is no carry-in.
  - If the inputs are x[i] and y[i], the sum and carry-out are given by
      s[i] = x[i] ^ y[i]
      c[i+1] = x[i].y[i]
• We use the following notation throughout the slides:
  - . means logical AND
  - + means logical OR
  - ^ means logical XOR
  - ' means complementation

Full Adder
• The inputs are x[i], y[i] (operand bits) and c[i] (carry-in)
• The outputs are s[i] (result bit) and c[i+1] (carry-out)
• Inputs and outputs are related by
  - s[i] = x[i] ^ y[i] ^ c[i]
  - c[i+1] = x[i].y[i] + c[i].(x[i] + y[i])
           = x[i].y[i] + c[i].(x[i] ^ y[i])
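As a quick check of the full adder relations, the Python sketch below enumerates all eight input combinations and confirms that both carry expressions agree with ordinary integer addition. The function names are illustrative and not part of the lecture.

```python
from itertools import product


def full_adder(x, y, c):
    """Return (s, c_out) using c_out = x.y + c.(x + y)."""
    s = x ^ y ^ c
    c_out = (x & y) | (c & (x | y))
    return s, c_out


def full_adder_xor_carry(x, y, c):
    """Same adder, with the carry written as x.y + c.(x ^ y)."""
    s = x ^ y ^ c
    c_out = (x & y) | (c & (x ^ y))
    return s, c_out


# Exhaustive check against integer addition: x + y + c == 2*c_out + s.
for x, y, c in product((0, 1), repeat=3):
    s, c_out = full_adder(x, y, c)
    assert 2 * c_out + s == x + y + c
    assert (s, c_out) == full_adder_xor_carry(x, y, c)
```

The two carry forms agree because the only input on which x + y and x ^ y differ is x = y = 1, where the x.y term already forces the carry to 1.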

Full Adder
• If the carry-in bit is zero, the full adder becomes a half adder
• If the carry-in bit is one, then
  - s[i] = (x[i] ^ y[i])'
  - c[i+1] = x[i] + y[i]
• To add two n-bit numbers, we can chain n full adders to build a ripple carry adder

Ripple Carry Adder
[Figure: chain of n full adders with inputs x[0], y[0] through x[n-1], y[n-1], carry-in cin = c[0], internal carries c[1], c[2], ..., c[n-1], sum outputs s[0] through s[n-1], and carry-out cout]
• Overflow happens when the operands have the same sign and the result has a different sign.
• If we use 2's complement to represent negative numbers, overflow occurs when (cout ^ c[n-1]) is 1
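The chain of full adders and the overflow rule can be sketched in Python as follows. This is a bit-level model for illustration only; the names ripple_carry_add and to_signed are assumptions of the sketch, not names used in the lecture.

```python
def ripple_carry_add(x, y, n, cin=0):
    """Add two n-bit words with a chain of full adders.

    Returns (sum, cout, overflow), where overflow = cout ^ c[n-1] is the
    2's complement overflow flag.
    """
    c = cin
    s = 0
    c_into_msb = cin
    for i in range(n):
        if i == n - 1:
            c_into_msb = c              # c[n-1]: carry into the sign bit
        xi, yi = (x >> i) & 1, (y >> i) & 1
        s |= (xi ^ yi ^ c) << i
        c = (xi & yi) | (c & (xi | yi))
    return s, c, c ^ c_into_msb


def to_signed(v, n):
    """Interpret an n-bit pattern as a 2's complement integer."""
    return v - (1 << n) if (v >> (n - 1)) & 1 else v
```

For example, with n = 4 the call ripple_carry_add(7, 1, 4) adds +7 and +1; the bit pattern of the result is 1000, which reads as -8, so the overflow flag is set.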

Ripple Carry Adder
• For the sake of brevity, we use the following notation:
  - g[i] = x[i].y[i]
  - p[i] = x[i] + y[i]
• In terms of this notation, we can rewrite the carry equations as
  - c[1] = g[0] + p[0].c[0]
  - c[2] = g[1] + p[1].c[1]
  - and so on
• We use this notation later when discussing the design of other kinds of adders
• It has been observed that the expected length of a carry chain is 2, while the expected length of the longest carry chain is lg n. Hence, ripple carry adders are fast on average.

Ripple Carry Adder
• How do we know that an adder has completed the operation?
  - Worst-case scenario: wait for the longest chain in the carry propagation network
  - We might inspect c[i+1] and its complement b[i+1] to determine the status of the adder

  c[i+1]  b[i+1]  Remark
  0       0       Not complete
  1       0       Complete
  0       1       Complete
  1       1       Don't care

Improvement to Ripple Carry Adder: Manchester Adders
• By exploiting device properties, we can reduce the complexity of the circuit that computes the carries in a ripple carry adder.
• Define: a[i] = (x[i])'.(y[i])'
• c[i+1] is 1 in exactly these scenarios:
  - g[i] is 1, i.e. both x[i] and y[i] are 1
  - c[i] is 1 and it is propagated because p[i] is 1
• c[i+1] is 'pulled down' to logic 0, irrespective of the value of c[i], when a[i] is 1, i.e. both x[i] and y[i] are 0
• From these conditions, and keeping in mind the general characteristics of transistor devices, we can design simplified circuits for computing carries, as shown on the next slide
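The three cases above (pull up on g[i], pull down on a[i], otherwise pass c[i] along) can be modelled at the logic level as below. This is only a behavioural sketch in Python; it says nothing about the transistor-level circuit, and the function name is an assumption.

```python
def manchester_carries(x, y, n, c0=0):
    """Return [c[0], ..., c[n]] using the generate / kill / propagate cases."""
    c = [c0]
    for i in range(n):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        g = xi & yi                  # g[i]: both inputs are 1
        a = (1 - xi) & (1 - yi)      # a[i] = x[i]'.y[i]': both inputs are 0
        if g:
            c.append(1)              # carry is generated
        elif a:
            c.append(0)              # carry is pulled down to 0
        else:
            c.append(c[i])           # exactly one input is 1: c[i] passes through
    return c
```

Exactly one of the three cases holds for each bit position, which is what lets a single switched carry chain replace the AND/OR logic of the ripple carry equations.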

Improvement to Ripple Carry Adder: Manchester Adders
[Figure: simplified carry circuit]

Implementation of Manchester Adder using MOS transistors
This is essentially the same circuit for computing the carry, but implemented with MOS devices

Manchester Adder: Alternate design
• We divide the computation cycle into two distinct half-cycles: 'precharge' and 'evaluate'.
• In the precharge half-cycle, g[i] and c[i+1] are assigned a tentative value of logic 1. This value is then evaluated in the next half-cycle using the actual value of a[i].
• The actual circuit for computing carries is shown on the next slide.

Manchester Adder: Alternate design
[Figure: timing diagram of Q over time, with precharge and evaluation phases]

Carry Look-ahead Adder
• The full adders of a ripple-carry adder are grouped m at a time (m is usually equal to 4).
• Once the carry-in to the group is known, all the internal carries and the output carry are calculated simultaneously.
• We can use some algebraic manipulation to minimize hardware complexity. Consider the carry out of the group:
  - c[i] = g[i-1] + p[i-1].c[i-1]
  - Substituting the value of c[i-1], we can rewrite this as
      c[i] = g[i-1] + p[i-1].g[i-2] + p[i-1].p[i-2].c[i-2]
  - Proceeding in this manner we get
      c[i] = g[i-1] + p[i-1].g[i-2] + p[i-1].p[i-2].g[i-3]
           + p[i-1].p[i-2].p[i-3].g[i-4] + p[i-1].p[i-2].p[i-3].p[i-4].c[i-4]
  - To further simplify the equation, we note that g[i-1] = g[i-1].p[i-1], so p[i-1] can be factored out
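The unrolled carry expression for a 4-bit group can be checked against integer addition with a short Python sketch. The helper names are illustrative; p[i] is taken as x[i] + y[i], following the earlier notation slide.

```python
def group_signals(x, y, width=4):
    """Return per-bit lists g[i] = x[i].y[i] and p[i] = x[i] + y[i]."""
    g = [((x >> i) & (y >> i)) & 1 for i in range(width)]
    p = [((x >> i) | (y >> i)) & 1 for i in range(width)]
    return g, p


def lookahead_carry4(g, p, c_in):
    """Carry out of a 4-bit group, computed directly from g, p and c_in."""
    return (g[3]
            | (p[3] & g[2])
            | (p[3] & p[2] & g[1])
            | (p[3] & p[2] & p[1] & g[0])
            | (p[3] & p[2] & p[1] & p[0] & c_in))


# The two-level expression must equal the carry out of bit 3.
for x in range(16):
    for y in range(16):
        for c_in in (0, 1):
            g, p = group_signals(x, y)
            assert lookahead_carry4(g, p, c_in) == (x + y + c_in) >> 4
```

Every product term depends only on the operand bits and the group carry-in, so no term has to wait for an internal carry.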

Ling's Adder
c[i] = g[i-1] + p[i-1].g[i-2] + p[i-1].p[i-2].g[i-3]
     + p[i-1].p[i-2].p[i-3].g[i-4] + p[i-1].p[i-2].p[i-3].p[i-4].c[i-4]
We replace p[i] = x[i] ^ y[i] with t[i] = x[i] + y[i]. Because g[i] = g[i].t[i], we have
c[i] = g[i-1].t[i-1] + t[i-1].g[i-2] + t[i-1].t[i-2].g[i-3]
     + t[i-1].t[i-2].t[i-3].g[i-4] + t[i-1].t[i-2].t[i-3].t[i-4].c[i-4]
Let
h[i] = g[i-1] + g[i-2] + t[i-2].g[i-3] + t[i-2].t[i-3].g[i-4]
     + t[i-2].t[i-3].t[i-4].t[i-5].h[i-4]
Then c[i] = h[i].t[i-1]

Ling's Adder
h[0] = c[0]
h[3] = g[2] + g[1] + t[1].g[0] + t[1].t[0].h[0]
s[3] = p[3] ^ c[3] = p[3] ^ (h[3].t[2])
     = p[3]'.h[3].t[2] + p[3].(h[3]' + t[2]')
     = h[3]'.p[3] + h[3].(p[3] ^ t[2])
h[6] = g[5] + g[4] + t[4].g[3] + t[4].t[3].t[2].h[3]
s[6] = h[6]'.p[6] + h[6].(p[6] ^ t[5])
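A small Python sketch can confirm that the pseudo-carry h[3] reproduces the real carry and sum bit. It assumes g[i] = x[i].y[i], t[i] = x[i] + y[i] and p[i] = x[i] ^ y[i], with h[0] = c[0] as on the slide; the function name is hypothetical.

```python
def ling_bit3(x, y, c0=0):
    """Return (h3, c3, s3) for 4-bit operands using Ling's pseudo-carry."""
    def bit(v, i):
        return (v >> i) & 1

    g = [bit(x, i) & bit(y, i) for i in range(4)]
    t = [bit(x, i) | bit(y, i) for i in range(4)]
    p = [bit(x, i) ^ bit(y, i) for i in range(4)]

    h0 = c0
    h3 = g[2] | g[1] | (t[1] & g[0]) | (t[1] & t[0] & h0)
    c3 = h3 & t[2]                                   # c[3] = h[3].t[2]
    s3 = ((1 - h3) & p[3]) | (h3 & (p[3] ^ t[2]))    # sum bit selected by h[3]
    return h3, c3, s3


# Compare with the carry into bit 3 and with bit 3 of the true sum.
for x in range(16):
    for y in range(16):
        for c0 in (0, 1):
            _, c3, s3 = ling_bit3(x, y, c0)
            assert c3 == ((x & 7) + (y & 7) + c0) >> 3
            assert s3 == ((x + y + c0) >> 3) & 1
```

The saving is that h[3] has one fewer literal in each product term than c[3]; the missing factor t[2] is folded into the sum selection instead.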

Generalized Design for Adders: Prefix Adder
• Prefix computation
  - Given n inputs x1, x2, x3, ..., xn and an associative operator ×, we want to compute
      yi = xi × xi-1 × xi-2 × ... × x2 × x1   for all i, 1 ≤ i ≤ n
  - Each xi can be a scalar, a vector or a matrix
  - For the design of adders, we define the operator × as follows:
      (g, p) = (g'', p'') × (g', p')
      g = g'' + p''.g'
      p = p''.p'
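The prefix computation and the adder operator can be written down directly. The Python sketch below is a serial reference version (no parallelism), and the names combine and prefixes are assumptions made for illustration.

```python
from itertools import product


def combine(hi, lo):
    """Adder operator on (g, p) pairs; hi covers the more significant bits."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def prefixes(xs, op):
    """Return [y1, ..., yn] with yi = xi op x(i-1) op ... op x1."""
    ys = []
    for x in xs:
        ys.append(x if not ys else op(x, ys[-1]))
    return ys


# The operator is associative, which is what permits regrouping the
# computation into a shallow tree.
pairs = list(product((0, 1), repeat=2))
for a, b, c in product(pairs, repeat=3):
    assert combine(combine(a, b), c) == combine(a, combine(b, c))
```

For instance, prefixes([1, 2, 3], lambda a, b: a + b) gives the running sums [1, 3, 6]; with combine as the operator, the g component of yi is the carry out of position i when there is no carry-in.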

Alternate modeling of Prefix Computer: Finite State Machine
• A finite state machine has a set of states, and it 'moves' from one state to another according to its input. Mathematically,
  - sk = f(sk-1, ak-1)
• The problem is to determine the final state sn in O(lg n) parallel steps, given the initial state s0 and the sequence of inputs (a0, a1, ..., an-1)
• This problem can be formulated in terms of prefix computation

Alternate modeling of Prefix Computer: Finite State Machine
• We assume that the number of states is small and finite.
• Let sk = f_{ak-1}(sk-1); the function f_{ak-1} can be represented by a matrix M_{ak-1}
• Now we are ready to represent our problem in terms of prefix computation.

Alternate Modeling of Prefix Computer: Finite State Machine
• The algorithm
  1. Compute M_{ai} in parallel
  2. Compute
       N1 = M_{a0}
       N2 = M_{a1}.M_{a0}
       ...
       Nn = M_{an-1}...M_{a0}
  3. Compute Si = Ni(S0)
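The three steps can be sketched in Python by representing each transition function as a tuple indexed by state, so that composition plays the role of the matrix product. The machine below is a made-up three-state example, and compose and prefix_maps are hypothetical helper names; the scan shown is one simple log-depth scheme, chosen here for brevity.

```python
def compose(later, earlier):
    """Transition map for 'apply earlier, then later'."""
    return tuple(later[s] for s in earlier)


def prefix_maps(maps):
    """Inclusive scan in ceil(lg n) rounds; the compositions within a
    round are independent, so hardware could evaluate them in parallel."""
    result = list(maps)
    step = 1
    while step < len(result):
        result = [
            result[i] if i < step else compose(result[i], result[i - step])
            for i in range(len(result))
        ]
        step *= 2
    return result


# Hypothetical machine: states 0 = A, 1 = B, 2 = C; M[a][s] is the next state.
M = {0: (1, 1, 1), 1: (1, 2, 0)}
inputs = (0, 0, 1, 1, 1, 0, 1)
start = 0

N = prefix_maps([M[a] for a in inputs])      # steps 1 and 2
states = [n_k[start] for n_k in N]           # step 3: state after each input

# Reference: step the machine one input at a time.
expected, s = [], start
for a in inputs:
    s = M[a][s]
    expected.append(s)
assert states == expected
```

Because function composition is associative but not commutative, the scan must keep the later map on the left in every composition, as compose does here.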

Prefix Computation
• FSM example:
  - Given: initial state S0 = A
  - A sequence of inputs: (0 0 1 1 1 0 1)
  - Derive the sequence of outputs
[Figure: state diagram over states A, B and C with input/output edge labels 0/0, 0/0, 0/0, 1/0, 1/0 and 1/1]
[State table: next state (NS) for each present state (PS) under X=0 (matrix M0) and X=1 (matrix M1)]
• Compute the N's:

  Input  N
  0      N1 = M0
  0      N2 = M0.M0
  1      N3 = M1.M0.M0
  1      N4 = M1.M1.M0.M0
  ...    ...

  PS           A  B  C
  NS under N2  B  B  B
  NS under N3  C  C  C
  NS under N4  A  A  A

Graph Based Approach
• Consider the (g, p) chain
  - Break the long paths
[Figure: carry chain through (g3, p3), (g2, p2) and (g1, p1) producing C4]

Graph Based Approach
• Generating g32 and p32
[Figure: (g3, p3) and (g2, p2) are combined into (g32, p32)]

Graph Based Approach
• Generating g10 and p10
[Figure: (g1, p1) and cin are combined into (g10, p10)]

Graph Based Approach
• Generating g30 and p30
[Figure: (g32, p32) and (g10, p10) are combined into (g30, p30)]

Boolean Approach
g4 + p4 (g3 + p3 (g2 + p2 (g1 + p1 (g0 + p0 cin))))
• (g4, p4) and (g3, p3) give (g4 + p4 g3, p4 p3)
• (g2, p2) and (g1, p1) give (g2 + p2 g1, p2 p1)
• These two results give (g4 + p4 g3 + p4 p3 (g2 + p2 g1), p4 p3 p2 p1)
• Together with (g0, p0 cin) this gives (g4 + p4 g3 + p4 p3 (g2 + p2 g1) + (p4 p3 p2 p1) g0, (p4 p3 p2 p1) p0 cin)

Prefix Adder
• Given:
  - n inputs (gi, pi)
  - An operation o
• Associativity
  - (A o B) o C = A o (B o C)
• Compute:
  - yi = (gi, pi) o ... o (g1, p1)   (1 <= i <= n)
• (g'', p'') o (g', p') = (g, p)
  - g = g'' + p''g'
  - p = p''p'
• Inputs:
  - gi = a if i = 1; ai bi otherwise
  - pi = 1 if i = 1; ai xor bi otherwise
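Putting the pieces together, a complete adder can be sketched in Python around the (g, p) operator. Two assumptions are made here that are not taken from the slide: the carry-in is folded into the least significant pair, and the prefixes are computed with a simple log-depth scan rather than any particular named network.

```python
def combine(hi, lo):
    """(g'', p'') o (g', p') with g = g'' + p''g' and p = p''p'."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])


def prefix_add(a, b, n, cin=0):
    """Add two n-bit words; return (sum, carry_out)."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    # Assumption of this sketch: absorb cin into the least significant pair.
    pairs = [(g[0] | (p[0] & cin), p[0])] + list(zip(g[1:], p[1:]))
    step = 1
    while step < n:                       # ceil(lg n) rounds
        pairs = [
            pairs[i] if i < step else combine(pairs[i], pairs[i - step])
            for i in range(n)
        ]
        step *= 2
    carries = [cin] + [grp_g for grp_g, _ in pairs]   # carries[i]: carry into bit i
    s = sum((p[i] ^ carries[i]) << i for i in range(n))
    return s, carries[n]
```

After the scan, the g component of pair i is the carry out of bit i, so each sum bit needs only one more XOR with the bit-level p.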

Prefix Adder: Graph Representation
• Example: Ripple Carry Adder
[Figure: inputs ai, bi produce (gi, pi); a graph node with inputs x and y outputs x o y]

Prefix Adders: Conditional Sum Adder
[Figure: prefix graph over columns 8 7 6 5 4 3 2 1]

Prefix Adders: Conditional Sum Adder
[Figure: prefix graph over columns 8 7 6 5 4 3 2 1]
• Alphabetical tree:
  - Binary tree
  - Edges do not cross
• For output yi, there is an alphabetical tree covering inputs (xi, xi-1, ..., x1)

Prefix Adders: Conditional Sum Adder
[Figure: prefix graph over columns 8 7 6 5 4 3 2 1]
• The nodes in this tree can be reduced to (g, p) o c = g + pc
• From input x1, there is a tree covering all outputs (yi, yi-1, ..., y1)

Prefix Adders: size and depth
• Objective:
  - Minimize the number of nodes, sc(n)
  - Minimize the depth, dc(n)
• Ripple Carry Adder:
  - sc(8) = 7
  - dc(8) = 7
  - total = 14
• Conditional Sum Adder:
  - sc(8) = 12
  - dc(8) = 3
  - total = 15
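The size and depth figures can be reproduced by describing a prefix network as a list of nodes (i, j), meaning "column i is combined with column j". The Python sketch below assumes that the conditional sum adder on these slides has the divide-and-conquer (Sklansky) structure, which is consistent with sc(8) = 12 and dc(8) = 3; columns are numbered from 0 here, and all names are illustrative.

```python
def ripple(n):
    """Serial chain: column i is combined with column i - 1."""
    return [(i, i - 1) for i in range(1, n)]


def sklansky(n):
    """Divide-and-conquer network for n a power of two."""
    nodes = []
    k = 0
    while (1 << k) < n:
        for i in range(n):
            if i & (1 << k):
                nodes.append((i, ((i >> k) << k) - 1))
        k += 1
    return nodes


def size_and_depth(nodes, n):
    """Check that the network computes every prefix; return (size, depth)."""
    lo = list(range(n))      # column i currently covers inputs lo[i]..i
    depth = [0] * n
    for i, j in nodes:
        assert lo[i] == j + 1, "operands must cover adjacent ranges"
        lo[i] = lo[j]
        depth[i] = max(depth[i], depth[j]) + 1
    assert all(v == 0 for v in lo), "every column must end as a full prefix"
    return len(nodes), max(depth)
```

Calling size_and_depth(ripple(8), 8) and size_and_depth(sklansky(8), 8) yields the (size, depth) pairs listed above.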

Prefix Adder – Well-known and Well-developed?
• Classic prefix networks: Sklansky, Kogge-Stone, Brent-Kung, Ladner-Fischer, Han-Carlson, Knowles, etc.

Prefix Adders: Brent-Kung Adder
[Figure: Brent-Kung prefix graph over columns 15 down to 0]
• sc(16) = 26
• dc(16) = 6
• total = 32
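The Brent-Kung counts can be checked with a small Python sketch that builds the usual up-sweep and down-sweep as a list of nodes (i, j), "column i is combined with column j". Depth is measured by placing each node as early as its operands allow, which is an assumption of this sketch; the function names are illustrative.

```python
def brent_kung(n):
    """Node list of a Brent-Kung network for n a power of two."""
    nodes = []
    levels = 0
    while (1 << (levels + 1)) <= n:               # up-sweep (reduction tree)
        step = 1 << (levels + 1)
        for i in range(step - 1, n, step):
            nodes.append((i, i - (1 << levels)))
        levels += 1
    for k in range(levels - 2, -1, -1):           # down-sweep (fill in the rest)
        step = 1 << (k + 1)
        for i in range(3 * (1 << k) - 1, n, step):
            nodes.append((i, i - (1 << k)))
    return nodes


def size_and_depth(nodes, n):
    """Check that every column ends as a full prefix; return (size, depth)."""
    lo = list(range(n))      # column i currently covers inputs lo[i]..i
    depth = [0] * n
    for i, j in nodes:
        assert lo[i] == j + 1, "operands must cover adjacent ranges"
        lo[i] = lo[j]
        depth[i] = max(depth[i], depth[j]) + 1
    assert all(v == 0 for v in lo)
    return len(nodes), max(depth)


size, depth = size_and_depth(brent_kung(16), 16)
assert size + depth >= 2 * 16 - 2     # lower bound on size + depth
```

With n = 16 this gives 15 up-sweep nodes and 11 down-sweep nodes, and the first down-sweep node can share a level with the root of the up-sweep tree, which is why the depth comes out as 6.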

Prefix Adder – New Respects, New Method
• Realistic design considerations: Timing, Power and Area.
[Figure: logic levels, max fanouts and max wire tracks related to timing, power and area]
• Integer Linear Programming for prefix adders:
  - Logical effort timing model (gate cap. + wire cap.)
  - Activity-statistic power model
  - Non-uniform signal arrival/required times

Prefix Adder – Optimum Prefix adders
• Uniform signal arrival/required times
[Figure panels: Sklansky adder; fastest depth-3 optimal prefix adder; Kogge-Stone adder; fastest depth-4 optimal prefix adder]

Prefix Adder – Optimum Prefix adders
• Uniform signal arrival/required times

The Big Picture
What is the minimum depth of zero-deficiency circuits for a given width?

Proof for Snir's Theorem
Given an arbitrary prefix graph of width n, we have depth + size ≥ 2n – 2
• Proof
  - Consider the alphabetical tree rooted at the MSB output with all the input nodes as its leaves;
  - The size of this tree is n-1 while its depth is dM;
  - At most dM prefix outputs can be generated from this tree;
  - At least one extra node is needed for each column where the prefix result is not ready.
  - Consequently size ≥ (n-1) + (n - (dM + 1)) = 2n - 2 - dM, which, since depth ≥ dM, gives size + depth ≥ 2n - 2

Definitions
For a prefix circuit, define
• Backbone
  - The binary alphabetical tree generating the MSB prefix output
• Affiliated tree
  - The tree rooted at the LSB input, with all the prefix outputs (except the MSB output) as its tree nodes
• Ridge
  - The path from the LSB input to the MSB output
[Figure: backbone and affiliated tree]

How to … ?
• Look from the MSB output
• Since the circuit is of zero-deficiency, the ridge has exactly d nodes (excluding the first input node), one node per level.
• The idea: try to stretch the ridge as long as possible while maintaining zero-deficiency

T-tree
• Definition of Tk(k) tree

T-tree example – T3(5)

A-tree
• Definition of Ak(t) tree

A-tree example – A3(5)

Compound of A-tree and T-tree

Example

Proposed Prefix Circuit

An Example: Z(d)|d=8
[Figure: Z(8) circuit composed of BK(32), T3(5) + A3(5), T2(6) + A2(6) and T1(7) + A1(7), with column markers 1, 32, 33, 58, 59, 80, 81 and 88]
Width = 88

The width of Z(d) Circuit
• The width of the Z(d) circuit is Nz(d) = F(d+3) – 1 (d ≥ 1), where F(i) are the Fibonacci numbers
• Numerical comparison

   d     LS    LYD      Z
   3      7      7      7
   4     11      ?     12
   5     16      ?     20
   6     23      ?     33
   7     33      ?     54
   8     47     77     88
   9     66     95    143
  10     95    131    232
  11    131    169    376
  12    191    242    609
  13    260    308    986
  14    383    446   1596
  15    517    576   2583
  16    575    843   4180
  17   1030   1101   6764
  18   1535   1625  10945
  19   2055   2139  17710
  20   3071   3176  28656
  21   4104   4202  46367
  22   6143   6264  75024

LYD: Design by S. Lakshmivarahan, C. M. Yang & S. K. Dhall, 1987
LS: Design by Lin & Shih, 1999
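The Z column follows directly from the width formula. A minimal Python sketch, assuming the indexing F(1) = F(2) = 1 (the indexing under which d = 8 gives the width of 88 quoted in the Z(d)|d=8 example):

```python
def fib(i):
    """Fibonacci number F(i), with F(1) = F(2) = 1 (assumed indexing)."""
    a, b = 1, 1
    for _ in range(i - 1):
        a, b = b, a + b
    return a


def z_width(d):
    """Width of the Z(d) circuit: Nz(d) = F(d+3) - 1."""
    return fib(d + 3) - 1


widths = {d: z_width(d) for d in range(3, 23)}
```

Because Fibonacci numbers grow by a factor of about 1.618 per step, each extra level of depth multiplies the attainable width by roughly that factor.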

Comparison
• 64-bit case
• Based on the logical effort method, to include the fan-out effect and interconnect capacitance
• Five adders
  - Z64: a 64-bit Z(d) circuit derived from Z(d)|d=8
  - BK: Brent-Kung adder
  - Sklansky
  - KS: Kogge-Stone adder
  - HC: Han-Carlson adder

Results
• w is the weight for lateral interconnect capacitance; KS and HC have a large w value to compensate for the coupling effect
• Z64 and the BK adder have similar delay and area, but Z64 could be more power efficient because it has fewer logic levels

Carry Skip Adder
[Figure: 12-bit carry-skip adder built from blocks A2 (inputs a11,8 and b11,8), A1 (inputs a7,4 and b7,4) and A0 (inputs a3,0 and b3,0, carry-in cin); each block is followed by a MUX controlled by p11,8, p7,4 or p3,0, giving c12, c8 and c4; a signal x is marked at the A0 stage]
• If p3,0 = p3 p2 p1 p0 = 1, then x = cin
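The block structure can be modelled behaviourally in Python as below: each 4-bit block ripples internally, and a MUX forwards the block's carry-in when every bit of the block propagates. The function name and parameters are assumptions of this sketch, and p is taken as a xor b.

```python
def carry_skip_add(a, b, cin=0, n=12, block=4):
    """Add two n-bit words with a carry-skip structure; return (sum, cout)."""
    s = 0
    c = cin
    for lo in range(0, n, block):
        block_cin = c
        propagate_all = 1
        for i in range(lo, min(lo + block, n)):
            ai, bi = (a >> i) & 1, (b >> i) & 1
            p = ai ^ bi
            s |= (p ^ c) << i
            c = (ai & bi) | (p & c)       # ripple inside the block
            propagate_all &= p
        # MUX: bypass the block when all of its bits propagate.
        c = block_cin if propagate_all else c
    return s, c
```

The MUX never changes the logical result, since a fully propagating block ripples its carry-in to its carry-out anyway; what it changes in hardware is how long that carry takes to arrive.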

Carry Propagation Paths
• A2 <- MUX <- cin
• A2 <- MUX <- A1
• A2 <- MUX <- A0
• c12 <- MUX <- A2
• c12 <- MUX <- A1
• c12 <- MUX <- A0
• c12 <- MUX <- cin

False Path
• A1 <- MUX <- A0 <- cin is a false path
  - If the carry comes from cin, then the block must have p3 p2 p1 p0 = 1
  - Since p3,0 = 1, g3,0 must be 0
  - The carry is not generated by A0
  - The carry need not propagate through A0; it goes through the MUX
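The argument can be checked exhaustively for one 4-bit block. In the Python sketch below, p is taken as a xor b and the helper names are hypothetical; the loop confirms that whenever the block propagates, no bit generates and the rippled carry-out equals cin, so the MUX path already delivers the correct value.

```python
from itertools import product


def block_carry_out(a, b, cin, width=4):
    """Carry out of a block, computed by rippling through every bit."""
    c = cin
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        c = (ai & bi) | ((ai ^ bi) & c)
    return c


def block_propagate(a, b, width=4):
    """Group propagate: 1 when every bit position has a xor b = 1."""
    return int((a ^ b) == (1 << width) - 1)


for a, b, cin in product(range(16), range(16), (0, 1)):
    if block_propagate(a, b):
        assert a & b == 0                          # no position generates a carry
        assert block_carry_out(a, b, cin) == cin   # the ripple path adds nothing
```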

Label Algorithm
• Problem:
  - Given a digraph and a set of false paths
  - Derive the longest path of the graph
• Algorithm:
  - Assign a label (color) to the edges on each false path
  - The length of a walk along edges with the same label is accumulated
  - Otherwise, change to no label