Unfolding: ECE 734 VLSI Arrays for Digital Signal Processing


Definitions
• Unfolding transforms a loop so that several consecutive iterations of the original loop are executed in a single iteration of the new loop.
• Also known as (a.k.a.)
  – Loop unrolling (in compilers for parallel programs)
  – Block processing
• Applications
  – Reducing the sampling period to achieve the iteration bound T (the desired throughput rate).
  – Parallel (block) processing to execute several iterations concurrently.
  – Digit-serial or bit-serial processing.
(C) 1997-2006 by Yu Hen Hu

An example
• Before unfolding:
    For n = 0 to N-1,
      y(n) = a*y(n-9) + x(n)
    end
• Unfolding once (J = 2):
    For k = 0 to N/2-1,
      y(2k)   = a*y(2k-9) + x(2k)
      y(2k+1) = a*y(2k-8) + x(2k+1)
    end
• Unfolding twice (J = 3):
    For k = 0 to N/3-1,
      y(3k)   = a*y(3k-9) + x(3k)
      y(3k+1) = a*y(3k-8) + x(3k+1)
      y(3k+2) = a*y(3k-7) + x(3k+2)
    end
Block processing formulation
• J = 3, 9/J = 3 (an integer)
  – X(k) = [x(3k) x(3k+1) x(3k+2)]^T
  – Y(k) = [y(3k) y(3k+1) y(3k+2)]^T
  – Y(k) = a*Y(k-3) + X(k)
• J = 2, 9/J = 4.5 (not an integer; rounded up, it is 5)
  – X(k) = [x(2k) x(2k+1)]^T
  – Y(k) = [y(2k) y(2k+1)]^T
  – Y(k) = a*Y(k-5) + X(k) does not hold as a single block recurrence: y(2k) needs y(2k-9), which lies in block k-5, while y(2k+1) needs y(2k-8), which lies in block k-4.
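The equivalence of the three loops above can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names `direct` and `unfolded` are hypothetical, and it assumes zero initial conditions (y(n) = 0 for n < 0) and that N is a multiple of J.

```python
def direct(x, a, d=9):
    """Original loop: y(n) = a*y(n-d) + x(n), assuming y(n) = 0 for n < 0."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        past = y[n - d] if n >= d else 0.0
        y[n] = a * past + x[n]
    return y


def unfolded(x, a, J, d=9):
    """J-unfolded loop: iteration k produces y(Jk), ..., y(Jk+J-1) as one block."""
    # Assumptions of this sketch: N divisible by J, and J <= d so that the
    # J statements of one iteration only read values from earlier iterations.
    assert len(x) % J == 0 and J <= d
    y = [0.0] * len(x)
    for k in range(len(x) // J):
        base = J * k
        # All J outputs are computed from past values before any is stored,
        # which mimics J statements executing concurrently.
        block = [
            a * (y[base + i - d] if base + i >= d else 0.0) + x[base + i]
            for i in range(J)
        ]
        y[base:base + J] = block
    return y


x = [float(n % 5) for n in range(36)]   # N = 36 is divisible by both 2 and 3
ref = direct(x, a=0.5)
same_j2 = unfolded(x, a=0.5, J=2) == ref
same_j3 = unfolded(x, a=0.5, J=3) == ref
```

Both comparisons hold because every output is produced by the same multiply and add as in the original loop; only the grouping into iterations changes.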

Implementation with J=3
[Figure: serial-to-parallel conversion of x(0), x(1), x(2), x(3), x(4), x(5), ... arriving at sampling period Ts; parallel multiply (X), add (+) and delay (D) units clocked at 3 Ts; parallel-to-serial conversion producing y(0), y(1), y(2), y(3), y(4), y(5), ... at sampling period Ts]

Unfolding the DFG
• Rewrite the algorithm formulation (J = 2) so that every index has the form 2(k-m)+i:
    y(2k)   = a*y(2k-9) + x(2k)      becomes   y(2k)   = a*y(2(k-5)+1) + x(2k)
    y(2k+1) = a*y(2k-8) + x(2k+1)    becomes   y(2k+1) = a*y(2(k-4)) + x(2k+1)
• After J-fold unfolding, the clock period is T = J Ts, where Ts is the data sampling period.
[Figure: original DFG, labeled T = Ts, and unfolded DFG, labeled T = J Ts]
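The index rewriting above is a floored division. As a small illustration (the helper name `rewrite_index` is hypothetical, not from the slides), the index Jk + i - w can be written as J(k - m) + r with 0 ≤ r < J:

```python
def rewrite_index(i, w, J):
    """Write J*k + i - w as J*(k - m) + r with 0 <= r < J; return (m, r)."""
    # Python's divmod floors toward negative infinity, so r is never negative.
    q, r = divmod(i - w, J)
    return -q, r


for i in range(2):
    m, r = rewrite_index(i, w=9, J=2)
    print(f"y(2k+{i}) = a*y(2(k-{m})+{r}) + x(2k+{i})")
# prints:
# y(2k+0) = a*y(2(k-5)+1) + x(2k+0)
# y(2k+1) = a*y(2(k-4)+0) + x(2k+1)
```

For J = 3 the same helper returns m = 3 for every i, which agrees with the block form Y(k) = a*Y(k-3) + X(k).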

Timing Diagram
[Figure: timing diagram of y(0) through y(13) for T = Ts, and for T = 2 Ts with even-indexed and odd-indexed outputs on separate rows; dependence spans are marked 9 T, 4 T and 5 T]
• The timing diagram above assumes that the sampling period Ts remains unchanged. Thus, the clock period T is increased J-fold.
• Since 9/2 is not an integer, the output pair (y(0), y(1)) is needed by two different future iterations: y(0) by y(9), 4 T later, and y(1) by y(10), 5 T later.

General DFG Unfolding Method
• Define: ⌊x⌋ is the largest integer not exceeding x, and a % b is the remainder of a divided by b.
• Step 1. For each node U in the original DFG, draw J nodes {U_i; 0 ≤ i ≤ J-1} in the unfolded DFG.
• Step 2. For each edge from U to V with w delays, draw J edges from U_i to V_((i+w)%J) with ⌊(i+w)/J⌋ delays, for i = 0, ..., J-1.
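The two steps translate almost directly into code. The sketch below is illustrative only: the edge-list representation and the function name `unfold` are assumptions, and the recursion y(n) = a*y(n-9) + x(n) is simplified to a single node "y" with a 9-delay self-loop.

```python
def unfold(edges, J):
    """Unfold a DFG given as a list of (U, V, w) edges, w = number of delays.

    Returns edges ((U, i), (V, j), delays), where (U, i) stands for node U_i.
    """
    unfolded_edges = []
    for u, v, w in edges:
        for i in range(J):              # Step 1 is implicit: copies i = 0..J-1
            j = (i + w) % J             # Step 2: destination copy
            delays = (i + w) // J       # Step 2: floor((i+w)/J) delays
            unfolded_edges.append(((u, i), (v, j), delays))
    return unfolded_edges


# Simplified model of y(n) = a*y(n-9) + x(n): one node with a 9-delay self-loop.
loop = [("y", "y", 9)]
for edge in unfold(loop, 2):
    print(edge)
# prints:
# (('y', 0), ('y', 1), 4)
# (('y', 1), ('y', 0), 5)
```

The delays 4 and 5 match the 4 T and 5 T spans in the timing diagram for J = 2, and their sum is the original 9 delays.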

Another DFG Unfolding Example
[Figure: original DFG with nodes Q, R, S, T and edges carrying 2D and 3D, labeled T = 3; unfolded DFG for J = 2 with nodes Q0, R0, S0, T0, Q1, R1, S1, T1 and edges carrying D, D, D and 2D, labeled T = 6]

  J = 2
  i   w   (i+w)%J   ⌊(i+w)/J⌋
  0   0      0          0
  0   2      0          1
  0   3      1          1
  1   0      1          0
  1   2      1          1
  1   3      0          2

• Step 1. Draw J copies of each node.
• Step 2. Add all edges that carry 0 delays.
• Step 3. Use the table to work out the edges that carry delays.

Properties of Unfolding
• Unfolding preserves the number of registers (delays) in a DFG.
• A loop with w delays in the original DFG leads to gcd(w, J) loops in the J-unfolded DFG, each containing
  – w/gcd(w, J) delays and
  – J/gcd(w, J) copies of each node that appears in the original loop.
• Unfolding a DFG with iteration bound T results in a J-unfolded DFG with iteration bound JT.
• A path with w (< J) delays in the original DFG leads to J-w paths with no delays and w paths with 1 delay each in the J-unfolded DFG.
• Any path in the original DFG containing J or more delays leads to J paths with 1 or more delays each. Therefore, it cannot create a critical path in the J-unfolded DFG.
• Any clock period that can be achieved by retiming a J-unfolded DFG can also be achieved by retiming the original DFG and then J-unfolding it.
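The loop properties can be spot-checked on a small case. This sketch assumes the simplest possible loop, a single node with a w-delay self-loop, and uses hypothetical helper names; it checks a few (w, J) pairs and is not a proof.

```python
from math import gcd


def unfold_self_loop(w, J):
    """Unfold a one-node loop with w delays: copy i -> (next copy, delays)."""
    return {i: ((i + w) % J, (i + w) // J) for i in range(J)}


def loops(w, J):
    """Trace each loop of the unfolded graph; return (copies, delays) per loop."""
    successor = unfold_self_loop(w, J)
    seen, found = set(), []
    for start in range(J):
        if start in seen:
            continue
        i, copies, delays = start, 0, 0
        while i not in seen:            # follow edges until the loop closes
            seen.add(i)
            nxt, d = successor[i]
            copies, delays, i = copies + 1, delays + d, nxt
        found.append((copies, delays))
    return found


for w, J in [(9, 2), (9, 3), (6, 4)]:
    result = loops(w, J)
    g = gcd(w, J)
    assert len(result) == g                                  # gcd(w, J) loops
    assert all(c == J // g and d == w // g for c, d in result)
    assert sum(d for _, d in result) == w                    # delays preserved
```

For example, w = 6 and J = 4 give two loops, each with 2 node copies and 3 delays.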