DSP Architecture Design Crash Course 2020 Wen Tsung Hsieh 7/23

Overview
▪ Introduction
▪ Pipeline
▪ Parallel
▪ Retiming
▪ Unfolding/Folding
▪ Systolic array
▪ Scheduling
Media IC & System Lab

Introduction

Performance Metrics of DSP Systems
▪ Hardware circuitry and resources (area)
▪ Speed of execution
▪ Power consumption
▪ Finite word length performance

Resource vs. sample rate

Block Diagram

Data-Flow Graph (DFG)

Dependence Graph (DG)

Pipeline

Pipeline(1)
▪ Pipelining is the most important design technique in VLSI DSP systems.
▪ Used to shorten the critical path and raise the clock rate, or to lower the supply voltage for low power.

Pipeline(2)
1. Identify the critical path.
2. Place registers along a feed-forward cutset.
3. Identify the new critical path and keep pipelining until the specs are met.

Pipeline(3)
Disadvantages:
1. Latency increases.
2. Area cost can be huge for 2D or 3D data.

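Pipeline(4)
[Figure: a register (D) splitting the path into stage delays 2 and 8]
▪ Balanced pipelining: cut the critical path in half. [Figure: a register (D) splitting the path into stage delays 5 and 5]
▪ Fine-grain pipelining: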
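The pipelining steps and the balanced-cut idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four-node chain, its delays (2, 3, 3, 2), and the helper names `critical_path` and `with_cut` are hypothetical and do not come from the slides. The clock period is taken to be the longest register-free path.

```python
from functools import lru_cache

def critical_path(node_delay, edges):
    """Longest register-free path, i.e. the minimum feasible clock period.

    node_delay: dict mapping node name -> computation time
    edges: dict mapping (u, v) -> number of registers on edge u -> v
    Assumes the register-free part of the graph is acyclic.
    """
    succ = {u: [] for u in node_delay}
    for (u, v), regs in edges.items():
        if regs == 0:  # only register-free edges extend a combinational path
            succ[u].append(v)

    @lru_cache(maxsize=None)
    def longest_from(u):
        return node_delay[u] + max((longest_from(v) for v in succ[u]), default=0)

    return max(longest_from(u) for u in node_delay)

# Hypothetical 4-node chain whose delays sum to 10 time units.
delays = {"n1": 2, "n2": 3, "n3": 3, "n4": 2}
chain = [("n1", "n2"), ("n2", "n3"), ("n3", "n4")]

def with_cut(cut_edge):
    """Place one register on cut_edge (a feed-forward cutset of the chain)."""
    return {e: (1 if e == cut_edge else 0) for e in chain}

no_pipeline = critical_path(delays, {e: 0 for e in chain})
uneven = critical_path(delays, with_cut(("n1", "n2")))    # stages 2 | 8
balanced = critical_path(delays, with_cut(("n2", "n3")))  # stages 5 | 5
print(no_pipeline, uneven, balanced)
```

Both cuts add the same single register and the same one cycle of latency, but only the balanced cut halves the clock period.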

Pipeline(5)

Parallel

Parallel(1)

Parallel(2)

Parallel(3)
Whole system:

Parallel(4)

Parallel(5)
Combine with pipelining:

Lowering Supply Voltage
▪ Pipelining:
▪ Parallel:

Conclusion

Retiming

Retiming(1)
Adjusting existing registers.
▪ Reducing the clock period
▪ Reducing the number of registers
▪ Reducing the power consumption
▪ Can deal with recursive systems
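A minimal sketch of how moving existing registers can shorten the clock period, even in a recursive system. It uses the standard retiming relation w_r(U→V) = w(U→V) + r(V) − r(U), where w is the register count on an edge and r is an integer assigned to each node. The four-node graph, its delays, and the retiming values `r` are hypothetical and chosen by hand; the slides' own example is a synthesized circuit, not this graph.

```python
def retime(edges, r):
    """Apply w_r(u -> v) = w(u -> v) + r[v] - r[u] to every edge."""
    retimed = {(u, v): w + r[v] - r[u] for (u, v), w in edges.items()}
    if min(retimed.values()) < 0:
        raise ValueError("infeasible retiming: an edge would get a negative register count")
    return retimed

def critical_path(node_delay, edges):
    """Longest path through edges that carry no register."""
    succ = {u: [v for (a, v), w in edges.items() if a == u and w == 0]
            for u in node_delay}
    memo = {}

    def longest_from(u):
        if u not in memo:
            memo[u] = node_delay[u] + max((longest_from(v) for v in succ[u]), default=0)
        return memo[u]

    return max(longest_from(u) for u in node_delay)

# Hypothetical recursive DFG: nodes 1 and 2 are adders (1 time unit),
# nodes 3 and 4 are multipliers (2 time units).
node_delay = {1: 1, 2: 1, 3: 2, 4: 2}
edges = {(1, 3): 1, (1, 4): 2, (3, 2): 0, (4, 2): 0, (2, 1): 1}
r = {1: 0, 2: 1, 3: 0, 4: 0}  # assumed retiming values, chosen by hand

retimed = retime(edges, r)
before = critical_path(node_delay, edges)
after = critical_path(node_delay, retimed)
print(before, after)
```

The number of registers around each loop is unchanged by retiming (loop 1→3→2→1 holds two registers before and after), which is why the technique is safe for recursive systems. In this example the total register count rises from 4 to 5, so a shorter clock period and fewer registers are separate goals.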

Retiming(2)
Cutset retiming:

Retiming(3)
Single cutset:

Retiming(4)
Target: 5.5 ns
[Figure: circuit with inputs a, b, c, a multiplier (*), registers (D), and output out]

Retiming(5)
[Figure: the circuit annotated with path delays 5.89 ns, 7.44 ns, 0.99 ns]
report_timing -from [all_inputs]
report_timing -through div_13_S2/*
report_timing -to [all_outputs]
report_area

Retiming(6)
optimize_registers:
[Figure: the circuit annotated with path delays 5.48 ns, 5.50 ns, 5.50 ns]
report_timing -from [all_inputs]
report_timing -through div_13_S2/*
report_timing -to [all_outputs]
report_area

Retiming(7)
Modifying target period to 4.5 ns:
[Figure: the circuit annotated with path delays 5.92 ns, 7.47 ns, 0.99 ns]
report_timing -from [all_inputs]
report_timing -through div_13_S2/*
report_timing -to [all_outputs]
report_area

Retiming(8)
Modifying RTL:
[Figure: the circuit with additional registers (D) on inputs a, b, c and after the multiplier]

Retiming(8)
Before retiming:
[Figure: the modified circuit annotated with path delays 0.48 ns, 5.97 ns, 7.36 ns, 0.99 ns]
report_timing -from [all_inputs]
report_timing -through mult_18_S2
report_timing -through div_20_S2/*
report_timing -to [all_outputs]
report_area

Retiming(9)
After retiming:
[Figure: the modified circuit annotated with path delays 4.39 ns, 4.50 ns, 4.50 ns]
report_timing -from [all_inputs]
report_timing -through mult_18_S2
report_timing -through div_20_S2/*
report_timing -to [all_outputs]
report_area

Unfolding / Folding

Unfolding(1)

Unfolding(2)

Folding(1)
▪ Trade time for area: operations are time-multiplexed onto fewer functional units.
▪ Scheduling is required.
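As a rough illustration of folding, the sketch below computes the same FIR output two ways: with one multiplier per tap (unfolded), and with a single multiply-accumulate unit reused over several clock cycles per sample (folded). The function names, the 3-tap filter, and the schedule "tap k runs in cycle k" are assumptions made for this example, not taken from the slides.

```python
def fir_direct(x, h):
    """Unfolded: len(h) multipliers work in parallel, one output per cycle."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def fir_folded(x, h):
    """Folded: one MAC unit, len(h) cycles per output sample.

    The schedule is fixed: tap k is executed in cycle k of each sample.
    Returns the outputs and the total number of clock cycles used.
    """
    outputs, cycles = [], 0
    for n in range(len(x)):
        acc = 0                      # accumulator register of the single MAC
        for k in range(len(h)):      # one tap per clock cycle
            cycles += 1
            if n - k >= 0:
                acc += h[k] * x[n - k]
        outputs.append(acc)
    return outputs, cycles

x = [1, 2, 3, 4]
h = [1, 1, 1]
y_direct = fir_direct(x, h)
y_folded, cycles = fir_folded(x, h)
print(y_direct, y_folded, cycles)
```

The folded version needs one multiplier instead of three but spends three cycles per sample, and the order in which taps are issued is exactly the schedule that folding requires.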

Folding(2)

Systolic Array

Systolic Array(1)
▪ A network of processing elements (PEs) that rhythmically compute and pass data through the system
▪ Modularity and regularity
▪ All the PEs in the systolic array are uniform and fully pipelined
▪ Contains only local interconnections
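The properties listed above can be seen in a small cycle-level simulation. The sketch below assumes a linear, output-stationary array computing y = A·x: PE i keeps its own accumulator, receives x only from its left neighbour, and forwards it to the right one cycle later. The function name `systolic_matvec` and this particular mapping are illustrative choices, not the design shown in the slides.

```python
def systolic_matvec(A, x):
    """Cycle-level model of a linear systolic array computing y = A @ x.

    PE i holds accumulator acc[i]. Element x[j] enters PE 0 at cycle j and
    reaches PE i at cycle i + j, where it is multiplied by A[i][j].
    """
    n, m = len(A), len(x)
    acc = [0] * n           # one accumulator per PE
    x_reg = [None] * n      # x_reg[i]: the x value PE i sees in this cycle
    for t in range(n + m - 1):
        # Local interconnect only: every PE takes x from its left neighbour.
        x_reg = [x[t] if t < m else None] + x_reg[:-1]
        for i in range(n):  # all PEs perform the same MAC in the same cycle
            if x_reg[i] is not None:
                acc[i] += A[i][t - i] * x_reg[i]
    return acc

A = [[1, 2], [3, 4], [5, 6]]
y = systolic_matvec(A, [10, 1])
print(y)
```

Every PE runs the same multiply-accumulate, and the whole product finishes in n + m − 1 cycles for an n-by-m matrix.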

Systolic Array(2)
1. Design DG
2. Mapping to DFG
3. VLSI array design

Systolic Array(3)
1. Design DG

Systolic Array(4)

Systolic Array(5)
3. Hardware architecture

Systolic Array(6)

Scheduling & Resource Allocation

Scheduling & Resource Allocation(1)
▪ Scheduling: when is each process (node) executed?
▪ Resource allocation: which resource executes the process?
▪ Scheduling optimizes: sample period, delay, resources, processors, memory

Scheduling & Resource Allocation(2)
▪ Scheduling: when is each process (node) executed?
▪ Resource allocation: which resource executes the process?
▪ Scheduling optimizes: sample period, delay, resources, processors, memory

Scheduling & Resource Allocation(3)
Single interval formulation:
▪ Needs 5 multipliers and 2 adders simultaneously
▪ 34 time units of storage are required

Scheduling & Resource Allocation(4)
Rescheduling:
▪ Processor bound (for * and +):
▪ Needs 3 multipliers and 1 adder simultaneously
▪ 38 time units of storage are required

Scheduling & Resource Allocation(5)
Considering two cycles:
▪ Needs 2 multipliers and 1 adder simultaneously
▪ 46 time units of storage are required

Scheduling & Resource Allocation(6)
ASAP and ALAP scheduling
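A minimal sketch of ASAP and ALAP scheduling on a small dependence graph. ASAP starts each node as soon as all its predecessors have finished, while ALAP starts each node as late as a given deadline allows. The five-node graph, the latencies (2 for multipliers, 1 for adders), and the function names are hypothetical.

```python
def asap(latency, preds):
    """Earliest start time of every node. preds: node -> list of predecessors."""
    start = {}

    def visit(v):
        if v not in start:
            start[v] = max((visit(p) + latency[p] for p in preds.get(v, [])), default=0)
        return start[v]

    for v in latency:
        visit(v)
    return start

def alap(latency, preds, deadline):
    """Latest start time of every node so that everything ends by `deadline`."""
    succs = {v: [] for v in latency}
    for v, ps in preds.items():
        for p in ps:
            succs[p].append(v)
    start = {}

    def visit(v):
        if v not in start:
            must_finish_by = min((visit(s) for s in succs[v]), default=deadline)
            start[v] = must_finish_by - latency[v]
        return start[v]

    for v in latency:
        visit(v)
    return start

# Hypothetical graph: a1 = m1 + m2, a2 = a1 + m3.
latency = {"m1": 2, "m2": 2, "m3": 2, "a1": 1, "a2": 1}
preds = {"a1": ["m1", "m2"], "a2": ["a1", "m3"]}

early = asap(latency, preds)
total = max(early[v] + latency[v] for v in latency)   # ASAP schedule length
late = alap(latency, preds, deadline=total)
mobility = {v: late[v] - early[v] for v in latency}
print(early, late, mobility)
```

Nodes with zero mobility lie on the critical path. Here only m3 can slide by one time unit, so starting it late lowers the number of multipliers busy at time 0 from three to two.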

Scheduling & Resource Allocation(7)
Earliest deadline and slack time scheduling

Scheduling & Resource Allocation(8)
Clique partition

Scheduling & Resource Allocation(9)
Left-edge algorithm
▪ Sorting left edge

Scheduling & Resource Allocation(10)
Left-edge algorithm
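The left-edge algorithm can be sketched as follows, here applied to register allocation. Variable lifetimes are sorted by their left edge (start time), and each register is then filled greedily with lifetimes that do not overlap. The lifetimes and the half-open interval convention [start, end) are assumptions made for this example.

```python
def left_edge(lifetimes):
    """Pack variable lifetimes into as few registers as possible.

    lifetimes: dict mapping variable name -> (start, end), half-open [start, end)
    Returns a list of registers, each a list of the variables it stores.
    """
    # Step 1: sort by left edge (start time), ties broken by end time.
    unassigned = sorted(lifetimes, key=lambda v: lifetimes[v])
    registers = []
    # Step 2: fill one register at a time, scanning from left to right.
    while unassigned:
        track, last_end, leftover = [], float("-inf"), []
        for v in unassigned:
            start, end = lifetimes[v]
            if start >= last_end:    # fits after the previous occupant
                track.append(v)
                last_end = end
            else:
                leftover.append(v)   # overlaps, try the next register
        registers.append(track)
        unassigned = leftover
    return registers

# Hypothetical lifetimes of five intermediate values.
lifetimes = {"a": (0, 3), "b": (1, 4), "c": (3, 6), "d": (4, 7), "e": (2, 5)}
allocation = left_edge(lifetimes)
print(allocation)
```

At time 2 the values a, b, and e are all live, so three registers is the minimum for this example and the allocation reaches it.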

Conclusion
