- Slides: 58

DSP Architecture Design Crash Course 2020 Wen Tsung Hsieh 7/23

Overview
▪ Introduction
▪ Pipeline
▪ Parallel
▪ Retiming
▪ Unfolding/Folding
▪ Systolic array
▪ Scheduling
Media IC & System Lab

Introduction

Performance Metrics of DSP Systems
▪ Hardware circuitry and resources (area)
▪ Speed of execution
▪ Power consumption
▪ Finite word length performance

Resource vs. sample rate

Block Diagram

Data-Flow Graph (DFG)

Dependence Graph (DG)

Pipeline

Pipeline(1)
▪ Pipelining is one of the most important design techniques in VLSI DSP systems.
▪ It shortens the critical path, which allows either a higher clock rate or, at the same clock rate, a lower supply voltage for low power.

Pipeline(2)
1. Identify the critical path.
2. Place registers along a feed-forward cutset.
3. Identify the new critical path and keep pipelining until the specs are met.
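
The three steps above can be tried on a small data-flow graph. The sketch below is illustrative only: the 3-tap FIR graph, the node names, and the unit times (multiplier 10, adder 2) are assumptions, not taken from the slides. The critical path is computed as the longest chain of nodes connected by edges that carry no register.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(times, edges):
    """Longest computation time over paths that contain no registers.

    times: {node: computation time}
    edges: [(src, dst, number_of_registers)]
    """
    # Only zero-register edges chain combinational logic together.
    preds = {n: [] for n in times}
    for src, dst, regs in edges:
        if regs == 0:
            preds[dst].append(src)
    finish = {}
    for n in TopologicalSorter(preds).static_order():
        finish[n] = times[n] + max((finish[p] for p in preds[n]), default=0)
    return max(finish.values())


# Hypothetical direct-form FIR: y = m1 + m2 + m3 summed by a1, then a2.
times = {"m1": 10, "m2": 10, "m3": 10, "a1": 2, "a2": 2}
before = [("m1", "a1", 0), ("m2", "a1", 0), ("a1", "a2", 0), ("m3", "a2", 0)]

# Feed-forward cutset {m1, m2, m3, a1} | {a2}: one register on every
# edge that crosses the cut.
after = [("m1", "a1", 0), ("m2", "a1", 0), ("a1", "a2", 1), ("m3", "a2", 1)]

print(critical_path(times, before))  # 14: multiplier + adder + adder
print(critical_path(times, after))   # 12: multiplier + adder
```

Every edge crossing the cut receives a register, so all paths from input to output gain the same latency and the function is unchanged.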

Pipeline(3)
Disadvantages:
1. Latency increases.
2. Area cost can be huge for 2D or 3D data.

Pipeline(4)
Balanced pipelining: cut the critical path in half.
Fine-grain pipelining:
[Figure: stage delays 2 and 8 versus 5 and 5, separated by a register D]

Pipeline(5)

Parallel

Parallel(1)

Parallel(2)

Parallel(3)
Whole system:

Parallel(4)

Parallel(5)
Combine with pipelining:

Lowering Supply Voltage
Pipelining:
Parallel:
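
The formulas on this slide did not survive extraction. As a rough sketch, the commonly used first-order CMOS model (an assumption here, not confirmed by the slides) takes delay proportional to C·V/(V − Vt)². If an M-level pipelined or M-parallel design keeps the original sample rate and the supply is scaled to βV0, then β satisfies M(βV0 − Vt)² = β(V0 − Vt)² and the power drops by roughly β². The function name and the example voltages below are hypothetical.

```python
import math


def supply_scaling_factor(m, v0, vt):
    """Solve m*(beta*v0 - vt)**2 == beta*(v0 - vt)**2 for beta.

    Assumes delay ~ C*V / (V - vt)**2 (first-order model).
    Returns the root with beta*v0 > vt, the only physically valid one.
    """
    a = m * v0 ** 2
    b = -(2 * m * v0 * vt + (v0 - vt) ** 2)
    c = m * vt ** 2
    disc = b * b - 4 * a * c
    return (-b + math.sqrt(disc)) / (2 * a)


# Hypothetical numbers: 2-level pipelining, 5 V supply, 0.6 V threshold.
beta = supply_scaling_factor(2, 5.0, 0.6)
print(f"supply scales by {beta:.4f}, power by about {beta ** 2:.3f}")
```

Under this model the same β applies to an M-parallel design, because each of the M copies gets M times longer to charge the same capacitance.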

Conclusion

Retiming

Retiming(1)
Moving existing registers without changing the function of the circuit.
▪ Reducing the clock period
▪ Reducing the number of registers
▪ Reducing the power consumption
▪ Can be applied to recursive systems

Retiming(2)
Cutset retiming:

Retiming(3)
Single cutset:
[Figure: registers (D) moved across a single cutset]
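
Cutset retiming can be expressed with a retiming value r(v) per node: an edge u → v with w registers ends up with w + r(v) − r(u) registers, and the result is legal only if no edge goes negative. The sketch below is a minimal illustration; the four-node recursive graph and its computation times are assumed for the example and are not the circuit on the slides.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def retime(edges, r):
    """Apply retiming values r to edges [(src, dst, registers)]."""
    out = []
    for u, v, w in edges:
        w_r = w + r.get(v, 0) - r.get(u, 0)
        if w_r < 0:
            raise ValueError(f"edge {u}->{v} would have {w_r} registers")
        out.append((u, v, w_r))
    return out


def critical_path(times, edges):
    """Longest computation time over register-free paths."""
    preds = {n: [] for n in times}
    for u, v, w in edges:
        if w == 0:
            preds[v].append(u)
    finish = {}
    for n in TopologicalSorter(preds).static_order():
        finish[n] = times[n] + max((finish[p] for p in preds[n]), default=0)
    return max(finish.values())


# Hypothetical recursive graph: nodes 1, 2 are adders, 3, 4 multipliers.
times = {1: 1, 2: 1, 3: 2, 4: 2}
edges = [(1, 3, 1), (1, 4, 2), (3, 2, 0), (4, 2, 0), (2, 1, 1)]

# Cutset around node 2: edges entering it gain a register,
# the edge leaving it loses one.
retimed = retime(edges, {2: 1})

print(critical_path(times, edges))    # 3: multiplier followed by adder
print(critical_path(times, retimed))  # 2
```

The number of registers around each loop is unchanged (two on loop 1 → 3 → 2 → 1 before and after), which is why retiming also works on recursive systems.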

Retiming(4)
Target: 5.5 ns
[Figure: datapath with inputs a, b, c, a multiplier (*), registers (D), and output out]

Retiming(5)
[Figure: same datapath; path delays 5.89 ns, 7.44 ns, 0.99 ns]
report_timing -from [all_inputs]
report_timing -through div_13_S2/*
report_timing -to [all_outputs]
report_area

Retiming(6)
optimize_register:
[Figure: same datapath after register optimization; path delays 5.48 ns, 5.50 ns, 5.50 ns]
report_timing -from [all_inputs]
report_timing -through div_13_S2/*
report_timing -to [all_outputs]
report_area

Retiming(7)
Modifying target period to 4.5 ns:
[Figure: same datapath; path delays 5.92 ns, 7.47 ns, 0.99 ns]
report_timing -from [all_inputs]
report_timing -through div_13_S2/*
report_timing -to [all_outputs]
report_area

Retiming(8)
Modifying RTL:
[Figure: datapath with a register (D) added on each of the inputs a, b, c]

Retiming(8)
Before retiming:
[Figure: modified datapath; path delays 0.48 ns, 5.97 ns, 7.36 ns, 0.99 ns]
report_timing -from [all_inputs]
report_timing -through mult_18_S2
report_timing -through div_20_S2/*
report_timing -to [all_outputs]
report_area

Retiming(9)
After retiming:
[Figure: modified datapath; path delays 4.39 ns, 4.50 ns, 4.50 ns]
report_timing -from [all_inputs]
report_timing -through mult_18_S2
report_timing -through div_20_S2/*
report_timing -to [all_outputs]
report_area

Unfolding / Folding

Unfolding(1)

Unfolding(2)
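
The figures for these two slides were lost. As a reminder of the usual textbook construction (assumed here, since the slides' own example is not recoverable), J-unfolding replaces each edge U → V carrying w registers with J edges U_i → V_((i + w) mod J), each carrying floor((i + w) / J) registers. The function name and the sample edge are hypothetical.

```python
def unfold(edges, j):
    """J-unfold a DFG given as [(src, dst, registers)].

    Each node n becomes j copies (n, 0) .. (n, j - 1).
    """
    out = []
    for u, v, w in edges:
        for i in range(j):
            out.append(((u, i), (v, (i + w) % j), (i + w) // j))
    return out


# Hypothetical edge with 37 registers, unfolded by a factor of 4.
for edge in unfold([("U", "V", 37)], 4):
    print(edge)
```

The registers on the J new edges always sum to the original w, so unfolding preserves the total number of registers on each original edge.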

Folding(1)
▪ Trades time for area: several operations are time-multiplexed onto one functional unit.
▪ Scheduling is required.

Folding(2)

Systolic Array

Systolic Array(1)
▪ A network of processing elements (PEs) that rhythmically compute and pass data through the system
▪ Modularity and regularity
▪ All the PEs in the systolic array are uniform and fully pipelined
▪ Contains only local interconnection

Systolic Array(2)
1. Design DG
2. Mapping to DFG
3. VLSI array design

Systolic Array(3)
1. Design DG

Systolic Array(4)

Systolic Array(5)
3. Hardware architecture

Systolic Array(6)
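
The array on these slides could not be recovered, so the sketch below uses a generic stand-in: a cycle-level simulation of an output-stationary 2D systolic array for matrix multiplication. The choice of algorithm, the function name, and the input skewing scheme are all assumptions for illustration. It shows the properties listed in Systolic Array(1): identical PEs, registers between neighbours, and only local connections.

```python
def systolic_matmul(a, b):
    """Simulate C = A x B on an n x m grid of multiply-accumulate PEs.

    A values enter from the left and move right; B values enter from
    the top and move down. Row i of A and column j of B are skewed by
    i and j cycles so matching operands meet in PE(i, j).
    """
    n, k, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]    # result held inside each PE
    a_reg = [[0] * m for _ in range(n)]  # value each PE passes right
    b_reg = [[0] * m for _ in range(n)]  # value each PE passes down
    for t in range(n + m + k - 2):
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:  # boundary PE: read skewed input stream
                    a_in = a[i][t - i] if 0 <= t - i < k else 0
                else:       # interior PE: read left neighbour's register
                    a_in = a_reg[i][j - 1]
                if i == 0:
                    b_in = b[t - j][j] if 0 <= t - j < k else 0
                else:
                    b_in = b_reg[i - 1][j]
                acc[i][j] += a_in * b_in
                new_a[i][j] = a_in
                new_b[i][j] = b_in
        a_reg, b_reg = new_a, new_b  # all registers update together
    return acc


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Each PE only reads the registers of its left and upper neighbours, so the wiring stays local no matter how large the array grows.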

Scheduling & Resource Allocation

Scheduling & Resource Allocation(1)
Scheduling: when is each process (node) executed?
Resource allocation: which resource executes each process?
Optimization targets: sample period, delay, resources, processors, memory

Scheduling & Resource Allocation(2)

Scheduling & Resource Allocation(3)
Single interval formulation:
Needs 5 multipliers and 2 adders simultaneously
34 time units of storage are required

Scheduling & Resource Allocation(4)
Rescheduling:
Processor bound: [per-resource bounds for * and +; values not recoverable]
Needs 3 multipliers and 1 adder simultaneously
38 time units of storage are required

Scheduling & Resource Allocation(5)
Considering two cycles:
Needs 2 multipliers and 1 adder simultaneously
46 time units of storage are required

Scheduling & Resource Allocation(6)
ASAP and ALAP scheduling
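
ASAP starts every operation as soon as its predecessors have finished; ALAP starts it as late as a latency limit allows. The difference between the two start times is the operation's mobility. The sketch below assumes unit-time operations and a small hypothetical graph (neither comes from the slides).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def asap(preds):
    """Earliest start step per node; preds = {node: [predecessors]}."""
    start = {}
    for n in TopologicalSorter(preds).static_order():
        start[n] = max((start[p] + 1 for p in preds[n]), default=0)
    return start


def alap(preds, latency):
    """Latest start step per node so that all finish within `latency` steps."""
    succs = {n: [] for n in preds}
    for n, ps in preds.items():
        for p in ps:
            succs[p].append(n)
    start = {}
    for n in reversed(list(TopologicalSorter(preds).static_order())):
        start[n] = min((start[s] - 1 for s in succs[n]), default=latency - 1)
    return start


# Hypothetical graph: a1 = m1 + m2, a2 = a1 + m3 (every node listed as a key).
preds = {"m1": [], "m2": [], "m3": [], "a1": ["m1", "m2"], "a2": ["a1", "m3"]}

early = asap(preds)
late = alap(preds, latency=3)
mobility = {n: late[n] - early[n] for n in preds}
print(early, late, mobility)
```

Here only m3 has non-zero mobility, so it can be moved to step 1 to reduce the number of multipliers needed in step 0 from three to two.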

Scheduling & Resource Allocation(7)
Earliest deadline and slack time scheduling

Scheduling & Resource Allocation(8)
Clique partition

Scheduling & Resource Allocation(9)
Left-edge algorithm
Sorting by left edge

Scheduling & Resource Allocation(10)
Left-edge algorithm
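
A minimal sketch of the left-edge algorithm for register allocation: sort the variable lifetimes by their left edge (start time), then fill one register at a time with every lifetime that starts no earlier than the previous one ends. The lifetimes below are made up for illustration, and treating a lifetime as the half-open interval [start, end) is an assumption.

```python
def left_edge(lifetimes):
    """Pack lifetimes {name: (start, end)} into as few registers as possible.

    A variable occupies its register during [start, end), so a register
    freed at time t can be reused by a variable that starts at t.
    Returns a list of registers, each a list of variable names.
    """
    remaining = sorted(lifetimes.items(), key=lambda item: item[1])
    registers = []
    while remaining:
        packed, leftover, last_end = [], [], None
        for name, (start, end) in remaining:
            if last_end is None or start >= last_end:
                packed.append(name)
                last_end = end
            else:
                leftover.append((name, (start, end)))
        registers.append(packed)
        remaining = leftover
    return registers


# Hypothetical variable lifetimes.
lifetimes = {"a": (0, 3), "b": (1, 4), "c": (3, 6), "d": (4, 7), "e": (2, 5)}
print(left_edge(lifetimes))
```

At time 2 the variables a, b and e are all live, so three registers is the minimum for this example, and the algorithm reaches it.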

Conclusion
