DSP Architecture Design Crash Course 2020 Wen Tsung
- Slides: 58
DSP Architecture Design Crash Course 2020 Wen Tsung Hsieh 7/23
Overview
▪ Introduction
▪ Pipeline
▪ Parallel
▪ Retiming
▪ Unfolding/Folding
▪ Systolic array
▪ Scheduling
Media IC & System Lab
Introduction
Performance Metrics of DSP Systems
▪ Hardware circuitry and resources (area)
▪ Speed of execution
▪ Power consumption
▪ Finite word length performance
Resource vs. sample rate
Block Diagram
Data-Flow Graph (DFG)
Dependence Graph (DG)
Pipeline
Pipeline(1)
▪ Pipelining is one of the most important design techniques in VLSI DSP systems.
▪ It shortens the critical path, which can be used either to raise the clock rate or to lower the supply voltage for low power.
Pipeline(2)
1. Identify the critical path.
2. Place registers along a feed-forward cutset.
3. Identify the new critical path and keep pipelining until the specs are met.
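
The three steps above can be sketched on a small feed-forward graph. This is only an illustration: the node names, the delays, and the chosen cutset are assumptions, not taken from the slides. The critical path is computed as the longest chain of node delays connected by edges that carry no register; one register is then added to every edge of a feed-forward cutset and the critical path is recomputed.

```python
from functools import lru_cache

def critical_path(delays, edges):
    """Longest register-free path: max sum of node delays along edges
    with zero registers. edges maps (u, v) -> register count.
    Assumes the zero-register edges form no cycle."""
    succ = {u: [] for u in delays}
    for (u, v), regs in edges.items():
        if regs == 0:
            succ[u].append(v)

    @lru_cache(maxsize=None)
    def longest_from(u):
        return delays[u] + max((longest_from(v) for v in succ[u]), default=0)

    return max(longest_from(u) for u in delays)

def pipeline(edges, cutset):
    """Add one register to every edge of a feed-forward cutset."""
    return {e: regs + (1 if e in cutset else 0) for e, regs in edges.items()}

# Hypothetical graph: three multipliers (delay 2) feeding an adder chain (delay 1).
delays = {"m0": 2, "m1": 2, "m2": 2, "a1": 1, "a2": 1}
edges = {("m0", "a1"): 0, ("m1", "a1"): 0, ("a1", "a2"): 0, ("m2", "a2"): 0}

before = critical_path(delays, edges)   # path m0 -> a1 -> a2

# Feed-forward cutset separating the multipliers from the adders.
cutset = {("m0", "a1"), ("m1", "a1"), ("m2", "a2")}
after = critical_path(delays, pipeline(edges, cutset))
```

In this example the critical path drops from 4 to 2 time units at the cost of three registers and one extra cycle of latency.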
Pipeline(3)
Disadvantages:
1. Latency increases.
2. Area cost can be huge for 2-D or 3-D data.
Pipeline(4)
[Figure: stage delays 2 and 8 with register D]
Balanced pipelining: cut the critical path in half.
[Figure: stage delays 5 and 5 with register D]
Fine-grain pipelining:
Pipeline(5)
Parallel
Parallel(1)
Parallel(2)
Parallel(3)
Whole system:
Parallel(4)
Parallel(5)
Combine with pipelining:
Lowering Supply Voltage
Pipelining:
Parallel:
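
A rough sketch of how far the supply can be lowered, assuming the common first-order CMOS model in which gate delay is proportional to C·V/(V − Vt)² and dynamic power to C·V²·f. Under that assumption, an M-level pipelined design (or an L-parallel design) can keep the original sample rate at a supply of β·V0, where β satisfies M(βV0 − Vt)² = β(V0 − Vt)², and power scales by β². The function name and the numbers (V0 = 5 V, Vt = 0.6 V, 3 levels) are illustrative assumptions.

```python
import math

def voltage_scale(levels, v0, vt):
    """Solve levels*(b*v0 - vt)**2 = b*(v0 - vt)**2 for the supply
    scaling factor b, keeping the root whose supply stays above vt."""
    a = levels * v0 ** 2
    b = -(2 * levels * v0 * vt + (v0 - vt) ** 2)
    c = levels * vt ** 2
    disc = math.sqrt(b * b - 4 * a * c)
    roots = [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]
    return next(r for r in roots if r * v0 > vt)

# Hypothetical operating point: 5 V supply, 0.6 V threshold, 3 levels.
beta = voltage_scale(levels=3, v0=5.0, vt=0.6)
power_ratio = beta ** 2   # power relative to the original design
```

With these numbers β comes out near 0.47, so power drops to roughly 22% of the original under the stated model.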
Conclusion
Retiming
Retiming(1)
Adjusting the locations of existing registers.
▪ Reducing the clock period
▪ Reducing the number of registers
▪ Reducing the power consumption
▪ Can deal with recursive systems
Retiming(2)
Cutset retiming:
Retiming(3)
Single cutset: [Figure: three registers D]
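
Retiming can be sketched with the usual edge-weight rule w_r(U→V) = w(U→V) + r(V) − r(U), where w is the number of registers on an edge and r is an integer chosen per node. Cutset retiming corresponds to giving every node on one side of the cutset the same r. The three-node loop below is a hypothetical example, not the circuit from the slides.

```python
def retime(edges, r):
    """Apply retiming values r to a DFG.
    edges: {(u, v): w} with w registers on edge u -> v.
    Returns the new weights w + r[v] - r[u]; raises if any is negative."""
    new_edges = {}
    for (u, v), w in edges.items():
        w_r = w + r.get(v, 0) - r.get(u, 0)
        if w_r < 0:
            raise ValueError(f"infeasible retiming: edge {u}->{v} gets {w_r} registers")
        new_edges[(u, v)] = w_r
    return new_edges

# Hypothetical recursive loop A -> B -> C -> A with two registers on C -> A.
edges = {("A", "B"): 0, ("B", "C"): 0, ("C", "A"): 2}

# Cutset between {A, B} and {C}: move one register from C -> A onto B -> C.
retimed = retime(edges, {"C": 1})
```

The number of registers around the loop stays at two, which is why retiming also applies to recursive systems.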
Retiming(4)
Target: 5.5 ns
[Figure: datapath with inputs a, b, c, operator *, registers D, and output out]
Retiming(5)
[Figure: same datapath, annotated with 5.89 ns, 7.44 ns, 0.99 ns]
report_timing -from [all_inputs]
report_timing -through div_13_S2/*
report_timing -to [all_outputs]
report_area
Retiming(6)
optimize_register:
[Figure: datapath with inputs a, b, c, registers D, and output out, annotated with 5.48 ns, 5.50 ns, 5.50 ns]
report_timing -from [all_inputs]
report_timing -through div_13_S2/*
report_timing -to [all_outputs]
report_area
Retiming(7)
Modifying target period to 4.5 ns:
[Figure: datapath with inputs a, b, c, operator *, registers D, and output out, annotated with 5.92 ns, 7.47 ns, 0.99 ns]
report_timing -from [all_inputs]
report_timing -through div_13_S2/*
report_timing -to [all_outputs]
report_area
Retiming(8)
Modifying RTL:
[Figure: datapath with registers D on inputs a, b, c, operator *, further registers D, and output out]
Retiming(8)
Before retiming:
[Figure: same datapath, annotated with 0.48 ns, 5.97 ns, 7.36 ns, 0.99 ns]
report_timing -from [all_inputs]
report_timing -through mult_18_S2
report_timing -through div_20_S2/*
report_timing -to [all_outputs]
report_area
Retiming(9)
After retiming:
[Figure: datapath with inputs a, b, c, registers D, and output out, annotated with 4.39 ns, 4.50 ns, 4.50 ns]
report_timing -from [all_inputs]
report_timing -through mult_18_S2
report_timing -through div_20_S2/*
report_timing -to [all_outputs]
report_area
Unfolding / Folding
Unfolding(1)
Unfolding(2)
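
A minimal sketch of the standard J-unfolding construction, assuming the DFG is given as (source, destination, delays) triples: each edge U→V with w delays becomes J edges U_i→V_((i+w) mod J) carrying ⌊(i+w)/J⌋ delays. The two-node graph is a hypothetical example.

```python
def unfold(edges, J):
    """J-unfold a DFG given as a list of (u, v, w) edges with w delays.
    Each edge becomes J edges u_i -> v_((i + w) % J) with (i + w) // J delays."""
    out = []
    for u, v, w in edges:
        for i in range(J):
            out.append((f"{u}{i}", f"{v}{(i + w) % J}", (i + w) // J))
    return out

# Hypothetical loop: A -> B with no delay, B -> A with 3 delays.
edges = [("A", "B", 0), ("B", "A", 3)]
unfolded = unfold(edges, J=2)
```

The unfolded graph has J copies of every node, and the total number of delays is unchanged.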
Folding(1)
▪ Trade time for area: several operations are time-multiplexed onto one functional unit.
▪ Scheduling is required.
Folding(2)
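
The link between folding and scheduling can be sketched with the usual folding equation D_F(U→V) = N·w − P_U + v − u, where N is the folding factor, w the delays on the edge, P_U the pipeline stages of the unit executing U, and u, v the time slots (folding orders) assigned to U and V. The graph, the schedule, and the stage counts below are assumptions chosen for illustration.

```python
def folded_delays(edges, N, order, stages):
    """Folding equation D_F(U -> V) = N*w - P_U + v - u for every edge.
    edges: {(U, V): w}; order: time slot 0..N-1 of each node;
    stages: pipeline stages P_U of the unit executing each node."""
    return {
        (U, V): N * w - stages[U] + order[V] - order[U]
        for (U, V), w in edges.items()
    }

# Hypothetical two-node loop folded by N = 2 onto single-stage units.
edges = {("A", "B"): 0, ("B", "A"): 2}
order = {"A": 0, "B": 1}      # assumed schedule: A in slot 0, B in slot 1
stages = {"A": 1, "B": 1}
delays = folded_delays(edges, N=2, order=order, stages=stages)
```

A negative result would mean the chosen schedule is not realizable as is, and the graph would need retiming or a different folding order first.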
Systolic Array
Systolic Array(1)
▪ A network of processing elements (PEs) that rhythmically compute and pass data through the system
▪ Modularity and regularity
▪ All the PEs in the systolic array are uniform and fully pipelined
▪ Contains only local interconnections
Systolic Array(2)
1. Design the DG
2. Map the DG to a DFG
3. Design the VLSI array
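
Step 2 can be sketched as a linear projection of the DG, a common way to derive a systolic array. Each index point I is assigned to processor p·I and time slot s·I, where the processor-space vector p must be orthogonal to the projection vector d, and the scheduling vector s must satisfy s·d ≠ 0. The vectors and the 4×3 index space below are assumptions for a small FIR-like DG, not the example from the slides.

```python
def map_dg(points, p, s, d):
    """Map DG index points to (processor, time) pairs by linear projection.
    p: processor-space vector, s: scheduling vector, d: projection vector.
    Requires p.d == 0 and s.d != 0."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    if dot(p, d) != 0:
        raise ValueError("processor space must be orthogonal to the projection")
    if dot(s, d) == 0:
        raise ValueError("points on one projection line would share a time slot")
    return {pt: (dot(p, pt), dot(s, pt)) for pt in points}

# Hypothetical DG: index (i, j) = (output sample, tap) for 4 samples, 3 taps.
points = [(i, j) for i in range(4) for j in range(3)]
mapping = map_dg(points, p=(0, 1), s=(1, 0), d=(1, 0))
processors = {proc for proc, _ in mapping.values()}
```

Here every tap j gets its own PE and sample i is processed at time i, so no PE is asked to do two operations in the same slot.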
Systolic Array(3)
1. Design DG
Systolic Array(4)
Systolic Array(5)
3. Hardware architecture
Systolic Array(6)
Scheduling & Resource Allocation
Scheduling & Resource Allocation(1)
Scheduling: when is each process (node) executed?
Resource allocation: which resource executes each process?
Optimization targets: sample period, delay, resources, processors, memory
Scheduling & Resource Allocation(2)
Scheduling & Resource Allocation(3)
Single interval formulation:
Needs 5 multipliers and 2 adders simultaneously
34 time units of storage are required
Scheduling & Resource Allocation(4)
Rescheduling:
Processor bound for multipliers (*) and adders (+): [Figure]
Needs 3 multipliers and 1 adder simultaneously
38 time units of storage are required
Scheduling & Resource Allocation(5)
Considering two cycles:
Needs 2 multipliers and 1 adder simultaneously
46 time units of storage are required
Scheduling & Resource Allocation(6)
ASAP and ALAP scheduling
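
A minimal sketch of ASAP (as soon as possible) and ALAP (as late as possible) scheduling, assuming an acyclic precedence graph in which every operation takes one time step and steps are numbered from 1. The five-node graph and the deadline are hypothetical. The difference ALAP − ASAP is the slack (mobility) of each node.

```python
def asap(nodes, edges):
    """Earliest start step of each node; edges are (pred, succ) pairs."""
    preds = {n: [] for n in nodes}
    for u, v in edges:
        preds[v].append(u)
    start = {}
    def visit(n):
        if n not in start:
            start[n] = 1 + max((visit(p) for p in preds[n]), default=0)
        return start[n]
    for n in nodes:
        visit(n)
    return start

def alap(nodes, edges, deadline):
    """Latest start step of each node so that all finish by `deadline`."""
    succs = {n: [] for n in nodes}
    for u, v in edges:
        succs[u].append(v)
    start = {}
    def visit(n):
        if n not in start:
            start[n] = min((visit(s) for s in succs[n]), default=deadline + 1) - 1
        return start[n]
    for n in nodes:
        visit(n)
    return start

# Hypothetical graph: three multiplications feeding a two-adder chain.
nodes = ["m1", "m2", "m3", "a1", "a2"]
edges = [("m1", "a1"), ("m2", "a1"), ("a1", "a2"), ("m3", "a2")]

early = asap(nodes, edges)
late = alap(nodes, edges, deadline=3)
slack = {n: late[n] - early[n] for n in nodes}
```

Only m3 has slack here, so it is the one operation a scheduler could shift to reduce the number of multipliers needed in step 1.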
Scheduling & Resource Allocation(7)
Earliest-deadline and slack-time scheduling
Scheduling & Resource Allocation(8)
Clique partitioning
Scheduling & Resource Allocation(9)
Left-edge algorithm: sort by left edge
Scheduling & Resource Allocation(10)
Left-edge algorithm
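
The left-edge algorithm can be sketched as follows, assuming the items to allocate are variable lifetimes given as half-open intervals [start, end) and the resources are registers. Intervals are sorted by their left edge, then each register is filled greedily with the next interval that starts no earlier than the previous one ends. The lifetimes are hypothetical.

```python
def left_edge(lifetimes):
    """Assign variables to registers with the left-edge algorithm.
    lifetimes: {name: (start, end)}; a variable occupies [start, end).
    Returns {name: register_index}."""
    remaining = sorted(lifetimes, key=lambda n: lifetimes[n])  # sort by left edge
    assignment = {}
    reg = 0
    while remaining:
        last_end = float("-inf")
        leftover = []
        for name in remaining:
            start, end = lifetimes[name]
            if start >= last_end:        # fits after the previous variable
                assignment[name] = reg
                last_end = end
            else:                        # overlaps, try the next register
                leftover.append(name)
        remaining = leftover
        reg += 1
    return assignment

# Hypothetical lifetimes of five variables.
lifetimes = {"a": (0, 3), "b": (1, 4), "c": (3, 6), "d": (4, 7), "e": (2, 5)}
assignment = left_edge(lifetimes)
num_registers = max(assignment.values()) + 1
```

For this input three registers are used, which matches the maximum number of lifetimes that overlap at any one time.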
Conclusion