Proximity Optimization for Adaptive Circuit Design Ang Lu

  • Slides: 31
Download presentation
Proximity Optimization for Adaptive Circuit Design Ang Lu, Hao He, and Jiang Hu

Proximity Optimization for Adaptive Circuit Design Ang Lu, Hao He, and Jiang Hu

Overview • • Introduction Proposed Techniques Experiment Result Conclusion 2

Overview • • Introduction Proposed Techniques Experiment Result Conclusion 2

Design Challenges Process Variations Device Aging Power 3

Design Challenges Process Variations Device Aging Power 3

Adaptive Circuit • Apply power according to actual chip variations • More energy-efficient than

Adaptive Circuit • Apply power according to actual chip variations • More energy-efficient than the worst case desgin 4

Sensors and Tuning Knobs in Adaptive Circuit • Delay variation sensors – Critical path

Sensors and Tuning Knobs in Adaptive Circuit • Delay variation sensors – Critical path replica, used in IBM Power 5 processor – Canary flip-flop, like in Razor • Tuning knobs – Adaptive body bias – Adaptive supply voltage (voltage interpolation) Liang, et al. , Micro 2009 5

Pros and Cons of Adaptive Circuit Pros: üMore energy-efficient than the worst case designs

Pros and Cons of Adaptive Circuit Pros: üMore energy-efficient than the worst case designs Cons: L Potentially large area overhead L Higher design complexity 6

Clustering for Adaptive Circuits Two extreme cases: • Tune each cell individually? Ø Too

Clustering for Adaptive Circuits Two extreme cases: • Tune each cell individually? Ø Too much overhead • Tune entire circuit collectively? Ø Less energy savings Achieve desired trade-off? • Clustering 7

Overview • Introduction • Proposed Techniques – Overall Flow – Time and Location Aware

Overview • Introduction • Proposed Techniques – Overall Flow – Time and Location Aware Cell Clustering – Clustering Driven Incremental Placement • Experiment Result • Conclusion 8

Proposed Flow 9

Proposed Flow 9

Existing Clustering Methods • Manual partition for regular datapath Timing Location • Cluster cells

Existing Clustering Methods • Manual partition for regular datapath Timing Location • Cluster cells based on spatial proximity • Cluster cells based on their timing characteristics – Kulcarni, et al. , TCAD 2008 – Monte Carlo (MC) simulation – Optimize for each MC run Timing Location 10

Need to Consider Both Timing & Spatial Proximity Both paths A & B are

Need to Consider Both Timing & Spatial Proximity Both paths A & B are critical Bad Cluster: A 1 & B 2 (Similar timing characteristic) Good Cluster: A 2(A 1) & B 2 (B 1) 11

Clustering Algorithm Clustering algorithm Distance definition Timing Location 12

Clustering Algorithm Clustering algorithm Distance definition Timing Location 12

Timing Slack 13

Timing Slack 13

Timing Sensitivity • 14

Timing Sensitivity • 14

Correlations? • Highly correlated cells are clustered together • Spatial correlation is partially addressed

Correlations? • Highly correlated cells are clustered together • Spatial correlation is partially addressed by spatial proximity • Structural correlation – Not every cell on one critical path need to be tuned – Structurally correlated cells are rarely too far apart 15

Overview • Introduction • Proposed Techniques – Overall Flow – Time and Location Aware

Overview • Introduction • Proposed Techniques – Overall Flow – Time and Location Aware Cell Clustering – Clustering Driven Incremental Placement • Experiment Result • Conclusion 16

Cluster Driven Incremental Placement 17

Cluster Driven Incremental Placement 17

Min-Cost Network Flow Formulation Source nodes Sink nodes 18

Min-Cost Network Flow Formulation Source nodes Sink nodes 18

Implementation Them min-cost network flow problem is solved by the Edmond-Karp algorithm Move cells

Implementation Them min-cost network flow problem is solved by the Edmond-Karp algorithm Move cells heuristically for fractional flow solutions 19

Wirelength Overhead Control • After incremental placement, wirelength increase is estimated • If the

Wirelength Overhead Control • After incremental placement, wirelength increase is estimated • If the increase > threshold, rerun clustering with increased weight on spatial proximity 20

Overview • • Introduction Proposed Techniques Experiment Result Conclusion 21

Overview • • Introduction Proposed Techniques Experiment Result Conclusion 21

Experiment Setup • Benchmark: – ICCAD 2014 Incremental Timing-Driven Placement Contest benchmark suites –

Experiment Setup • Benchmark: – ICCAD 2014 Incremental Timing-Driven Placement Contest benchmark suites – 7 circuits, (130 K , 960 K) cells • Adaptive body bias is employed as platform of adaptive circuit design • # cluster is empirically chosen in a range from 10 to 25 22

Comparison and Methodology Methods that are compared: 1. 2. 3. 4. Over design Location-driven

Comparison and Methodology Methods that are compared: 1. 2. 3. 4. Over design Location-driven clustering Timing-driven clustering Location and timing driven clustering (Ours) Methodology For each method, simulate multiple times with varying parameters and report the average results 23

Testcases and Placement Perturbations Circuit # gates Cells moved Avg. cell move distance edit_dist

Testcases and Placement Perturbations Circuit # gates Cells moved Avg. cell move distance edit_dist 130674 5. 1% 7 matrix_mult 155341 5. 7% 8 vga_lcd 164891 8. 1% 27 b 19 219268 8. 3% 12 leon 3 mp 649191 3. 9% 89 leon 2 794286 8. 3% 50 netcard 958792 5. 5% 90 24

Results from only Forward Body Bias 1, 6 1, 4 1, 2 1, 0

Results from only Forward Body Bias 1, 6 1, 4 1, 2 1, 0 0, 8 0, 6 0, 4 0, 2 0, 0 Over design Power location-aware clustering ∆Area Timing-aware clustering Ours ∆Wire-length Our method achieves • 99% timing yield like other methods • 1/4 less power than over design • 1/3 less area overhead • substantial less wire overhead 25

Results from Forward and Reversed Body Bias 1, 2 1, 0 0, 8 0,

Results from Forward and Reversed Body Bias 1, 2 1, 0 0, 8 0, 6 0, 4 0, 2 0, 0 Over design location-aware clustering Power ∆Area Timing-aware clustering Ours ∆Wire-length Our method achieves • 99% timing yield • similar power • much less area overhead than location-only • much less wire overhead than timing-only 26

Power/Area – Timing Tradeoff Circuit “mgc_matrix_mult” Power/area – Timing tradeoff Δ Area Adapt Power

Power/Area – Timing Tradeoff Circuit “mgc_matrix_mult” Power/area – Timing tradeoff Δ Area Adapt Power 12000 14000 10000 12000 10000 8000 6000 4000 2000 0 3830 2000 3880 3930 3980 4030 4080 0 4130 Timing constraint Adapt Power Δ Area 27

Impact of Weighting Factors Circuit “mgc_matrix_mult” • α β γ # clusters Adapt Power

Impact of Weighting Factors Circuit “mgc_matrix_mult” • α β γ # clusters Adapt Power ∆ Area ∆ wire 0 1 1 23 3707 4111 0. 00% 5 1 1 24 3500 1851 0. 61% 6. 5 7. 2 15 20 1 5 5 5 1 1 1 0 2 4 1 1 0 1 1 1 22 22 12 12 20 24 24 24 7764 9582 10376 6686 3771 3500 3377 4330 8538 10585 10903 8423 2533 1851 1653 3673 0. 76% 0. 82% 6. 70% 8. 50% 9. 50% 0. 61% 0. 60% 0. 55% 28

Entire Flow Runtime 6000 5000 4000 3000 2000 1000 0 Over Design Location Cluster

Entire Flow Runtime 6000 5000 4000 3000 2000 1000 0 Over Design Location Cluster Ours Runtime (s) 29

Conclusion Clustering and cluster-driven placement are proposed for adaptive circuit designs ü Reduce area

Conclusion Clustering and cluster-driven placement are proposed for adaptive circuit designs ü Reduce area and power overhead of adaptive circuit, outperform previous methods ü Assure gates of the same cluster locate in a contiguous region ü Wire-length increase <1%. 30

Built-In Self Optimization for Variation Resilience of Analog Filters Thank you!

Built-In Self Optimization for Variation Resilience of Analog Filters Thank you!