Minimal Skew Clock Synthesis Considering Time-Variant Temperature Gradient

Hao Yu, Yu Hu, Chun-Chen Liu and Lei He
EE Department, UCLA
Presented by Yu Hu
Partially supported by SRC task 1116.

Introduction
• Both process and operation variations cause uncertainties and may lead to design failure or over-design.
• Process variations have been actively studied:
  - Statistical timing analysis
  - Stochastic optimization
  - Post-silicon configuration
• Stochastic optimization for the operation variations below has been largely ignored:
  - Fluctuation of crosstalk noise and P/G network noise due to different input vectors
  - Time-variant on-chip temperature map over different workloads
• This work is the first in-depth study on clock synthesis considering time-variant temperature variations.

Limitation of Existing Work
• The existing work [Cho: ICCAD 05] ignores time-variant temperature variations and assumes a fixed temperature map
• Different workloads lead to different temperature maps (e.g., two SPEC 2000 applications: Ammp and Gzip)
• Optimizing skew for one application hurts the skew for another; this work resolves that conflict

Outline
• Modeling and Problem Formulation
• Algorithms
• Experimental Results
• Conclusions

Stochastic Temperature Model
• The temperature map is unique for each application or program phase
  - It can be obtained by μArch-level simulation
• For each region of the chip, temperature is characterized by its mean and variance over a number of maps
  - Principal component analysis (PCA) decides the number of maps
• Temperature correlation, measured as the covariance between regions, is high over the SPEC 2000 benchmark set
  - [Figure: correlation map; entry (i, j) is the correlation between regions i and j]
• Considering temperature correlations during optimization can compress the search space!
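The per-region statistics described on this slide can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names `region_stats` and `correlation`, the use of the population (divide-by-M) covariance, and the three-region toy maps are all assumptions made for the example.

```python
from math import sqrt

def region_stats(maps):
    """Mean, variance and covariance of per-region temperature over M maps.

    maps: list of M temperature maps, each a list of N region temperatures.
    Uses the population covariance (divide by M); this is an assumption.
    """
    m, n = len(maps), len(maps[0])
    mean = [sum(t[i] for t in maps) / m for i in range(n)]
    cov = [
        [sum((t[i] - mean[i]) * (t[j] - mean[j]) for t in maps) / m
         for j in range(n)]
        for i in range(n)
    ]
    var = [cov[i][i] for i in range(n)]
    return mean, var, cov

def correlation(cov):
    """Normalize a covariance matrix to correlation coefficients.

    Assumes every region has non-zero variance (otherwise this divides by zero).
    """
    n = len(cov)
    return [
        [cov[i][j] / sqrt(cov[i][i] * cov[j][j]) for j in range(n)]
        for i in range(n)
    ]

# Toy data (hypothetical): 3 maps over 3 regions, temperatures in degrees C.
toy_maps = [
    [60.0, 70.0, 50.0],
    [80.0, 90.0, 40.0],
    [70.0, 80.0, 45.0],
]
mean, var, cov = region_stats(toy_maps)
corr = correlation(cov)
```

In the toy data, regions 0 and 1 heat up together while region 2 cools as they heat, so `corr[0][1]` is close to +1 and `corr[0][2]` is close to -1.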

Problem Formulation
• Given:
  - The source, sinks and an initial tree embedding
  - A set of temperature maps for a benchmark set
• Design freedoms:
  - Re-embedding of clock tree
  - Cross link insertion
• Objective: minimize the worst-case skew among the given temperature maps
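One way to write this objective as a min-max problem is shown below. The notation is an assumption introduced here and does not appear on the slides: $\mathcal{E}$ is the tree embedding, $X$ the set of inserted cross links, $T_1, \dots, T_M$ the given temperature maps, $S$ the set of sinks, and $d_s(\mathcal{E}, X; T_m)$ the source-to-sink delay of sink $s$ under map $T_m$.

```latex
\min_{\mathcal{E},\,X}\;
\max_{1 \le m \le M}\;
\Bigl(
  \max_{s \in S} d_s(\mathcal{E}, X; T_m)
  \;-\;
  \min_{s \in S} d_s(\mathcal{E}, X; T_m)
\Bigr)
```

The inner difference is the skew under a single map; the outer maximum picks the worst map, so improving one workload at the expense of another does not lower the objective.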

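Bottom-up Greedy-based Re-embedding
[Figure: clock tree with root v, merging points x and y, and sinks a, b, c, d; the legend marks the sinks, the original merging point, and the re-embedding options]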

Bottom-up Greedy-based Re-embedding
[Figure: the same tree (root v, merging points x and y, sinks a, b, c, d) with the new merging point after re-embedding]

Delay and Skew with Re-embedding
• Perturbed Modified Nodal Analysis (MNA)
  - x is for source, sinks and merging points
  - L selects sink responses
  - The number of re-embedding options, I = 5^N, is huge! (N is the number of merging points)
• Defining a new state variable with both nominal (x) and sensitivity (Δx) [key to triangulate the system]
• Structured and parameterized state matrix

Compressing Solution Space by Temperature Correlation
• Motivation
  - Highly correlated merging points should be re-embedded in the same fashion
• Solution
  - Calculate correlation between two merging points based on temperature correlations
  - Cluster merging points based on correlation strength
  - Perform the same re-embedding for all points within one cluster

Temperature Correlation Driven Clustering
• Correlation matrix C of merging points is low-rank, and Singular Value Decomposition (SVD) reveals the rank K
  - Low-rank approximation example: K = 4, N = 70
  - Search space reduced from 5^70 to 5^4
• Partition the merging points into K clusters (K-Means)
  - Maximize the correlation strength within each of the K clusters
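The clustering step can be sketched in plain Python as below. This is only an illustration under stated assumptions: K is taken as given instead of being read off an SVD, each merging point is represented by its row of the correlation matrix, and the K-Means variant uses a deterministic farthest-point initialization so the result is reproducible. The names `kmeans_rows` and `search_space` and the 4x4 toy matrix are hypothetical.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_rows(rows, k, iters=50):
    """Cluster the rows of a correlation matrix into k groups (Lloyd's K-Means).

    Initialization is deterministic: start from row 0, then repeatedly add
    the row farthest from all centers chosen so far.
    """
    centers = [list(rows[0])]
    while len(centers) < k:
        far = max(rows, key=lambda r: min(sqdist(r, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: sqdist(r, centers[j])) for r in rows
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def search_space(options_per_point, points):
    """Number of joint re-embedding choices when each point is chosen freely."""
    return options_per_point ** points

# Toy correlation matrix (hypothetical): points 0,1 and points 2,3 move together.
C = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
labels = kmeans_rows(C, k=2)           # [0, 0, 1, 1]
before = search_space(5, 70)           # 5^70 joint choices for N = 70
after = search_space(5, 4)             # 5^4 = 625 joint choices for K = 4
```

With 5 options per merging point, choosing per cluster instead of per point shrinks the joint search space from 5^70 to 5^4, which matches the reduction quoted on the slide.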

Recap of Skew Calculation with Re-embedding
• Cluster-based reduction (SVD + K-Means), K << N
• Block-wise MOR [Yu et al., DAC'06]
• Transient time analysis (Backward Euler)
• Delay and skew
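To make the last two steps concrete, here is a minimal sketch of backward-Euler transient analysis followed by delay and skew extraction. It is deliberately reduced to a single-pole RC stage per sink, which is an assumption for illustration and far simpler than the reduced MNA system in the slides; the function names and the R, C values are hypothetical, and the resistances are assumed to be already adjusted for temperature.

```python
import math

def step_response_delay(r_ohm, c_farad, h, threshold=0.5, max_steps=1_000_000):
    """Time at which an RC stage driven by a unit step first reaches `threshold`.

    Backward Euler applied to C * dv/dt = (1 - v) / R gives
        v[k+1] = (v[k] + a) / (1 + a),  with a = h / (R * C).
    """
    a = h / (r_ohm * c_farad)
    v = 0.0
    for k in range(1, max_steps + 1):
        v = (v + a) / (1.0 + a)
        if v >= threshold:
            return k * h
    raise RuntimeError("threshold not reached within max_steps")

def skew(delays):
    """Skew is the spread between the latest and the earliest sink arrival."""
    return max(delays) - min(delays)

# Two sinks with normalized R*C products; the second branch is 10% more resistive.
d_cool = step_response_delay(1.0, 1.0, h=1e-3)
d_hot = step_response_delay(1.1, 1.0, h=1e-3)
sink_skew = skew([d_cool, d_hot])
```

For a single pole the exact 50% delay is R*C*ln 2, so `d_cool` should land close to `math.log(2)`; the time step `h` controls how close.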

Simultaneous Re-embedding and Cross Link Insertion
1. Decide crosslink candidates according to [Rajaram, DAC 04]
2. Cluster crosslink candidates again based on the temperature correlation
3. Calculate skew sensitivities w.r.t. crosslink and re-embedding candidates
   - In a fashion similar to the previous triangular block-wise MOR
4. Bottom-up select the best crosslink or re-embedding
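Step 4 can be sketched as a greedy pass over the clusters in bottom-up order, keeping for each cluster the option that gives the lowest worst-case skew. This is a hedged illustration, not the authors' algorithm: the skew evaluator is supplied by the caller, and the toy evaluator below (a signed skew per temperature map plus additive per-option effects, in ps) uses invented numbers and option names purely to exercise the selection loop.

```python
def greedy_select(clusters, options, worst_case_skew):
    """Greedy bottom-up selection of one option per cluster.

    clusters: cluster ids, listed in bottom-up order.
    options: dict mapping cluster id -> list of candidate options;
             the first option is used as the starting choice.
    worst_case_skew: callable taking {cluster: option} and returning the
             worst-case skew over all temperature maps (caller-supplied).
    """
    choice = {c: options[c][0] for c in clusters}
    for c in clusters:
        choice[c] = min(
            options[c], key=lambda o: worst_case_skew({**choice, c: o})
        )
    return choice, worst_case_skew(choice)

# Toy evaluator (all numbers hypothetical): signed skew in ps under two maps.
BASE = [10.0, -6.0]
EFFECT = {
    "leaf": {"keep": [0.0, 0.0], "shift": [-4.0, -1.0]},
    "root": {"keep": [0.0, 0.0], "shift": [-3.0, 4.0], "xlink": [-2.0, 2.0]},
}

def toy_worst_case_skew(choice):
    per_map = [
        BASE[m] + sum(EFFECT[c][o][m] for c, o in choice.items())
        for m in range(len(BASE))
    ]
    return max(abs(s) for s in per_map)

toy_options = {c: list(EFFECT[c]) for c in EFFECT}
best, worst = greedy_select(["leaf", "root"], toy_options, toy_worst_case_skew)
```

In the toy run the worst-case skew drops from 10 ps (keep everything) to 3 ps. A greedy pass like this does not guarantee the global optimum; it trades optimality for a number of evaluations that grows linearly with the number of clusters.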

Experimental Settings
• Temperature maps are obtained by a micro-architecture level power-temperature transient simulator [Liao, TCAD'05] with 6 SPEC 2000 applications
• 100 temperature maps, one for every 10 million clock cycles
• Compare four algorithms (two categories)
  - Traditional optimization under nominal temperature and Elmore delay
    - DME: deferred merging-point embedding to minimize wire-length for zero-skew
    - xlink: cross-link insertion [Rajaram, ICCAD'04]
  - The proposed algorithms with temperature variation and high-order delay model
    - re-embed: re-embedding
    - xlink+re-embed: simultaneous re-embedding and cross-link insertion

Skew Distribution Over 100 Temperature Maps
• X+R = cross link insertion + re-embedding
• DME = Deferred Merging-point Embedding

Worst-case Skew
• For tree structure, re-embed reduces the worst-case skew by 3x on average (up to 20x) compared to DME.
• For non-tree structure, xlink+re-embed reduces the worst-case skew by 30% on average (up to 7x) compared to xlink.

Wire Length
• For tree structure, re-embed has less than 1% wire length overhead compared to DME.
• For non-tree structure, xlink+re-embed has 5% LESS wire length compared to xlink.

Runtime
• Temperature-aware optimizations (re-embed and xlink+re-embed) are about 10x slower than DME and xlink, respectively, but the delay models differ:
  - Our work uses a high-order delay model
  - DME and xlink use Elmore delay

Conclusions
• Studied the clock optimization for workload-dependent temperature variation
  - Reduced the worst-case skew by up to 7x with LESS wire-length compared to the best existing method
• Correlation-aware modeling and optimization paradigm can be extended to handle PVT variations and more design freedoms
  - "Temperature Aware Microprocessor Floorplanning Considering Application Dependent Power Load" [Chu et al, ICCAD 07]
  - "Efficient Decoupling Capacitance Budgeting Considering Operation and Processing Variations" [Shi et al, finalist for Best Paper, ICCAD 07]

Thank you!
SRC TechCon 2007
Hao Yu (graduated), Yu Hu (presenter), Chun-Chen Liu and Lei He (PI)
Minimal Skew Clock Embedding Considering Time Variant Temperature Gradient