
CS 7960-4 Lecture 25: Wire Delay is not a Problem for SMT
Z. Chishti, T. N. Vijaykumar
Proceedings of ISCA-31, June 2004
Prior Results
• Hrishikesh et al. [ISCA '02]: optimal pipeline depth is 6-8 FO4 at 100 nm technology
• Agarwal et al. [ISCA '00]: IPCs will decrease dramatically due to wire delays
Goals:
• How does pipeline depth vary with technology?
• How does SMT influence throughput and pipeline depth?
• Identify and alleviate bottlenecks (bandwidth)
Critical Loops
Back-to-Back Instructions
• The loop lengths determine the delay between back-to-back dependent instructions
• Some loops can be optimized with aggressive designs (rename, ALUs)
• Difficult loops: cache access, branch prediction
Superscalar vs. SMT
• For superscalars: deep pipelining → more overhead in each loop → more delay between back-to-back (b2b) instructions → performance loss
• For SMT, slowing a dependence chain is not a problem: other threads supply useful work
• Deep pipelines can benefit SMT because SMT has more parallelism to fill them. How do you build deep pipelines?
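The contrast above can be illustrated with a toy model. This is a minimal sketch, not the simulator used in the paper: it assumes a single issue slot per cycle, threads in which every instruction depends on the previous one, and a hypothetical parameter `loop_cycles` standing in for the length of the critical loop between back-to-back dependent instructions.

```python
def sustained_ipc(num_threads: int, loop_cycles: int, num_instrs: int = 1000) -> float:
    """Toy single-issue core: each thread is one long dependence chain, and a
    dependent instruction may issue no earlier than loop_cycles cycles after
    its producer issued (hypothetical model, not from the paper)."""
    ready_at = [0] * num_threads            # earliest cycle each thread may issue
    remaining = [num_instrs] * num_threads  # instructions left per thread
    total = num_threads * num_instrs
    issued = 0
    cycle = 0
    while issued < total:
        # One issue slot per cycle: take the lowest-numbered ready thread.
        for t in range(num_threads):
            if remaining[t] > 0 and ready_at[t] <= cycle:
                remaining[t] -= 1
                ready_at[t] = cycle + loop_cycles
                issued += 1
                break
        cycle += 1
    return issued / cycle


if __name__ == "__main__":
    for threads in (1, 2, 4):
        print(threads, "thread(s):", round(sustained_ipc(threads, loop_cycles=4), 3))
```

With a 4-cycle loop, one thread keeps the issue slot busy about a quarter of the time, while four threads keep it busy every cycle. In this model, a longer loop costs a single thread throughput but costs nothing once enough threads are available.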
Wire Delays and Bandwidth
• Wire delays can limit bandwidth in RAMs/CAMs: they control the delay between successive accesses
• Bitline signals are weak, so a latch can be introduced only after the sense-amp
Bitline-Scaling
[Figure labels: Decode; Mux + output driver]
• Latency-optimized: low bandwidth, low latency
• Bitline-scaled: high bandwidth, high latency
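The latency/bandwidth trade-off can be sketched numerically. Everything below is an illustrative assumption rather than data from the paper: the segment delays (in FO4), the 8 FO4 clock, the simplification that decode and output logic can be latched anywhere while the bitline segment (up to the sense-amp) cannot, and the names `ArrayDesign`, `latency_cycles`, `initiation_interval`, and `accesses_per_cycle`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayDesign:
    name: str
    decode_fo4: int   # decoder delay (assumed freely pipelinable)
    bitline_fo4: int  # wordline + bitline + sense-amp: no latch allowed inside
    output_fo4: int   # mux + output driver delay (assumed freely pipelinable)


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def latency_cycles(d: ArrayDesign, clock_fo4: int) -> int:
    """Access latency in cycles, ignoring latch overhead (simplification)."""
    return ceil_div(d.decode_fo4 + d.bitline_fo4 + d.output_fo4, clock_fo4)


def initiation_interval(d: ArrayDesign, clock_fo4: int) -> int:
    """Cycles between successive accesses, set by the unlatchable segment."""
    return ceil_div(d.bitline_fo4, clock_fo4)


def accesses_per_cycle(d: ArrayDesign, clock_fo4: int, threads: int) -> float:
    """Each thread keeps one access in flight; the array caps the total."""
    demand = threads / latency_cycles(d, clock_fo4)
    supply = 1 / initiation_interval(d, clock_fo4)
    return min(demand, supply)


# Hypothetical delays: shorter bitlines speed up the unlatchable segment but
# add decode and output-mux delay, so total latency grows.
LATENCY_OPTIMIZED = ArrayDesign("latency-optimized", 4, 14, 4)
BITLINE_SCALED = ArrayDesign("bitline-scaled", 8, 8, 10)

if __name__ == "__main__":
    for design in (LATENCY_OPTIMIZED, BITLINE_SCALED):
        for threads in (1, 5):
            rate = accesses_per_cycle(design, clock_fo4=8, threads=threads)
            print(f"{design.name}, {threads} thread(s): {rate:.2f} accesses/cycle")
```

Under these made-up numbers the latency-optimized array serves a single thread slightly better, but with five threads it saturates at one access every two cycles while the bitline-scaled array accepts one per cycle.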
Delay Results
Examining Deep Pipelines
• Bitline-scaling provides high bandwidth, which enables deep pipelining (high parallelism, longer chains)
• Range of implementations:
  ◦ b2b: aggressive design that allows instructions to issue back-to-back in spite of long loops
  ◦ nb2b: low-complexity design that can severely limit single-thread ILP
Effect of Wire Delays on IPC
• Assumes that all structures are perfectly pipelined
Effect of Technology on Pipeline Depth
• For a single thread, moving from 100 nm to 50 nm raises the optimal depth from 8 to 10 FO4 (nb2b) and from 6 to 8 FO4 (b2b)
• For a multiprogrammed workload, the optimal depth remains at 8 FO4 (nb2b) and 6 FO4 (b2b)
• Multiprogramming lets you keep up with Moore's Law
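To see what these depths mean in absolute time, the FO4 counts can be converted to picoseconds. The sketch below uses the nb2b depths from this slide and assumes a commonly quoted rule of thumb of roughly 360 ps of FO4 delay per micron of gate length, taking the technology node as the gate length. Both the constant and that equivalence are assumptions here, and latch and skew overheads are ignored.

```python
FO4_PS_PER_UM = 360  # assumed rule of thumb, not taken from the slides


def fo4_ps(feature_nm: int) -> float:
    """Approximate delay of one FO4 inverter, in picoseconds."""
    return FO4_PS_PER_UM * feature_nm / 1000


def stage_logic_ps(depth_fo4: int, feature_nm: int) -> float:
    """Logic delay of one pipeline stage, excluding latch overhead."""
    return depth_fo4 * fo4_ps(feature_nm)


# Optimal nb2b depths (FO4 per stage) from the slide, keyed by node in nm.
NB2B_DEPTHS = {
    "single thread": {100: 8, 50: 10},
    "multiprogrammed": {100: 8, 50: 8},
}

if __name__ == "__main__":
    for workload, depths in NB2B_DEPTHS.items():
        t100 = stage_logic_ps(depths[100], 100)
        t50 = stage_logic_ps(depths[50], 50)
        print(f"{workload}: {t100:.0f} ps -> {t50:.0f} ps per stage "
              f"({t100 / t50:.2f}x faster)")
```

Under these assumptions the multiprogrammed design halves its per-stage logic delay when the feature size halves, whereas the single-thread design gains less because its optimal depth in FO4 grows.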
Effect of Bandwidth Constraints
• The "perfect" configuration has the latency of latency-optimized and the bandwidth of bitline-scaled
• Latency-optimized (l-o) does well for a single thread, but very poorly for five threads
Conclusions
• For superscalars, the optimal logic depth will grow because of longer wire delays and lack of parallelism
• SMT is unaffected: it has the parallelism to offset back-to-back inefficiencies
• SMT meets Moore's Law expectations by increasing the number of threads
• SMT has high bandwidth needs; the solution is bitline-scaling