StageNet: A Reconfigurable CMP Fabric for Resilient Systems
Shantanu Gupta, Shuguang Feng, Jason Blome, Scott Mahlke
2nd Workshop on Reconfigurable and Adaptable Architecture, Dec 1, 2007
University of Michigan, Electrical Engineering and Computer Science

Reliability Challenge
• Increasing defect rates are a major challenge [ITRS '03]
• ↑ power density, ↓ feature sizes, ↑ failures in time (FIT) [Srinivasan, DSN '04]
• Permanent faults
  ► Manufacturing defects
  ► Time-dependent dielectric breakdown (TDDB)
  ► Negative bias temperature instability (NBTI)
  ► Electromigration (EM)
  ► …
For the 32 nm technology node, an 8-core CMP would face ~30 faults in 4 years

Tolerating Permanent Faults
• Traditional solutions
  ► TMR
  ► Tandem / HP NonStop
  ► Impractical for mainstream
    • Cost
    • Power
    • Low gain
• Current approaches
  ► Detection/Prediction
    • Using sensors
    • Analytical models
    • Redundant execution
    • BIST
  ► Repair
    • Replacement
    • Reconfiguration
[Figure labels: Teramac (1995); K-pos DP-31/32]

Reconfiguration Granularity
• Range of choices for the reconfiguration granularity: CORE level, STAGE level, MODULE level
[Figure: pipeline stages FETCH, DEC, EXEC, MEM, WB; axis labels: better resource utilization, lower design complexity, lower overheads]
- ElastIC, DT '06
- Reunion, MICRO '06
- Configurable Isolation, ISCA '07
- Online Diagnosis of Hard Faults, MICRO '05
- Ultra Low-Cost Defect Protection, ASPLOS '06

Mean Time to Failure Comparison
[Plot: MTTF increase (%) vs. area increase (%) for CORE-level, STAGE-level, and MODULE-level reconfiguration]
CORE level
+ Easiest to do in practice
-- Poorest MTTF gains
STAGE level
+ Circuit/logical boundary
+ Improved MTTF gains
-- Architectural complexity
MODULE level
+ Best MTTF gains
-- Hardest to repair

Throughput Comparison
• Monte-Carlo study
• Randomly injected failures
• Assumes that stages are shared resources
[Plot: STAGE level vs. CORE level]
STAGE-level reconfiguration allows significantly more graceful throughput degradation

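The kind of Monte-Carlo comparison named on this slide can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the talk: faults are spread uniformly over stages, each fault disables one whole stage, throughput is counted as the number of complete working pipelines, and interconnect overheads are ignored. The function names and default parameters are hypothetical.

```python
import random


def core_level_pipelines(num_cores, num_stages, failed):
    """Core-level repair: a core is lost as soon as any one of its stages fails."""
    return sum(
        all((core, stage) not in failed for stage in range(num_stages))
        for core in range(num_cores)
    )


def stage_level_pipelines(num_cores, num_stages, failed):
    """Stage-level repair: stages form a shared pool, so the number of logical
    pipelines is limited by the scarcest stage type."""
    return min(
        sum((core, stage) not in failed for core in range(num_cores))
        for stage in range(num_stages)
    )


def simulate(num_cores=8, num_stages=5, num_faults=10, trials=2000, seed=0):
    """Inject num_faults random stage failures per trial and return the mean
    number of working pipelines as (core_level_mean, stage_level_mean)."""
    rng = random.Random(seed)
    all_stages = [(c, s) for c in range(num_cores) for s in range(num_stages)]
    core_total = 0
    stage_total = 0
    for _ in range(trials):
        # Assumption: each fault lands on a distinct, uniformly chosen stage.
        failed = set(rng.sample(all_stages, num_faults))
        core_total += core_level_pipelines(num_cores, num_stages, failed)
        stage_total += stage_level_pipelines(num_cores, num_stages, failed)
    return core_total / trials, stage_total / trials


if __name__ == "__main__":
    for faults in (0, 5, 10, 20):
        core_avg, stage_avg = simulate(num_faults=faults)
        print(f"faults={faults:2d}  core-level={core_avg:.2f}  stage-level={stage_avg:.2f}")
```

Under these assumptions the stage-level count can never fall below the core-level count, because every fully working core contributes one working stage of each type to the shared pool.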
Goal of this Research
• Design a computing substrate
  ► Fault tolerant
  ► Graceful performance degradation with defects
  ► Highly reconfigurable
  ► Adaptable to the workload
Design that can meet the challenge of facing ~100s of faults while maintaining 70-80% throughput

CMP Fabric
[Figure: four cores (Core 0 to Core 3), each built from pipeline stages Stage 1, Stage 2, Stage 3, … Stage N]

StageNet CMP Fabric
[Figure labels: Allocator; logical pipeline; Stage 1, Stage 2, Stage 3, … Stage N; Configuration Manager]

StageNet CMP Fabric - Benefits
[Figure labels: Stage 1, Stage 2, Stage 3, … Stage N; Configuration Manager]

StageNet CMP Fabric - Issues
• Performance / Efficiency
  ► Scaling with number of stages
  ► Impact of router delay
    • Transmission delay (tdelay)
    • Congestion delay
• Design overheads
  ► Area
  ► Power
• Micro-architectural concerns
  ► Data forwarding logic
  ► Control flow handling
[Figure labels: Allocator; 64; 256 bits]

Experimental Setup
MiBench suite
SimpleScalar 4.0: simulates an in-order core with default parameters
Stores statistics for the benchmarks
- No. of instructions
- No. of cycles
- Branch mis-predicts
- I/D cache misses
- …
Model: parameterizable performance model for StageNet
CPI results

Effect of varying pipeline depth
[Plot; tdelay = 1]

Effect of varying transmission delay
[Plot; stages = 10]

Performance enhancement
• Router delay is the leading cause of the slowdown
• Need some way to improve system utilization
• Let us send macro-ops (MOP)
  ► MOP is an instruction bundle
    • Upper bound on length
    • Upper bound on live-ins / live-outs
    • No branches in between
  ► Advantages
    • Amortizes delay / contention
    • Increases resource utilization
[Figure: example MOPs built from >>, <<, +, /, &, LD, and ST operations; max length 4, max live-ins 2]

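The MOP constraints listed on this slide (bounded length, bounded live-ins, no branches inside a bundle) can be illustrated with a greedy bundling sketch. This is not the authors' algorithm: the `Instr` record, the `form_mops` function, the register names, and the greedy policy are all hypothetical. The defaults of 4 and 2 mirror the "max length 4, max live-ins 2" labels on the slide. The live-out bound is omitted because it would need liveness information the sketch does not model, and branches are assumed to travel alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: opcode, destination and source registers."""
    op: str
    dst: tuple = ()
    srcs: tuple = ()
    is_branch: bool = False


def live_ins(bundle):
    """Registers read by the bundle before any instruction in the bundle writes them."""
    defined = set()
    needed = set()
    for ins in bundle:
        needed.update(r for r in ins.srcs if r not in defined)
        defined.update(ins.dst)
    return needed


def form_mops(instrs, max_len=4, max_live_ins=2):
    """Greedily pack a straight-line instruction stream into macro-ops (MOPs)."""
    mops = []
    current = []
    for ins in instrs:
        if ins.is_branch:
            # Assumption: a branch closes the open bundle and is sent on its own.
            if current:
                mops.append(current)
            mops.append([ins])
            current = []
            continue
        candidate = current + [ins]
        if len(candidate) > max_len or len(live_ins(candidate)) > max_live_ins:
            # Adding this instruction would break a bound: close the bundle and
            # start a new one. A lone instruction that exceeds the live-in bound
            # by itself still becomes its own bundle.
            if current:
                mops.append(current)
            current = [ins]
        else:
            current = candidate
    if current:
        mops.append(current)
    return mops


if __name__ == "__main__":
    program = [
        Instr("LD", dst=("r1",), srcs=("r0",)),
        Instr(">>", dst=("r2",), srcs=("r1",)),
        Instr("+", dst=("r3",), srcs=("r2", "r4")),
        Instr("ST", srcs=("r3", "r5")),
        Instr("BR", srcs=("r3",), is_branch=True),
        Instr("+", dst=("r6",), srcs=("r6", "r7")),
    ]
    for mop in form_mops(program):
        print([ins.op for ins in mop], "live-ins:", sorted(live_ins(mop)))
```

Longer bundles spread one router traversal over more instructions, which is the amortization benefit the slide lists; tighter live-in bounds limit how much operand data each transfer has to carry.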
Effect of varying MOP size
[Plot; tdelay = 4, stages = 10]

Conclusions
• Reliability-aware architectures with finer-grained reconfiguration are desirable for:
  ► Better MTTF gains
  ► Graceful throughput degradation
• StageNet, a potential solution, allows stage-level reconfiguration and is:
  ► Easy to reconfigure
  ► Inherently redundant
  ► Potentially scalable in issue width
• Using StageNet, significant reconfiguration flexibility can be obtained in exchange for a small loss in performance

Future Work
• Micro-architectural issues
  ► Data bypass handling
  ► Control flow handling
  ► Sharing state between pipeline stages
• Network design
  ► Design of routers
  ► Design of interconnection
• Simulation setup
  ► Validation of results using a cycle-accurate simulator

Backup slides

[Figure labels: IF/ID, BIST, test vectors, test, DECODER, repair, ID/EX, CHECKER (majority)]
- Ultra Low-Cost Defect Protection for Microprocessor Pipelines, ASPLOS '06
- ElastIC, DT '06
- F. Bower, Tolerating Hard Faults in Microprocessor Array Structures, DSN '04
- H. Qin, UC Berkeley