Viper: Virtual Pipelines for Enhanced Reliability
Andrea Pellegrini, Joseph Greathouse, Valeria Bertacco
University of Michigan, Advanced Computer Architecture Laboratory
ISCA 2012

Reliability Challenges with CMOS Scaling
- Manufacturing defects: that escape testing
- Transient faults: natural radiation, noise…
- Age-related wearout: electromigration & gate-oxide breakdown
- Intel Cougar Point chipsets wearing out over time [estimated cost of 700 M$, 2011]
"Future technologies will make transistors less and less reliable" [Borkar, 2005]

Impact of Faults on Traditional CMPs
[Figure: CMP built from cores and a cache]

Fault Effects on CMP Throughput
[Plot: max IPC (0 to 140) vs. number of faults (0 to 1000); curves: ideal reliable architecture, CMP]
- CMP system w/ 2 billion transistors fitting 128 cores, no caches
  - 15 M transistors/core, similar to Intel Atom

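The setup on this slide can be turned into a back-of-the-envelope model. The sketch below is not the model used in the talk: it assumes that each fault lands on a uniformly random core, that a single fault disables the whole core, and that throughput scales with the number of surviving cores. The function names are hypothetical.

```python
import random

TOTAL_TRANSISTORS = 2_000_000_000   # from the slide
TRANSISTORS_PER_CORE = 15_000_000   # from the slide
NUM_CORES = 128                     # from the slide


def expected_live_cores(num_faults, num_cores=NUM_CORES):
    """Expected number of fault-free cores (assumption: uniform, independent faults)."""
    return num_cores * (1 - 1 / num_cores) ** num_faults


def simulate_live_cores(num_faults, num_cores=NUM_CORES, seed=0):
    """Monte Carlo counterpart: drop faults on random cores, count untouched ones."""
    rng = random.Random(seed)
    hit = {rng.randrange(num_cores) for _ in range(num_faults)}
    return num_cores - len(hit)


if __name__ == "__main__":
    # Core count allowed by the transistor budget alone (integer division).
    print(TOTAL_TRANSISTORS // TRANSISTORS_PER_CORE)
    for faults in (0, 200, 400, 600, 800, 1000):
        print(faults, round(expected_live_cores(faults), 1), simulate_live_cores(faults))
```

Under these assumptions the number of usable cores decays geometrically with the fault count, which is the kind of steep drop a core-granularity CMP would show against an ideal reliable architecture.
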
Limitations of Current µArchitectures
- Single points of failure
- Rigidly connected pipeline stages
- Centralized control logic
[Figure: pipeline with Fetch, Decoders, Integer ALU, Floating Point, LSQ, Back End]

Can We Overcome These Limitations?
- Single points of failure -> arrays of redundant hardware units
- Rigidly connected pipeline stages -> loosely connected hardware modules
- Centralized control logic -> decentralized and redundant controls
Service-oriented µ-architecture to tackle all three issues: Virtual Pipelines for Enhanced Reliability

Service-Oriented µ-Architecture
Renew Driving License:
1. Check in
2. Vision test
3. Take picture
4. Pay fee
5. Get license

Viper - Overview
- HW units can perform services for instructions
- Bundles are instruction sequences terminating with a control instruction (JMP)
    4013c3: add %al, [%ebx]
    4013c5: div cl
    4013c8: jmp 40140a
- A Virtual Pipeline is the ordered sequence of HW units that can complete the instructions in a bundle
- Bundle Scheduling Unit (BSU) allows instructions to use and be scheduled on the available HW units
- An ISA is defined as the set of services needed by its instructions

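The bundle definition above can be illustrated with a short sketch. It splits an instruction stream into bundles, closing a bundle at each control instruction. The function name `split_into_bundles` and the set of mnemonics treated as control instructions are assumptions made for illustration, not part of the talk.

```python
# Mnemonics treated as control instructions: an illustrative subset (assumption),
# not a complete x86 list.
CONTROL_MNEMONICS = {"jmp", "je", "jne", "call", "ret"}


def split_into_bundles(instructions):
    """Group (address, text) pairs into bundles that end with a control instruction."""
    bundles, current = [], []
    for addr, text in instructions:
        current.append((addr, text))
        mnemonic = text.split()[0]
        if mnemonic in CONTROL_MNEMONICS:
            bundles.append(current)
            current = []
    if current:
        # Trailing instructions whose terminating control instruction is not seen yet.
        bundles.append(current)
    return bundles


# The three-instruction example from the slide forms a single bundle.
program = [
    (0x4013C3, "add %al, [%ebx]"),
    (0x4013C5, "div cl"),
    (0x4013C8, "jmp 40140a"),
]

if __name__ == "__main__":
    for bundle in split_into_bundles(program):
        print([hex(addr) for addr, _ in bundle])
```
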
Viper Hardware Organization
- Sea of redundant HW modules (addresses: single point of failure)
- Homogeneous module interconnect (addresses: rigidly connected hardware modules)
- Bundle Scheduling Units (BSU) to schedule HW modules (addresses: centralized control logic)
[Figure: BSUs and HW modules around a crossbar, with $ and RF]

Viper’s Execution Model
1. Building Virtual Pipelines
2. Inter-Module Data Dependencies
3. Handling Program Mispredictions
4. Handling Precise Exceptions

1. Building Virtual Pipelines
Example instructions:
    4013c3: add %al, [%ebx]
    4013c8: or %al, %bl
Services and HW units: Fetch instruction (F0, F1), Register access (R0, R1), Execute (E0, E1), Write back/Commit (W0, W1)
BSU (Bundle Scheduling Unit) table, filled in over successive build steps:
1. Entry 1 is created with PC 4013c3 and fetch unit F0
2. Entry 1 is completed with units R1, E0, W0
3. Entry 1 receives NPC 4013c8
4. Entry 2 is created with PC 4013c8
5. Entry 2 is assigned units F1, R0, E1, W1
Final table:
    BSU ID  PC      NPC     Fetch  Reg  Exec  WB
    1       4013c3  4013c8  F0     R1   E0    W0
    2       4013c8          F1     R0   E1    W1
    3

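The table build-up on these slides can be mimicked with a small sketch. It is a simplification: it claims one free unit per service in a single step, whereas the slides show units being attached over several steps. The names `build_virtual_pipeline` and `free_units` are hypothetical, and the free-list order is chosen only so the result matches the unit assignment shown on the slides.

```python
SERVICES = ("Fetch", "Reg", "Exec", "WB")


def build_virtual_pipeline(bsu_entry, free_units):
    """Claim one free HW unit per service for the bundle tracked by bsu_entry.

    Returns False, leaving everything untouched, if any service has no free unit.
    """
    if any(not free_units[service] for service in SERVICES):
        return False
    for service in SERVICES:
        bsu_entry[service] = free_units[service].pop(0)
    return True


# Free HW units per service (order is an assumption, see lead-in).
free_units = {
    "Fetch": ["F0", "F1"],
    "Reg": ["R1", "R0"],
    "Exec": ["E0", "E1"],
    "WB": ["W0", "W1"],
}

entry1 = {"PC": 0x4013C3, "NPC": 0x4013C8}
entry2 = {"PC": 0x4013C8, "NPC": None}
entry3 = {"PC": None, "NPC": None}

ok1 = build_virtual_pipeline(entry1, free_units)
ok2 = build_virtual_pipeline(entry2, free_units)
ok3 = build_virtual_pipeline(entry3, free_units)  # no units left, so this bundle waits
```

With two units per service, two bundles can hold complete virtual pipelines at once; a third has to wait until units are released.
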
Viper’s Distributed Control Logic
- HW units can negotiate their services with BSU through:
  - Queues
  - Proposal broadcasts
  - Tokens
- Resource starvation avoided if the oldest bundle is served first

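The oldest-first rule can be sketched with a priority queue. This is only an illustration of the starvation-avoidance idea, not Viper's negotiation protocol: the class `HardwareUnit`, its methods, and the use of a sequence number as bundle age are all assumptions.

```python
import heapq


class HardwareUnit:
    """A HW unit that grants its service to the oldest requesting bundle."""

    def __init__(self, name):
        self.name = name
        self._requests = []  # min-heap of (bundle_seq, bsu_id); lower seq = older bundle

    def request(self, bundle_seq, bsu_id):
        """Queue a service request from the BSU tracking the given bundle."""
        heapq.heappush(self._requests, (bundle_seq, bsu_id))

    def grant(self):
        """Return the BSU id of the oldest pending bundle, or None if idle."""
        if not self._requests:
            return None
        _, bsu_id = heapq.heappop(self._requests)
        return bsu_id


unit = HardwareUnit("E0")
unit.request(7, "BSU2")
unit.request(3, "BSU1")
unit.request(5, "BSU3")
grants = [unit.grant(), unit.grant(), unit.grant(), unit.grant()]
```

Because a pending bundle only gets older relative to newcomers, it eventually becomes the oldest requester and is served.
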
2. Inter-Cluster Dependencies
Clusters might need operands generated by others
Example instructions:
    4013c3: add %al, [%ebx]
    4013c8: or %al, %bl
- Each BSU entry holds input tags and output tags for RA, RB, RC, RD
- Additional storage required (768 bits/BSU for x86)
BSU table with tags:
    BSU ID  PC      NPC     Fetch  Reg  Exec  WB   Input tags (RA RB RC RD)  Output tags (RA RB RC RD)
    1       4013c3  4013c8  F0     R1   E0    W0   1  5  6  10               13  5  6  10
    2       4013c8          F1     R0   E1    W1   13  5  6  10
    3
- Tag creation is serialized
- Possible optimization: tags based on bundle ID
  - Does not require serialization
  - Much smaller storage needed

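The tag table can be reproduced with a small renaming sketch. It assumes that a bundle's output tags start as a copy of its input tags, that each register written by the bundle gets a fresh tag, and that the next bundle's input tags are the previous bundle's output tags. The allocator starting at 13 and the mapping of `%al` to register `RA` are read off the slide's numbers rather than stated in the talk; `TagAllocator` and `output_tags` are hypothetical names.

```python
class TagAllocator:
    """Hands out fresh tags one at a time (serialized tag creation)."""

    def __init__(self, next_tag):
        self.next_tag = next_tag

    def fresh(self):
        tag = self.next_tag
        self.next_tag += 1
        return tag


def output_tags(input_tags, written_registers, allocator):
    """Copy the input tags, then give every written register a fresh tag."""
    out = dict(input_tags)
    for reg in written_registers:
        out[reg] = allocator.fresh()
    return out


alloc = TagAllocator(next_tag=13)  # assumption: 13 is the next free tag, as on the slide

bundle1_in = {"RA": 1, "RB": 5, "RC": 6, "RD": 10}
# add %al, [%ebx] writes %al, treated here as register RA (assumption).
bundle1_out = output_tags(bundle1_in, ["RA"], alloc)

# The next bundle reads operands under the tags the previous bundle produces.
bundle2_in = dict(bundle1_out)
```

Since every fresh tag comes from one shared counter, bundles must create their tags one after another, which is the serialization the slide points out.
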
3. Handling Mispredictions
Example instructions:
    4013c0: jmp 4013eb
    4013c3: add %al, [%ebx]
    4013c8: or %al, %bl
BSU table before the branch resolves:
    BSU ID  Next  PC      NPC     Fetch  Reg  Exec  WB
    0       1     4013c0  4013c3  F0     R1   E0    W0
    1       2     4013c3  4013c8  F1     R0   E1    W1
    2       -     4013c8
Branch mispredicted! Correct NPC: 4013eb
BSU table after the NPC of entry 0 is corrected:
    BSU ID  Next  PC      NPC     Fetch  Reg  Exec  WB
    0       1     4013c0  4013eb  F0     R1   E0    W0
    1       2     4013c3  4013c8  F1     R0   E1    W1
    2       -     4013c8

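A sketch of the check behind this slide: when a bundle's branch resolves, its predicted NPC is compared with the actual target and corrected on a mismatch. The squashing of younger bundles along the Next chain is an assumption added here for completeness; the extracted slides only show the NPC correction. `resolve_branch` and the table layout are hypothetical.

```python
def resolve_branch(table, bsu_id, actual_npc):
    """Correct a mispredicted NPC and squash the younger bundles chained after it.

    Returns the list of squashed BSU ids (empty if the prediction was right).
    """
    entry = table[bsu_id]
    if entry["NPC"] == actual_npc:
        return []
    entry["NPC"] = actual_npc
    squashed = []
    nxt = entry["Next"]
    # Assumption: every bundle reachable through Next was fetched down the wrong path.
    while nxt is not None and nxt in table:
        squashed.append(nxt)
        nxt = table.pop(nxt)["Next"]
    entry["Next"] = None
    return squashed


table = {
    0: {"Next": 1, "PC": 0x4013C0, "NPC": 0x4013C3},
    1: {"Next": 2, "PC": 0x4013C3, "NPC": 0x4013C8},
    2: {"Next": None, "PC": 0x4013C8, "NPC": None},
}

# jmp 4013eb was predicted to fall through to 4013c3.
squashed = resolve_branch(table, 0, 0x4013EB)
```
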
4. Precise Exceptions
Example instructions:
    4013c3: add %al, [%ebx]
    4013c5: div cl
    4013c8: jmp 40140a
BSU table when the exception is raised:
    BSU ID  Next  PC      NPC  Fetch  Reg  Exec  WB
    0       1     4013c3       F0     R1   E0    W0
    1       2
Exception! Division by 0
BSU table after the exception is raised:
    BSU ID  Next  PC      NPC           Fetch  Reg  Exec  WB
    0       1     4013c3  4013c5        F0     R1   E0    W0
    1       2     4013c5  Exc. Handler  F1     R0   E1    W1
    2             (PC shown as both Exc. Handler and 4013c8 in the overlapping build steps)

Impact of Faults in Viper
[Figure: BSUs and HW modules around a crossbar, with I$ and RF]
Handling runtime failures:
1. Periodic full system checkpoint
2. Faults are detected through hardware self-tests or SW symptoms
3. If a fault is detected:
   a. Faulty component is diagnosed and disabled
   b. System state is restored to the previous checkpoint
   c. Program execution is restarted

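The recovery steps above can be sketched as a checkpoint-and-rollback loop. This is a minimal software analogy under stated assumptions: the class `System`, its fields, and the use of a deep copy as the "full system checkpoint" are illustrative choices, not the mechanism described in the talk.

```python
import copy


class System:
    """Toy system with a set of enabled HW units and a checkpointable state."""

    def __init__(self, units):
        self.enabled_units = set(units)
        self.state = {"pc": 0, "regs": {}}
        self._checkpoint = copy.deepcopy(self.state)

    def take_checkpoint(self):
        """Step 1: periodic full system checkpoint."""
        self._checkpoint = copy.deepcopy(self.state)

    def handle_fault(self, faulty_unit):
        """Step 3: disable the faulty unit, roll back, and return the restart PC."""
        self.enabled_units.discard(faulty_unit)        # (a) disable the diagnosed unit
        self.state = copy.deepcopy(self._checkpoint)   # (b) restore the last checkpoint
        return self.state["pc"]                        # (c) execution restarts from here


system = System(["F0", "F1", "E0", "E1"])
system.state["pc"] = 100
system.take_checkpoint()
system.state["pc"] = 140           # work done after the checkpoint
restart_pc = system.handle_fault("E0")
```

After recovery the program resumes from the checkpointed PC on the remaining units, so a fault costs the work done since the last checkpoint plus one HW unit.
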
Experimental Setup
- Viper configuration
  - 6 services: Fetch, Decode, Tag Generation, Execute, Commit, Write Back
  - 4 copies of 5 clusters
  - 4-cycle crossbar latency / 1-cycle cluster communication latency
- Baseline CMP
  - 4 OoO cores: 32k D$ and I$ / 12-stage pipeline / 128-entry ROB / 5 RS entries per FU
  - 6 in-order cores: 32k D$ and I$ / 12-stage pipeline
- Microarch simulation
  - gem5 / timing accurate / system emulation mode
  - SPEC 2006 and MiBench

Viper Enables Reliable OoO Execution
[Plot: relative IPC (0 to 3.5) per benchmark]
- MiBench (mean +87%): basicmath, cjpeg, crc, dijkstra, fft, gs, lout, patricia, qsort, rawcaudio, rawdaudio, rijndael, say, susan, toast, untoast
- SPEC 2006 (mean +69%): 401.bzip2, 410.bwaves, 429.mcf, 433.milc, 436.cactusADM, 437.leslie3d, 444.namd, 447.dealII, 456.hmmer, 459.GemsFDTD, 462.libquantum, 464.h264ref, 470.lbm, 471.omnetpp, 473.astar, 998.specrand, 999.specrand

Viper is Competitive vs Unprotected OoO
[Plot: relative IPC (0 to 4) per benchmark]
- MiBench (mean +87%): basicmath, cjpeg, crc, dijkstra, fft, gs, lout, patricia, qsort, rawcaudio, rawdaudio, rijndael, say, susan, toast, untoast
- SPEC 2006 (mean +69%): 401.bzip2, 410.bwaves, 429.mcf, 433.milc, 436.cactusADM, 437.leslie3d, 444.namd, 447.dealII, 456.hmmer, 459.GemsFDTD, 462.libquantum, 464.h264ref, 470.lbm, 471.omnetpp, 473.astar, 998.specrand, 999.specrand

Comparison With Other Solutions
Solutions are arranged along an axis from low fault rate to high fault rate:
    Solution                    Reconfig. granularity  Slowdown
    Cardio [Pellegrini 10]      Core                   ~3%
    Core Salvaging [Powell 09]  Functional units       ~5%
    Bulletproof [Shyam 06]      Functional units       ~18%
    StageNet [Gupta 08]         Pipeline stages        ~20%
    Viper                       Arbitrary              ~24%
Control logic and interconnect:
- Prior solutions, in extraction order: Central, Central, NoC, Central, Specialized, Stage-specific (per-solution assignment not recoverable)
- Viper: decoupled control logic, loose interconnect

Performance degradation
[Plot: max IPC (0 to 140) vs. number of faults (10 to 1000); curves: StageNet, CMP, Bulletproof, Viper]
- CMP tile w/ 2 billion transistors
- Much more graceful performance degradation

Conclusions
- Viper: scalable, service-oriented µ-architecture
- Hardware reconfiguration granularity as a design choice
- Much more graceful performance degradation
  - Can exploit available hardware to improve performance

Service-Oriented µ-Architecture
[Figure: MicroProcessor]
