Viper Virtual Pipelines for Enhanced Reliability Andrea Pellegrini

  • Slides: 39
Viper: Virtual Pipelines for Enhanced Reliability
Andrea Pellegrini, Joseph Greathouse, Valeria Bertacco
University of Michigan, Advanced Computer Architecture Laboratory
ISCA 2012

Reliability Challenges with CMOS Scaling
  • Manufacturing defects: those that escape testing
  • Transient faults: natural radiation, noise…
  • Age-related wearout: electromigration & gate-oxide breakdown
  • Intel Cougar Point chipsets wearing out over time [estimated cost of $700M, 2011]
  • “Future technologies will make transistors less and less reliable” [Borkar, 2005]

Impact of Faults on Traditional CMPs
[Figure: CMP with cores and cache]

Fault Effects on CMP Throughput
[Figure: Max IPC (0 to 140) vs. number of faults (0 to 1000); series: ideal reliable architecture, CMP]
  • CMP system w/ 2 billion transistors fitting 128 cores, no caches
      • 15 M transistors/core, similar to Intel Atom
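
The shape of the CMP curve can be approximated with a small model. This is only a sketch under assumptions that are not on the slide: every fault disables one whole core, and faults land on cores uniformly and independently. The function name `expected_working_cores` is hypothetical.

```python
def expected_working_cores(num_cores: int, num_faults: int) -> float:
    """Expected number of fault-free cores after num_faults random faults.

    Assumption (not from the slides): a single fault disables an entire core,
    and each fault hits any core with equal probability.
    """
    # One core survives one fault with probability (1 - 1/num_cores).
    return num_cores * (1.0 - 1.0 / num_cores) ** num_faults


NUM_CORES = 128  # from the slide: 2 billion transistors at ~15 M per core

if __name__ == "__main__":
    for faults in (0, 200, 400, 600, 800, 1000):
        print(faults, round(expected_working_cores(NUM_CORES, faults), 2))
```

Under this model the number of usable cores decays geometrically with the fault count, which is the kind of steep drop the slide contrasts with an ideal reliable architecture.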

Limitations of Current µArchitectures
  • Single points of failure
  • Rigidly connected pipeline stages
  • Centralized control logic
[Figure: pipeline with Fetch, Decoder, Integer ALU, LSQ, Back End, Floating Point]

Can We Overcome These Limitations?
  • Single points of failure → arrays of redundant hardware units
  • Rigidly connected pipeline stages → loosely connected hardware modules
  • Centralized control logic → decentralized and redundant controls
Service-oriented µ-architecture to tackle all three issues: Virtual Pipelines for Enhanced Reliability

Service-Oriented µ-Architecture
Renew Driving License:
  1. Check in
  2. Vision test
  3. Take picture
  4. Pay fee
  5. Get license


Viper - Overview
  • HW units can perform services for instructions
  • Bundles are instruction sequences terminating with a control instruction (JMP)
        4013c3: add %al, [%ebx]
        4013c5: div cl
        4013c8: jmp 40140a
  • A Virtual Pipeline is the ordered sequence of HW units that can complete the instructions in a bundle
  • The Bundle Scheduling Unit (BSU) allows instructions to use and be scheduled on the available HW units
  • An ISA is defined as the set of services needed by its instructions
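
The bundle definition above can be sketched in a few lines. This is an illustration, not the authors' hardware logic: the `CONTROL_MNEMONICS` set and the helper names are hypothetical, and a real front end would decode the full ISA rather than match mnemonic strings.

```python
from typing import List, Tuple

Instruction = Tuple[int, str]  # (address, assembly text)

# Hypothetical, incomplete set of control mnemonics used only for this sketch.
CONTROL_MNEMONICS = {"jmp", "je", "jne", "call", "ret"}


def is_control(text: str) -> bool:
    """True if the instruction text starts with a control mnemonic."""
    return text.split()[0] in CONTROL_MNEMONICS


def split_into_bundles(stream: List[Instruction]) -> List[List[Instruction]]:
    """Cut an instruction stream into bundles that end at a control instruction."""
    bundles: List[List[Instruction]] = []
    current: List[Instruction] = []
    for insn in stream:
        current.append(insn)
        if is_control(insn[1]):
            bundles.append(current)
            current = []
    if current:  # trailing instructions with no terminating control instruction
        bundles.append(current)
    return bundles


# The three instructions from the slide form a single bundle ending in jmp.
stream = [
    (0x4013C3, "add %al, [%ebx]"),
    (0x4013C5, "div cl"),
    (0x4013C8, "jmp 40140a"),
]
bundles = split_into_bundles(stream)
```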

Viper Hardware Organization
  • Sea of redundant HW modules (instead of single points of failure)
  • Homogeneous module interconnect (instead of rigidly connected hardware modules)
  • Bundle Scheduling Units (BSU) to schedule HW modules (instead of centralized control logic)
[Figure: BSUs and HW modules around a crossbar, with cache ($) and RF]

Viper’s Execution Model
  1. Building Virtual Pipelines
  2. Inter-Module Data Dependencies
  3. Handling Program Mispredictions
  4. Handling Precise Exceptions

1. Building Virtual Pipelines
Bundles:
    4013c3: add %al, [%ebx]
    4013c8: or %al, %bl
Services: Fetch instruction, Register access, Execute, Write back/Commit
Available HW units: F0, F1, R0, R1, E0, E1, W0, W1
Build sequence shown for the Bundle Scheduling Unit (BSU) table:
  1. BSU 1 takes the bundle at PC 4013c3 and acquires fetch unit F0.
  2. BSU 1 completes its virtual pipeline with R1, E0 and W0.
  3. The next PC (NPC) of BSU 1 resolves to 4013c8.
  4. BSU 2 takes the bundle at PC 4013c8.
  5. BSU 2 builds its virtual pipeline from F1, R0, E1 and W1.
Final state:
    BSU ID  PC      NPC     Fetch  Reg  Exec  WB
    1       4013c3  4013c8  F0     R1   E0    W0
    2       4013c8          F1     R0   E1    W1
    3
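
A minimal sketch of how a BSU could assemble a virtual pipeline from a pool of free units. It simplifies the slides in one labeled way: all four units are claimed in a single step, whereas the slides show them being acquired one at a time. The `BSU` class, `build_virtual_pipeline`, and the order of the free lists are assumptions chosen so that the result matches the table on the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

SERVICES = ["Fetch", "Reg", "Exec", "WB"]


@dataclass
class BSU:
    bsu_id: int
    pc: Optional[int] = None
    npc: Optional[int] = None
    pipeline: Dict[str, str] = field(default_factory=dict)  # service -> unit


def build_virtual_pipeline(bsu: BSU, pc: int,
                           free_units: Dict[str, List[str]]) -> bool:
    """Claim one free unit per service for the bundle starting at pc."""
    if any(not free_units[service] for service in SERVICES):
        return False  # at least one service has no idle, working unit
    bsu.pc = pc
    bsu.pipeline = {service: free_units[service].pop(0) for service in SERVICES}
    return True


# Free lists ordered (by assumption) to reproduce the assignment on the slide.
free_units = {
    "Fetch": ["F0", "F1"],
    "Reg": ["R1", "R0"],
    "Exec": ["E0", "E1"],
    "WB": ["W0", "W1"],
}
bsu1, bsu2, bsu3 = BSU(1), BSU(2), BSU(3)
build_virtual_pipeline(bsu1, 0x4013C3, free_units)
bsu1.npc = 0x4013C8                      # next bundle address becomes known
build_virtual_pipeline(bsu2, bsu1.npc, free_units)
third_ok = build_virtual_pipeline(bsu3, 0x4013CA, free_units)  # no units left
```

With two copies of each unit, the third request fails until a unit is released, which is where the negotiation between units and BSUs comes in.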

Viper’s Distributed Control Logic
  • HW units can negotiate their services with BSUs through:
      • Queues
      • Proposal broadcasts
      • Tokens
  • Resource starvation is avoided if the oldest bundle is served first
[Figure: HW units F0, F1, R0, R1, E0, E1, W0, W1 and BSUs]
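
The oldest-first rule can be illustrated with a priority queue. This sketch models only the queue part of the negotiation, not proposal broadcasts or tokens, and it assumes each bundle carries a sequence number where a smaller value means an older bundle. `HWUnit`, `request`, and `grant` are hypothetical names.

```python
import heapq
from typing import List, Optional, Tuple


class HWUnit:
    """A hardware unit that serves the oldest requesting bundle first."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._requests: List[Tuple[int, int]] = []  # min-heap of (age, bsu_id)

    def request(self, bundle_age: int, bsu_id: int) -> None:
        """Queue a service request; smaller bundle_age means an older bundle."""
        heapq.heappush(self._requests, (bundle_age, bsu_id))

    def grant(self) -> Optional[int]:
        """Return the BSU whose bundle is oldest, or None if nothing is queued."""
        if not self._requests:
            return None
        _, bsu_id = heapq.heappop(self._requests)
        return bsu_id


unit = HWUnit("E0")
unit.request(bundle_age=7, bsu_id=2)
unit.request(bundle_age=3, bsu_id=1)
unit.request(bundle_age=5, bsu_id=3)
grants = [unit.grant() for _ in range(4)]
```

Because the oldest bundle always wins, it cannot be starved by a stream of younger requests, which is the property the slide relies on.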

2. Inter-Cluster Dependencies
Clusters might need operands generated by others.
  • Each BSU entry holds input tags and output tags for registers RA, RB, RC, RD
  • Additional storage required (768 bits/BSU for x86)
  • Tag creation is serialized
  • Possible optimization: tags based on bundle ID
      • Does not require serialization
      • Much smaller storage needed
Bundles:
    4013c3: add %al, [%ebx]
    4013c8: or %al, %bl
BSU table with tags (listed in the order RA RB RC RD):
    BSU ID  PC      NPC     Fetch  Reg  Exec  WB  Input tags  Output tags
    1       4013c3  4013c8  F0     R1   E0    W0  1 5 6 10    13 5 6 10
    2       4013c8          F1     R0   E1    W1  13 5 6 10
    3
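
The tag values on the slide can be reproduced with a short sketch. It assumes that `%al` maps to the RA tag, that a bundle gets a fresh tag for every register it writes, and that fresh tags come from one shared counter (the serialized tag creation the slide mentions). `make_output_tags` is a hypothetical helper.

```python
import itertools
from typing import Dict, Iterable, Iterator


def make_output_tags(input_tags: Dict[str, int],
                     written: Iterable[str],
                     fresh_tags: Iterator[int]) -> Dict[str, int]:
    """Copy the input tags and assign a fresh tag to each written register."""
    output = dict(input_tags)
    for reg in written:
        output[reg] = next(fresh_tags)  # shared counter: creation is serialized
    return output


fresh = itertools.count(13)  # next unused tag; 13 matches the slide's example
bsu1_in = {"RA": 1, "RB": 5, "RC": 6, "RD": 10}
# Assumption: "add %al, [%ebx]" writes %al, which is tracked by the RA tag.
bsu1_out = make_output_tags(bsu1_in, ["RA"], fresh)
bsu2_in = bsu1_out  # the next bundle consumes the producer's output tags
```

The second bundle therefore waits for the value carrying tag 13 before it can read RA, while RB, RC and RD are available under their old tags.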

3. Handling Mispredictions
Bundles:
    4013c0: jmp 4013eb
    4013c3: add %al, [%ebx]
    4013c8: or %al, %bl
BSU table before the branch resolves:
    BSU ID  Next  PC      NPC     Fetch  Reg  Exec  WB
    0       1     4013c0  4013c3  F0     R1   E0    W0
    1       2     4013c3  4013c8  F1     R0   E1    W1
    2       -     4013c8
Branch mispredicted! Correct NPC: 4013eb
BSU table after the NPC of BSU 0 is corrected:
    BSU ID  Next  PC      NPC     Fetch  Reg  Exec  WB
    0       1     4013c0  4013eb  F0     R1   E0    W0
    1       2     4013c3  4013c8  F1     R0   E1    W1
    2       -     4013c8
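
One way to act on the corrected NPC is sketched below. The slides only show the NPC of BSU 0 being replaced by 4013eb; squashing the younger bundles by walking the Next links is an assumption of this sketch, as are the `Entry` and `resolve_branch` names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    pc: int
    npc: Optional[int]
    next_id: Optional[int]  # BSU holding the next bundle in program order


def resolve_branch(table: Dict[int, Entry], bsu_id: int,
                   actual_npc: int) -> List[int]:
    """Correct the NPC of bsu_id and return the IDs of squashed bundles."""
    entry = table[bsu_id]
    if entry.npc == actual_npc:
        return []  # prediction was correct, nothing to undo
    squashed: List[int] = []
    cursor = entry.next_id
    while cursor is not None:  # follow the Next links to all younger bundles
        squashed.append(cursor)
        cursor = table.pop(cursor).next_id
    entry.npc = actual_npc
    entry.next_id = None  # a new bundle will be fetched from actual_npc
    return squashed


table = {
    0: Entry(pc=0x4013C0, npc=0x4013C3, next_id=1),
    1: Entry(pc=0x4013C3, npc=0x4013C8, next_id=2),
    2: Entry(pc=0x4013C8, npc=None, next_id=None),
}
squashed = resolve_branch(table, 0, 0x4013EB)
```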

4. Precise Exceptions
Bundle:
    4013c3: add %al, [%ebx]
    4013c5: div cl
    4013c8: jmp 40140a
BSU table when the exception is raised:
    BSU ID  Next  PC      NPC  Fetch  Reg  Exec  WB
    0       1     4013c3       F0     R1   E0    W0
    1       2
Exception! Division by 0
BSU table afterwards (recovered from overlapping animation frames; the entry for BSU 2 is not fully legible):
    BSU ID  Next  PC      NPC           Fetch  Reg  Exec  WB
    0       1     4013c3  4013c5        F0     R1   E0    W0
    1       2     4013c5  Exc. Handler  F1     R0   E1    W1
    2             (residue shows "Exc. Handler" and "4013c8"; column assignment unclear)

Impact of Faults in Viper
Handling runtime failures:
  1. Periodic full system checkpoint
  2. Faults are detected through hardware self-tests or SW symptoms
  3. If a fault is detected:
      a. Faulty component is diagnosed and disabled
      b. System state is restored to the previous checkpoint
      c. Program execution is restarted
[Figure: Viper fabric of BSUs and HW modules with I$, crossbar and RF]
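
The recovery steps can be sketched as a checkpoint-and-rollback loop. The class below is purely illustrative: `RecoverableSystem`, its fields, and the dictionary used as architectural state are assumptions, and fault detection itself is left outside the sketch.

```python
import copy
from typing import Any, Dict, Iterable, Set


class RecoverableSystem:
    """Toy model of periodic checkpointing with rollback on a detected fault."""

    def __init__(self, units: Iterable[str]) -> None:
        self.enabled_units: Set[str] = set(units)
        self.state: Dict[str, Any] = {"pc": 0, "regs": {}}
        self._checkpoint = copy.deepcopy(self.state)

    def take_checkpoint(self) -> None:
        """Step 1: save a full copy of the system state."""
        self._checkpoint = copy.deepcopy(self.state)

    def recover(self, faulty_unit: str) -> int:
        """Steps 3a to 3c: disable the unit, roll back, return the restart PC."""
        self.enabled_units.discard(faulty_unit)       # 3a: disable component
        self.state = copy.deepcopy(self._checkpoint)  # 3b: restore checkpoint
        return self.state["pc"]                       # 3c: restart from here


system = RecoverableSystem(["F0", "F1", "E0", "E1"])
system.state["pc"] = 0x4013C3
system.take_checkpoint()
system.state["pc"] = 0x4013C8        # progress made after the checkpoint
restart_pc = system.recover("E0")    # a self-test flags E0 as faulty
```

Work done after the last checkpoint is lost on recovery, so the checkpoint period trades steady-state overhead against re-execution time.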

Experimental Setup
  • Viper configuration
      • 6 services: Fetch, Decode, Tag Generation, Execute, Commit, WriteBack
      • 4 copies of 5 clusters
      • 4-cycle crossbar latency / 1-cycle cluster communication latency
  • Baseline CMP
      • 4 OoO cores: 32k D$ and I$ / 12-stage pipeline / 128-entry ROB / 5 RS entries per FU
      • 6 in-order cores: 32k D$ and I$ / 12-stage pipeline
  • Microarchitectural simulation
      • gem5 / timing accurate / system emulation mode
      • SPEC 2006 and MiBench

Viper Enables Reliable OoO Execution
[Figure: relative IPC (0 to 3.5) per benchmark; mean +87% (MiBench), mean +69% (SPEC 2006)]
  • MiBench: basicmath, cjpeg, crc, dijkstra, fft, gs, lout, patricia, qsort, rawcaudio, rawdaudio, rijndael, say, susan, toast, untoast
  • SPEC 2006: 401.bzip2, 410.bwaves, 429.mcf, 433.milc, 436.cactusADM, 437.leslie3d, 444.namd, 447.dealII, 456.hmmer, 459.GemsFDTD, 462.libquantum, 464.h264ref, 470.lbm, 471.omnetpp, 473.astar, 998.specrand, 999.specrand

Viper is Competitive vs Unprotected OoO
[Figure: relative IPC (0 to 4) per benchmark for Viper, over the same MiBench and SPEC 2006 benchmarks as the previous slide; mean +87% (MiBench), mean +69% (SPEC 2006)]

Comparison With Other Solutions
Solutions compared, ordered from low fault rate to high fault rate: Cardio [Pellegrini 10], Core Salvaging [Powell 09], Bulletproof [Shyam 06], StageNet [Gupta 08], Viper
  • Reconfig. granularity: core, functional units, functional units, pipeline stages, arbitrary (Viper)
  • Control logic: central for the prior solutions listed; decoupled (Viper)
  • Interconnect: NoC, specialized and stage-specific among the prior solutions; loose (Viper)
  • Slowdown: ~3%, ~5%, ~18%, ~20%, ~24% (Viper)

Performance Degradation
[Figure: Max IPC (0 to 140) vs. number of faults (10 to 1000); series: StageNet, CMP, Bulletproof, Viper]
  • CMP tile w/ 2 billion transistors
  • Much more graceful performance degradation

Conclusions
  • Viper: scalable, service-oriented µ-architecture
  • Hardware reconfiguration granularity as a design choice
  • Much more graceful performance degradation
      • Can exploit available hardware to improve performance


Additional figure-only slides: Service-Oriented µ-Architecture (MicroProcessor); Viper Enables Reliable OoO Execution; Viper is Competitive vs Unprotected OoO