Power-Aware Microprocessors Emily Chan COMP 4211 Advance Computer Architecture

Paper
• Yu Bai and R. Iris Bahar. A Dynamically Reconfigurable Mixed In-Order/Out-of-Order Issue Queue for Power-Aware Microprocessors.
2/24/2021 COMP 4211 Advance Computer Architecture

Outline
• Introduction
• Focus of the paper
• Overview of Approaches Taken
• Related Work Done
• Implementations
• Experimental Results
• Conclusion

WHY?

WHY?!!

Two Major Issues
• Battery life – mobile phones, laptops, and any other portable equipment.
• Cooling package – when the Pentium “N” comes out, you may have to keep it in a freezer.

What is the problem?
• Different applications may vary widely in:
  • Degree of instruction-level parallelism (ILP)
  • Branch behavior
  • Memory access behavior
• Datapath resources are not optimally utilized by all applications. HOWEVER, they are still consuming power!

How can we solve the problem?
Golden Rule: A good design strategy should be flexible enough to dynamically reconfigure available resources according to the program’s needs.

Focus of the paper
• “Reconfigurability” of the issue queue in out-of-order superscalar processors, because the issue queue is a large source of the total power dissipation
• Believe it or not: for the Alpha 21264, 46% of the total power goes to the issue logic!

Overview of Approaches Taken
• Partition the issue queue into several sets (FIFOs) -- Why?
• Only instructions at the head of each FIFO are visible to the request and selection/arbitration logic -- Why?
• Each FIFO issues in order, though the overall issue logic is still out-of-order -- What are the benefits?

Related Work Done
• Hardware dynamically monitors performance, disabling part of the integer and/or floating-point pipelines
• Varying the instruction issue width to allow disabling of a cluster of function units
• Dynamically reducing the number of active entries in the instruction window

Drawbacks
• There is no way to tell whether an instruction is ready to be issued or not, and all instructions are visible to the selection and wakeup logic: power inefficient
• Dynamically adjusting the issue queue size narrows the scope of instructions available for exposing ILP

Palacharla’s approach
• Uses FIFOs as well
• Simplifies the wakeup and selection logic by putting chains of dependent instructions into FIFO buffers
• Issues instructions from multiple buffers in parallel

Palacharla’s Drawbacks
• Uses a single fixed-size data structure: not always beneficial for different applications
Why is the data structure such an important issue?

Performance Analysis
• Using a 1-entry FIFO configuration as the base case, on average:
  • 2-entry FIFO: 3% drop
  • 4-entry FIFO: 14% drop
  • 8-entry FIFO: 30% drop
  • 64-entry (a single FIFO): 84% drop
• For li, performance improves up to the 4-entry FIFO configuration: it effectively avoids executing wrong-path instructions

Implementations
• Scheme #1
  • Completely disable some under-utilized FIFOs in the issue queue according to feedback from the (hardware) performance monitor
  • Pro: completely disabling a FIFO also disables any associated signals: more power savings
  • Con: shrinks the overall size of the issue queue, which limits exposure to potential ILP; not suitable for floating-point execution

Implementations
• Scheme #2
  • Vary the number and size of the FIFOs simultaneously according to feedback from the performance monitor
  • The size of the FIFOs increases while the number of FIFOs decreases
  • The same number of issue queue entries is retained at all times, but the queue appears to be smaller
  • Pro: more flexibility in exposing potential ILP
  • Con: entries are only made invisible and the associated signals are still enabled: less power savings
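
The Scheme #2 reconfiguration step can be sketched as follows. This is only an illustration under stated assumptions: the total of 64 entries is taken from the 64-entry configurations quoted elsewhere in the slides, and the function names `halve_fifos` and `double_fifos` are hypothetical, not from the paper.

```python
from typing import Tuple

# Assumed issue-queue capacity (the slides mention 64 1-entry and 16 4-entry setups).
TOTAL_ENTRIES = 64


def halve_fifos(num_fifos: int) -> Tuple[int, int]:
    """Return (num_fifos, fifo_size) after halving the number of FIFOs."""
    if num_fifos <= 1:
        return 1, TOTAL_ENTRIES  # already one big FIFO, nothing left to merge
    new_count = num_fifos // 2
    return new_count, TOTAL_ENTRIES // new_count


def double_fifos(num_fifos: int) -> Tuple[int, int]:
    """Return (num_fifos, fifo_size) after doubling the number of FIFOs."""
    if num_fifos >= TOTAL_ENTRIES:
        return TOTAL_ENTRIES, 1  # already one entry per FIFO
    new_count = num_fifos * 2
    return new_count, TOTAL_ENTRIES // new_count


if __name__ == "__main__":
    count = TOTAL_ENTRIES
    while count > 1:
        count, size = halve_fifos(count)
        # Only FIFO heads are visible, so `count` entries bid for issue slots.
        print(f"{count} FIFOs x {size} entries, {count} visible heads")
```

The number of entries stays constant across configurations; only the number of visible heads changes, which is why the queue "appears" smaller to the request and selection logic.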

Implementations
• When performance is suffering, a large fraction of the issue queue is turned back on (Scheme #1) or made visible (Scheme #2) to the request and selection logic

Pipeline Organization
• Up to 6 instructions each cycle

Two Major Components
• Issue queue
  • A set of reconfigurable FIFOs
  • Insert at the tail; issue from the head of a FIFO
  • Only the heads of FIFOs are visible
• Hardware performance monitors
  • Determine the optimal issue queue configuration
  • Statistics are gathered over a fixed interval of cycles called a cycle window (1024 cycles)
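
A minimal software model of the issue queue described above may help make the "only heads are visible" idea concrete. It is a sketch, not the paper's hardware: the class and field names are hypothetical, and the selection policy (scan FIFOs in index order) is an assumption.

```python
from collections import deque
from dataclasses import dataclass
from typing import List


@dataclass
class Instr:
    name: str
    ready: bool = False


class FifoIssueQueue:
    """Issue queue built from FIFOs; only FIFO heads can request issue."""

    def __init__(self, num_fifos: int, fifo_size: int) -> None:
        self.fifo_size = fifo_size
        self.fifos = [deque() for _ in range(num_fifos)]

    def insert(self, fifo_index: int, instr: Instr) -> bool:
        """Insert at the tail of one FIFO; return False if that FIFO is full."""
        fifo = self.fifos[fifo_index]
        if len(fifo) >= self.fifo_size:
            return False
        fifo.append(instr)
        return True

    def visible(self) -> List[Instr]:
        """Instructions seen by the request/selection logic: the heads only."""
        return [fifo[0] for fifo in self.fifos if fifo]

    def issue(self, width: int) -> List[Instr]:
        """Issue up to `width` ready heads (index-order selection is assumed)."""
        issued = []
        for fifo in self.fifos:
            if len(issued) == width:
                break
            if fifo and fifo[0].ready:
                issued.append(fifo.popleft())
        return issued
```

For example, a ready instruction sitting behind a non-ready head cannot issue until the head leaves, which is why each FIFO issues in order while different FIFOs can still issue out of order relative to each other.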

Issue Queue Design
• Scheme #1

Scheme #1 Design
• When under-utilized, disable a FIFO
• A FIFO must be drained of all valid entries before being disabled
• Reduces the number of instructions bidding for an issue slot: power saving in the wakeup and selection logic!
• No need to update the ready status of the disabled instruction entries: power saving!

Issue Queue Design
• Scheme #2

Scheme #2 Design
• Vary the size and number of FIFOs simultaneously
• Assumes no cycle overhead in changing from one configuration to another, since each instruction has a set of arbiter enable signals indicating its arbiter assignment
• Arbiter signals are disabled except for the heads of FIFOs: power saving!
• Power is saved only through reduced activity in the request and selection logic

Allocation of instructions into FIFOs
• It is important that most of the ready instructions are at the heads of FIFOs, so use a dependency-based strategy
• Attempt to place an instruction in the same FIFO as one or both of its source dependencies

Dependency-based Strategy
• If ready:
  • steer to a new empty FIFO
  • if there is no empty FIFO, then !!!
• If one pending operand:
  • steer to the same FIFO as the producer if possible
  • if that fails, try a new empty FIFO
  • if there is no empty FIFO, then !!!!

Dependency-based Strategy
• If two pending operands:
  • implement a Last Operand Predictor (LOP) to predict which of the two operands will become available later
  • try the FIFO of the later-arriving producer first
  • if that fails, try the other producer
  • if that fails again, try a new empty FIFO
  • if there is no empty FIFO, then !!!!
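
The steering rules for zero, one, and two pending operands can be sketched as a single function. Several points are assumptions rather than facts from the slides: "if possible" is read as "the producer is at the tail of a FIFO that still has room", the "!!!" case is modeled as a dispatch stall (returning `None`), and the LOP's prediction is passed in as an argument instead of being modeled. The name `steer` is hypothetical.

```python
from typing import List, Optional


def steer(pending: List[str],
          fifos: List[List[str]],
          capacity: int,
          predicted_last: Optional[str] = None) -> Optional[int]:
    """Pick a FIFO index for a new instruction, or None to stall (assumed).

    pending        -- names of producers whose results are still outstanding
    fifos          -- each FIFO as a list of instruction names, head first
    capacity       -- maximum entries per FIFO
    predicted_last -- producer the LOP predicts will finish last (if any)
    """
    def behind(producer: str) -> Optional[int]:
        # Assumed meaning of "if possible": producer is the tail of a non-full FIFO.
        for i, fifo in enumerate(fifos):
            if fifo and fifo[-1] == producer and len(fifo) < capacity:
                return i
        return None

    def first_empty() -> Optional[int]:
        for i, fifo in enumerate(fifos):
            if not fifo:
                return i
        return None

    candidates = list(pending)
    if len(candidates) == 2 and predicted_last in candidates:
        # Try the producer predicted to arrive last first, then the other one.
        candidates.sort(key=lambda p: p != predicted_last)

    for producer in candidates:
        index = behind(producer)
        if index is not None:
            return index
    # Ready instructions, or failed steering attempts, fall back to an empty FIFO.
    return first_empty()
```

A ready instruction has an empty `pending` list and so goes straight to an empty FIFO, where it becomes a head immediately.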

Hardware Performance Monitors
• At the end of each cycle window, determine which operating mode to use next
• A combination of different monitoring techniques is used for better control

Monitoring Techniques
• Monitoring IPC
  • Low IPC: disable/hide part of the issue queue and enter low-power mode (LPM)
• Detecting variations in IPC
  • If the issue and commit rates vary significantly, this indicates a high branch misprediction rate: decrease the number of FIFOs

Monitoring Techniques
• Performance degradation
  • If the drop in IPC between two cycle windows exceeds a threshold value: go back to a higher power mode
• Monitoring ready instructions
  • Too many stalls: increase the number of FIFOs
  • Very few stalls: decrease the number of FIFOs

Monitoring Techniques
• Issue queue usage
  • Low occupancy: reduce the number of FIFOs
• Non-critical instructions
  • If no instruction has been placed behind a ready instruction by the time it is removed from the queue, it is a non-critical instruction
  • Delaying such a ready instruction won’t hurt
  • Too many non-critical instructions: reduce the number of FIFOs

Power Estimations
• Extrapolated from available Alpha 21264 power estimates
• Different issue queue designs, but both use an out-of-order issuing scheme
• Assume issue logic = register file + register mapping + issue queue
• Issue queue = register scoreboard + request logic + arbiters

Power Estimations
• Estimates:
  • Arbitration logic: 60% of issue queue power
  • Request logic: 15% of issue queue power
  • Register scoreboard and the rest: remaining 25%
• Reminder: reducing the number of FIFOs reduces activity on the arbiter enable signals and on the request logic and signals: power savings!

Request Logic
• Only the request lines of the heads of FIFOs are enabled to be precharged!
• Use the FIFO_head signal to achieve this
• REQ_L is asserted iff FIFO_head is asserted
• Conventional out-of-order issue queue: precharges every request line each cycle!
• Execution assignment info (state_cond and Ex_cond) is updated no matter what: power is saved only by completely disabling the FIFO (Scheme #1)

Arbitration Logic
• Precharge only the grant lines of the heads of FIFOs
• Assume the power used in the arbitration logic is directly proportional to the number of active FIFOs
  • Save more power by disabling all the grant lines associated with the unused issue slots

Register Scoreboard Logic
• Tracks data dependencies among instructions in the issue queue
• It is necessary to update the information for every issue queue entry unless a FIFO is completely disabled: only Scheme #1 can achieve power saving here
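
The 60% / 15% / 25% breakdown and the proportionality assumption can be combined into a rough back-of-the-envelope model. This is a sketch under assumptions that go beyond the slides: request-logic power is taken to scale linearly with the number of active FIFOs just like arbitration power, the remaining 25% is taken to scale with the enabled fraction under Scheme #1 and to stay constant under Scheme #2, and the baseline is the same queue with all FIFOs active. The function name is hypothetical.

```python
ARBITER_SHARE = 0.60   # arbitration logic share of issue queue power (from the slides)
REQUEST_SHARE = 0.15   # request logic share (from the slides)
REST_SHARE = 0.25      # register scoreboard and the rest (from the slides)


def relative_iq_power(active_fifos: int, total_fifos: int, scheme: int) -> float:
    """Issue queue power relative to the all-FIFOs-active configuration."""
    fraction = active_fifos / total_fifos
    # Only heads of active FIFOs are precharged (linear scaling is assumed).
    switching = (ARBITER_SHARE + REQUEST_SHARE) * fraction
    if scheme == 1:
        # Disabled FIFOs need no scoreboard updates (linear scaling is assumed).
        rest = REST_SHARE * fraction
    elif scheme == 2:
        # Hidden entries are still updated, so this part is not reduced.
        rest = REST_SHARE
    else:
        raise ValueError("scheme must be 1 or 2")
    return switching + rest


if __name__ == "__main__":
    print(relative_iq_power(8, 16, scheme=1))   # half of the FIFOs disabled
    print(relative_iq_power(32, 64, scheme=2))  # half as many visible heads
```

Under these assumptions, halving the active FIFOs saves more with Scheme #1 than with Scheme #2, which matches the qualitative pro/con lists given for the two schemes.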

Experimental Methodology
• Uses SimpleScalar
• Original Register Update Unit (RUU) = instruction window + array of reservation stations + reorder buffer (ROB)
• The RUU is split into a ROB and an issue queue (IQ): more accurate modeling of current and next-generation processors
• The ROB orders instructions according to their input dependencies before they enter the queue

Complete Configuration

Specific Monitor Technique for Scheme #1
• Disable one FIFO when any of the following holds (ordered according to relative importance):
  • less than 1/4 of ready instructions are stalled;
  • less than 2/3 of the FIFOs are actually used on average;
  • more than 15% of dispatched instructions are non-critical;
  • the current IQ occupancy rate is less than 1/4 of the average occupancy rate

Specific Monitor Technique for Scheme #1
• Enable one FIFO when any of the following holds (ordered according to relative importance):
  • the current issue rate (IPCissue) drops by more than 10% compared to the last cycle window executed in FPM;
  • the current IPCissue drops by more than 15% compared to the previous cycle window;
  • more than 1/3 of ready instructions are stalled
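
The Scheme #1 thresholds can be read as a per-cycle-window decision function. The sketch below is illustrative only: the `WindowStats` fields and the function name are hypothetical, and checking the enable conditions before the disable conditions is an assumption, since the slides do not say which group wins when both apply.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Statistics for one cycle window (field names are hypothetical)."""
    stalled_ready_frac: float   # fraction of ready instructions that stalled
    fifos_used_frac: float      # average fraction of FIFOs actually used
    noncritical_frac: float     # fraction of dispatched instructions that are non-critical
    occupancy: float            # current IQ occupancy rate
    avg_occupancy: float        # average IQ occupancy rate
    ipc_issue: float            # issue rate in this window
    prev_ipc_issue: float       # issue rate in the previous window
    last_fpm_ipc_issue: float   # issue rate in the last window executed in FPM


def scheme1_decision(s: WindowStats) -> str:
    """Return "enable", "disable", or "hold" for the next cycle window."""
    # Enable conditions are checked first (assumed priority).
    if s.ipc_issue < 0.90 * s.last_fpm_ipc_issue:   # drop of more than 10% vs. FPM
        return "enable"
    if s.ipc_issue < 0.85 * s.prev_ipc_issue:       # drop of more than 15% vs. previous
        return "enable"
    if s.stalled_ready_frac > 1 / 3:
        return "enable"
    # Disable conditions, in the slide's order of importance.
    if s.stalled_ready_frac < 1 / 4:
        return "disable"
    if s.fifos_used_frac < 2 / 3:
        return "disable"
    if s.noncritical_frac > 0.15:
        return "disable"
    if s.occupancy < s.avg_occupancy / 4:
        return "disable"
    return "hold"
```

Note the gap between the 1/4 and 1/3 stall thresholds: a window whose stall fraction falls between them triggers neither rule, which gives the controller some hysteresis.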

Results for Scheme #1

Comments on Scheme #1
• Only applied to integer benchmarks
• Does a reasonable job of dynamically changing the 16 4-entry FIFOs
• Not as good for the non-FIFO (64 1-entry) scheme, but still, for compress: 75% power saving with only a 3.6% drop in performance
• Average best cases:
  • 16 4-entry FIFOs: 27.6% power saving with a 3.7% drop in performance
  • 64 1-entry FIFOs: 64.1% power saving but a 4.7% drop in performance (not as impressive)

Specific Monitor Techniques for Scheme #2
• Halve the number of FIFOs and double the size of each FIFO when any of the following holds (ordered according to relative importance):
  • (IPCissue – IPCcommit) > 1.0;
  • less than 3% of ready instructions are stalled;
  • IPCissue < 2.7 (threshold lowered by 0.2 for each successive reduction in the number of FIFOs);
  • current IQ occupancy rate < 20% of average;
  • (AVG_IPCissue – IPCissue) > 0.15 (threshold increased by 0.15 for each successive reduction in the number of FIFOs)

Specific Monitor Techniques for Scheme #2
• Double the number of FIFOs and halve the size of each FIFO when any of the following holds (ordered according to relative importance):
  • the current IPCissue drops by > 8% compared to the last cycle window;
  • the current IPCissue drops by > 6% compared to the last cycle window in FPM;
  • more than 15% of ready instructions are stalled
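
The Scheme #2 conditions, including the thresholds that shift with each successive reduction, can be sketched as two predicates. The field and function names are hypothetical, and `reductions` is assumed to count how many times the number of FIFOs has already been halved from the starting configuration.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Statistics for one cycle window (field names are hypothetical)."""
    ipc_issue: float
    ipc_commit: float
    avg_ipc_issue: float
    prev_ipc_issue: float
    last_fpm_ipc_issue: float
    stalled_ready_frac: float
    occupancy: float
    avg_occupancy: float


def should_halve(s: WindowStats, reductions: int) -> bool:
    """True if the number of FIFOs should be halved (each FIFO doubled)."""
    ipc_floor = 2.7 - 0.2 * reductions      # lowered per successive reduction
    avg_gap = 0.15 + 0.15 * reductions      # raised per successive reduction
    return (
        s.ipc_issue - s.ipc_commit > 1.0
        or s.stalled_ready_frac < 0.03
        or s.ipc_issue < ipc_floor
        or s.occupancy < 0.20 * s.avg_occupancy
        or s.avg_ipc_issue - s.ipc_issue > avg_gap
    )


def should_double(s: WindowStats) -> bool:
    """True if the number of FIFOs should be doubled (each FIFO halved)."""
    return (
        s.ipc_issue < 0.92 * s.prev_ipc_issue        # drop of more than 8%
        or s.ipc_issue < 0.94 * s.last_fpm_ipc_issue  # drop of more than 6% vs. FPM
        or s.stalled_ready_frac > 0.15
    )
```

Because the IPCissue floor drops and the average-gap threshold rises after every halving, each further reduction becomes harder to trigger, so the queue does not collapse to a few large FIFOs on a single noisy window.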

FIFO usage for Scheme #2

Comments on FIFO usage
• For several FP benchmarks (applu, apsi, mgrid and swim), the number of FIFOs can’t be reduced: they need more flexibility in reordering instructions
• For most integer benchmarks, the number of FIFOs is cut at least in half for a significant portion of the running time

Results for Scheme #2

Comments on Scheme #2
• It is easier to cut the number of FIFOs for integer benchmarks: saves at least 30% of the issue queue power
• Most FP benchmarks need 64 FIFOs for a large percentage of the running time, but Scheme #2 works reasonably well for some (fpppp, hydro2d and su2cor)
• Average: 27.3% power saving with only a 2.7% drop in performance

FINALLY!
• Programs vary in ILP
• Dynamically reconfigure the issue queue to save power
• Two approaches taken; Scheme #2 works more efficiently
• THANK YOU & BYE-BYE!
• Oops... ONE LAST THING...

References
• Yu Bai and R. Iris Bahar. A Dynamically Reconfigurable Mixed In-Order/Out-of-Order Issue Queue for Power-Aware Microprocessors.
• James A. Farrell and Timothy C. Fischer. Issue Logic for a 600-MHz Out-of-Order Execution Microprocessor.
• J. E. Smith. Advanced Computer Architecture 1, “Power Efficient Architecture” Lecture Notes.
• K. Wilcox and S. Manne. Alpha processors: A history of power issues and a look to the future.