Power Efficient Cache Coherence
Craig Saldanha, Mikko Lipasti
- Slides: 43
Motivation
- Power consumption is becoming a serious design constraint.
- Market demand for faster and more complex servers.
- Complex coherence protocols and interconnects.
- f_clock + Complexity = P_interconnect
Motivation (continued)
- Traditional power-saving methodologies are ineffective.
- Minimize the number of transaction packets.
- At the end-points: Jetty.
- At the source: Serial Snoop.
Outline
- Overview of snoop-based protocols and the opportunity for power savings
- Latency and power consumption of parallel snooping techniques
- Serial Snooping
- Results
- Conclusions
- Future Work
Snoop Based Coherence
[Diagram: processors P1 to P4 and memory; each node has a tag array and a data array.]
- Local node: tag lookup, bus arbitration, snoop transmission.
- Remote node: tag lookup, snoop response, combination of responses, data fetch, data transmit.
Degrees of Speculation
Three degrees of freedom to speculate: snooping, data fetch, data transmit.
- Parallel Snoop, Spec Dfetch, Spec DXmit
- Parallel Snoop, Spec Dfetch, Non-Spec DXmit
- Parallel Snoop, Non-Spec Dfetch, Non-Spec DXmit
- Serial Snoop, Spec Dfetch, Spec DXmit
- Serial Snoop, Spec Dfetch, Non-Spec DXmit
- Serial Snoop, Non-Spec Dfetch, Non-Spec DXmit
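The slide lists six configurations rather than all eight combinations of the three choices. A minimal sketch of one way to enumerate them follows, assuming the two missing combinations are excluded because data cannot be transmitted speculatively unless it was also fetched speculatively. That rule is an inference from the list, not something the slide states, and the function name `valid_configs` is hypothetical.

```python
from itertools import product


def valid_configs():
    """Enumerate snoop / fetch / transmit combinations.

    Assumption (inferred, not stated on the slide): a speculative
    transmit requires a speculative fetch, so that pairing is skipped.
    """
    configs = []
    for snoop, spec_fetch, spec_xmit in product(
        ("Parallel", "Serial"), (True, False), (True, False)
    ):
        if spec_xmit and not spec_fetch:
            continue  # cannot send data that has not been fetched yet
        fetch = "Spec" if spec_fetch else "Non-Spec"
        xmit = "Spec" if spec_xmit else "Non-Spec"
        configs.append(f"{snoop} Snoop, {fetch} Dfetch, {xmit} DXmit")
    return configs


if __name__ == "__main__":
    for name in valid_configs():
        print(name)
```

Under that assumption the enumeration yields the same six entries, in the same order, as the slide.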
Latency and Power Assumptions
- Consider only load misses.
- Tree of point-to-point connections.
- Latency to traverse a link: 1 bus cycle (7 ns)
- Tag lookup: 1 bus cycle (7 ns)
- D-Fetch: 2 bus cycles (14 ns)
- DRAM access: 10 bus cycles (70 ns)
[Diagram: backplane with MEMORY and the root node; Switch 1 and Switch 2; processors P1 to P4, with Board 1 labeled.]
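The timing assumptions above reduce to a small table of bus-cycle counts. The sketch below encodes them so that later latencies can be checked by hand. The names `BUS_CYCLE_NS`, `CYCLES` and `latency_ns` are hypothetical, chosen only for illustration.

```python
# Bus cycle time and per-operation cycle counts, taken from the slide.
BUS_CYCLE_NS = 7

CYCLES = {
    "link": 1,          # traverse one point-to-point link
    "tag_lookup": 1,
    "data_fetch": 2,
    "dram_access": 10,
}


def latency_ns(*operations):
    """Latency in ns of a strictly sequential chain of operations."""
    return sum(CYCLES[op] for op in operations) * BUS_CYCLE_NS


if __name__ == "__main__":
    print(latency_ns("dram_access"))               # 70
    print(latency_ns("tag_lookup", "data_fetch"))  # 21
```

The helper only models operations performed back to back. Overlapped phases, such as a speculative data fetch running alongside the tag lookup, need to be handled separately.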
Parallel Snoop, Speculative Fetch, Speculative Transmit (PS/SF/ST): Snoop Broadcast
Cumulative power as the snoop propagates (latency in ns):
- 7: Plink
- 14: Plink + Pswitch
- 21: 2 Plink + Pswitch
- 28: 2 Plink + 2 Pswitch
- 35: 5 Plink + 2 Pswitch
- 42: 5 Plink + 4 Pswitch
- 49: 8 Plink + 4 Pswitch
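The broadcast steps can be replayed as a running tally. This is a minimal sketch, assuming each 7 ns step activates the number of links or switches implied by the difference between consecutive slides. The 7 ns cost of a switch traversal is inferred from the timeline (the assumptions slide only gives the link latency), and the names `STAGES` and `broadcast_timeline` are hypothetical.

```python
STEP_NS = 7

# (links activated, switches activated) in each successive 7 ns step,
# derived from the differences between consecutive broadcast slides.
STAGES = [(1, 0), (0, 1), (1, 0), (0, 1), (3, 0), (0, 2), (3, 0)]


def broadcast_timeline(stages=STAGES, step_ns=STEP_NS):
    """Return (latency_ns, total_links, total_switches) after each step."""
    timeline = []
    links = switches = 0
    for step, (new_links, new_switches) in enumerate(stages, start=1):
        links += new_links
        switches += new_switches
        timeline.append((step * step_ns, links, switches))
    return timeline


if __name__ == "__main__":
    for t, links, switches in broadcast_timeline():
        print(f"{t:3d} ns: {links} Plink + {switches} Pswitch")
```

The final entry reproduces the slide's total of 8 Plink + 4 Pswitch at 49 ns.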
Parallel Snoop, Speculative Fetch, Speculative Transmit (PS/SF/ST): Memory and Remote Node Phases
- Memory access, data fetch. Latency markers (ns): 35, 91, 105. Power: Pmem
- Memory access, data transmit. Latency markers (ns): 35, 91, 105, 140. Power: Pmem + 3 Plink + 2 Pswitch
- Remote node, tag lookup. Latency markers (ns): 49, 56. Power: 3 Ptag
- Remote node, snoop response. Latency markers (ns): 49, 56, 105. Power: 3 Ptag + 3*(Pswitch + 2 Plink) + Plink + 2 Pswitch + 2 Plink
- Remote node, data fetch. Latency markers (ns): 49, 63. Power: 3 Pcache
- Remote node, data transmit. Latency markers (ns): 49, 63, 112. Power: 3 Ptag + 3*(4 Plink + 3 Pswitch)
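Parallel Snoop, Speculative Fetch, Speculative Transmit (PS/SF/ST)
Latency (ns), as read from the timing diagram:
- Local node: tag lookup (TL) at 0; snoop broadcast (BRDCST) completes at 49.
- Remote node: TL 49 to 56; data fetch (DF) 49 to 63; response and combination (RSP + CMB) 56 to 105; data Xmit 63 to 112.
- Memory: memory access 35 to 105 (a 91 marker also appears on this timeline); data Xmit 105 to 140.
Power:
- Remote node supplies the data: 29 Plink + 18 Pswitch + 3 Ptag + 3 Pcache + Pmem
- Memory supplies the data: 20 Plink + 11 Pswitch + 3 Ptag + 3 Pcache + Pmem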
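The two PS/SF/ST power totals can be rebuilt from the per-phase tallies on the preceding slides. The sketch below is one reading that reproduces both totals: it assumes the remote-supplied case counts the three remote data transmissions but not the memory transmission, and the memory-supplied case counts the reverse. That split is my assumption, not stated on the slides, and all names are hypothetical.

```python
from collections import Counter

# Per-phase component counts as they appear on the PS/SF/ST slides.
SNOOP_BROADCAST = Counter(Plink=8, Pswitch=4)
REMOTE_TAG_LOOKUP = Counter(Ptag=3)
# 3*(Pswitch + 2 Plink) + Plink + 2 Pswitch + 2 Plink
SNOOP_RESPONSE = Counter(Plink=3 * 2 + 1 + 2, Pswitch=3 * 1 + 2)
REMOTE_DATA_FETCH = Counter(Pcache=3)
REMOTE_DATA_XMIT = Counter(Plink=3 * 4, Pswitch=3 * 3)
MEMORY_FETCH = Counter(Pmem=1)
MEMORY_XMIT = Counter(Plink=3, Pswitch=2)


def total(*phases):
    """Add up the component counts of several phases."""
    result = Counter()
    for phase in phases:
        result += phase
    return result


# Phases assumed to be paid for regardless of who supplies the data.
COMMON = (SNOOP_BROADCAST, REMOTE_TAG_LOOKUP, SNOOP_RESPONSE,
          REMOTE_DATA_FETCH, MEMORY_FETCH)

remote_supplies = total(*COMMON, REMOTE_DATA_XMIT)
memory_supplies = total(*COMMON, MEMORY_XMIT)

if __name__ == "__main__":
    print(dict(remote_supplies))
    print(dict(memory_supplies))
```

With this grouping, `remote_supplies` comes to 29 Plink + 18 Pswitch + 3 Ptag + 3 Pcache + Pmem and `memory_supplies` to 20 Plink + 11 Pswitch + 3 Ptag + 3 Pcache + Pmem, matching the slide.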
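Parallel Snoop, Speculative Fetch, Non-Speculative Transmit (PS/SF/NT)
Latency (ns), as read from the timing diagram:
- Local node: tag lookup (TL) at 0; snoop broadcast (BRDCST) completes at 49.
- Remote node: TL 49 to 56; data fetch (DF) 49 to 63; response and combination (RSP + CMB) completes at 105; data Xmit 105 to 154.
- Memory: memory access 35 to 105 (a 91 marker also appears on this timeline); data Xmit 105 to 140.
Power:
- Remote node supplies the data: 21 Plink + 12 Pswitch + 3 Ptag + Pcache + Pmem
- Memory supplies the data: 20 Plink + 11 Pswitch + 3 Ptag + 3 Pcache + Pmem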
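Parallel Snoop, Non-Speculative Fetch, Non-Speculative Transmit (PS/NF/NT)
Latency (ns), as read from the timing diagram:
- Local node: tag lookup (TL) at 0; snoop broadcast (BRDCST) completes at 49.
- Remote node: TL starts at 49; response and combination (RSP + CMB) completes at 105; data fetch (DF) 105 to 119; data Xmit 119 to 168.
- Memory: memory access 91 to 161; data Xmit 161 to 196.
Power:
- Remote node supplies the data: 21 Plink + 12 Pswitch + 3 Ptag + Pcache
- Memory supplies the data: 20 Plink + 11 Pswitch + 3 Ptag + Pmem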
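To see what each reduction in speculation buys, the remote-supplied totals of the three parallel schemes can be subtracted component by component. This is a sketch using the power expressions exactly as printed on the slides. The latency figures (112, 154 and 168 ns) are my reading of the timing diagrams, and the dictionary names are hypothetical.

```python
from collections import Counter

# Remote-node-supplies-the-data totals, copied from the slides.
POWER = {
    "PS/SF/ST": Counter(Plink=29, Pswitch=18, Ptag=3, Pcache=3, Pmem=1),
    "PS/SF/NT": Counter(Plink=21, Pswitch=12, Ptag=3, Pcache=1, Pmem=1),
    "PS/NF/NT": Counter(Plink=21, Pswitch=12, Ptag=3, Pcache=1),
}

# Time at which remote data reaches the requester (ns); read from the
# timing diagrams, so treat these as an interpretation.
LATENCY_NS = {"PS/SF/ST": 112, "PS/SF/NT": 154, "PS/NF/NT": 168}


def savings(baseline, scheme):
    """Components saved by `scheme` relative to `baseline`."""
    return POWER[baseline] - POWER[scheme]  # Counter drops zero entries


if __name__ == "__main__":
    for base, scheme in [("PS/SF/ST", "PS/SF/NT"), ("PS/SF/NT", "PS/NF/NT")]:
        extra = LATENCY_NS[scheme] - LATENCY_NS[base]
        print(f"{base} -> {scheme}: saves {dict(savings(base, scheme))}, "
              f"costs {extra} ns")
```

On these numbers, dropping speculative transmit saves 8 Plink + 6 Pswitch + 2 Pcache at a cost of 42 ns, and dropping speculative fetch as well saves a further Pmem at a cost of 14 ns.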
Serial Snooping
- Avoids speculative transmission of snoop packets.
- Check the nearest neighbor first.
- Data is supplied with minimum latency and power.
- Forward the snoop to the next level.
- Search the other half of the tree.
- Search leaf nodes serially.
Serial Snooping: Features
- Latency to satisfy a request depends on the distance from the requestor.
- Data resident at the nearest neighbor is supplied with the lowest latency and power.
- Requests are visible to the memory controller only at the root node.
- Latency is adversely affected when the requested data is present at the farthest node.
- Worst-case power consumption is still less than that of parallel snooping.
Serial Snooping: Request Satisfied by Nearest Node
Latency (ns), as read from the timing diagram:
- P1: tag lookup (TL) at 0; snoop (SNP) reaches P2 at 21.
- P2: TL 21 to 28; snoop response (RSP) 28 to 49; data fetch (DF) 21 to 35; data Xmit 35 to 56.
Power:
- Xmit snoop: 2 Plink + Pswitch
- P2 tag access and snoop response: Ptag + 2 Plink + Pswitch
- P2 data fetch and Xmit: Pcache + 2 Plink + Pswitch
- Ptotal = 6 Plink + 3 Pswitch + Ptag + Pcache
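The nearest-node case is small enough to check end to end. The sketch below sums the three power terms from the slide and recomputes the data-arrival time, assuming a switch traversal costs one 7 ns step like a link (inferred from the timeline) and that the speculative data fetch at P2 starts as soon as the snoop arrives. All names are hypothetical.

```python
from collections import Counter

STEP_NS = 7         # one link or one switch traversal (switch cost inferred)
DATA_FETCH_NS = 14  # 2 bus cycles, from the assumptions slide

# Power terms for the nearest-node case, copied from the slide.
XMIT_SNOOP = Counter(Plink=2, Pswitch=1)
TAG_AND_RESPONSE = Counter(Ptag=1, Plink=2, Pswitch=1)
FETCH_AND_XMIT = Counter(Pcache=1, Plink=2, Pswitch=1)

p_total = XMIT_SNOOP + TAG_AND_RESPONSE + FETCH_AND_XMIT

# Critical path: P1 -> switch -> P2, speculative fetch, P2 -> switch -> P1.
hops_each_way = 3  # link, switch, link
snoop_arrives = hops_each_way * STEP_NS
fetch_done = snoop_arrives + DATA_FETCH_NS
data_arrives = fetch_done + hops_each_way * STEP_NS

if __name__ == "__main__":
    print(dict(p_total))
    print(snoop_arrives, fetch_done, data_arrives)
```

This reproduces Ptotal = 6 Plink + 3 Pswitch + Ptag + Pcache and the 21, 35 and 56 ns markers. The tag lookup and snoop response overlap with the fetch and transmit, so they do not lengthen the data path in this reading.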
Serial Snooping: Request Satisfied by Next-Nearest Neighbor
[Timing diagram: P1, P2, P3 and memory timelines showing TL, SNP, RSP, DF, memory access and Xmit phases; latency markers run from 0 to 168 ns.]
Power: 16 Plink + 10 Pswitch + 2 Ptag + Pcache
Serial Snooping: Request Satisfied by Farthest Node
[Timing diagram: P1 to P4 timelines plus speculative (Spec-Memory) and non-speculative (Non-Spec-Memory) memory access; latency markers run from 0 to 252 ns.]
Power:
- Remote node supplies the data: 18 Plink + 11 Pswitch + 3 Ptag + Pcache
- Memory supplies the data: 17 Plink + 10 Pswitch + 3 Ptag + Pmem
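The claim that worst-case serial snooping still uses less power than parallel snooping can be checked without knowing the individual component costs. The sketch below compares the farthest-node serial totals with the PS/NF/NT totals (the least speculative parallel scheme on the earlier slides) component by component, so the comparison holds for any non-negative costs. The helper name `uses_no_more_than` is hypothetical.

```python
from collections import Counter

SERIAL_WORST = {
    "remote": Counter(Plink=18, Pswitch=11, Ptag=3, Pcache=1),
    "memory": Counter(Plink=17, Pswitch=10, Ptag=3, Pmem=1),
}
PARALLEL_NF_NT = {
    "remote": Counter(Plink=21, Pswitch=12, Ptag=3, Pcache=1),
    "memory": Counter(Plink=20, Pswitch=11, Ptag=3, Pmem=1),
}


def uses_no_more_than(a, b):
    """True if `a` needs at most as many of every component as `b`,
    and strictly fewer of at least one."""
    keys = set(a) | set(b)
    return all(a[k] <= b[k] for k in keys) and any(a[k] < b[k] for k in keys)


if __name__ == "__main__":
    for supplier in ("remote", "memory"):
        ok = uses_no_more_than(SERIAL_WORST[supplier], PARALLEL_NF_NT[supplier])
        diff = PARALLEL_NF_NT[supplier] - SERIAL_WORST[supplier]
        print(supplier, ok, dict(diff))
```

In both cases the serial worst case saves 3 Plink + 1 Pswitch relative to PS/NF/NT, with the tag, cache and memory terms unchanged.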
RESULTS : Load Miss Distributions
RESULTS: Average Latencies to satisfy load misses
RESULTS: Relative Power Savings
CONCLUSIONS
- Reducing the degree of speculation has potential for significant power savings.
- Performance degradation is minimal for the set of benchmarks studied.
- Serial Snooping with speculative memory fetch provides optimal latency and power consumption.
Future Work
- Develop a detailed execution-driven power model.
- Explore different interconnect topologies.
- Examine the viability of adaptive mechanisms for protocol policy.
Serial Snooping: Nearest Neighbor
Latency markers (ns): 0, 7, 14, 21. Power: 2 Plink + Pswitch
Questions
Parallel Snoop, Speculative Fetch, Speculative Transmit (PS/SF/ST)
[Timing diagram: local node tag lookup (TL) at 0, snoop broadcast (BRDCST) and memory access; latency markers (ns): 0, 35, 49, 91.]