Memory Hierarchies in Intelligent Memories: Energy/Performance Design
Wei Huang, Jose Renau, Seung-Moon Yoo and Josep Torrellas
University of Illinois at Urbana-Champaign

Motivation
  • Advances in technology:
      • Processor and memory integration
      • Many processors on a chip
  • How to design for high performance
  • Energy consumption is a big concern
  • Problems in cooling system
Workshop on Scalable Shared Memory Multiprocessors - June 10, 2000

Goals of this work
  • Evaluate trade-offs in memory hierarchy
      • Energy consumption
      • Performance
      • Area requirements
  • Detailed energy consumption analysis

Findings of the Work
  • Modest cache size is necessary
  • Easy modifications in memory reduce energy consumption

The FlexRAM Architecture
Yi Kang, Wei Huang, Seung-Moon Yoo, Diana Keen, Zhenzhou Ge, Vinh Lam, Pratap Pattanaik and Josep Torrellas - ICCD 99

Chip Architecture
  • 64 nodes, each one includes:
      • 2-issue processor @ 800 MHz
      • 1 MByte DRAM (12 clk)
      • Row Buffers (6 clk)
      • Cache (1 clk)
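
The three latencies on this slide (cache 1 clk, row buffers 6 clk, DRAM 12 clk) can be folded into an expected cost per access. The following is a minimal sketch, not the authors' model: it assumes each latency is the total cost of an access served at that level, and the two hit rates are hypothetical inputs that the slides do not give.

```python
# Latencies in processor cycles, taken from the slide.
CACHE_CLK = 1
ROW_BUFFER_CLK = 6
DRAM_CLK = 12


def average_access_cycles(cache_hit_rate: float, row_buffer_hit_rate: float) -> float:
    """Expected cycles per memory access.

    Assumptions (not from the slides): each latency is the full cost of an
    access served at that level, and row_buffer_hit_rate is measured only
    over accesses that miss in the cache.
    """
    for rate in (cache_hit_rate, row_buffer_hit_rate):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("hit rates must be in [0, 1]")
    cache_miss = 1.0 - cache_hit_rate
    return (
        cache_hit_rate * CACHE_CLK
        + cache_miss * row_buffer_hit_rate * ROW_BUFFER_CLK
        + cache_miss * (1.0 - row_buffer_hit_rate) * DRAM_CLK
    )
```

With both rates at zero the function returns the 12-cycle DRAM latency, and with a perfect cache it returns 1 cycle, which bounds what any cache size can achieve under these assumptions.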

How a Memory Bank Works
  • 4 memory sub-banks, each 256 KBytes
  • 5 row buffers, each 1 KByte
  • 1 data buffer, 256 bits
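
The sizes on this slide are enough to sketch how a byte offset inside one 1 MByte bank could be split into a sub-bank, a row, and a 256-bit chunk. The mapping below is an assumption for illustration only: the slides do not say which address bits select the sub-bank, and the 1 KByte row size is inferred from the row-buffer size.

```python
from typing import NamedTuple

SUB_BANKS = 4
SUB_BANK_BYTES = 256 * 1024       # from the slide
ROW_BYTES = 1024                  # assumed: one row fills one 1 KByte row buffer
DATA_BUFFER_BYTES = 256 // 8      # 256-bit data buffer = 32 bytes
BANK_BYTES = SUB_BANKS * SUB_BANK_BYTES  # 1 MByte per node


class BankAddress(NamedTuple):
    sub_bank: int   # which of the 4 sub-banks
    row: int        # row inside the sub-bank
    chunk: int      # 256-bit chunk inside the row


def decode(offset: int) -> BankAddress:
    """Split a byte offset within the bank (hypothetical high-order sub-bank mapping)."""
    if not 0 <= offset < BANK_BYTES:
        raise ValueError("offset outside the 1 MByte bank")
    sub_bank, within_sub_bank = divmod(offset, SUB_BANK_BYTES)
    row, within_row = divmod(within_sub_bank, ROW_BYTES)
    return BankAddress(sub_bank, row, within_row // DATA_BUFFER_BYTES)
```

Under these assumptions each sub-bank holds 256 rows and each row holds 32 data-buffer-sized chunks.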

Small Area Memory Banks
  + More Energy
  + Less Spatial Locality
  + More Localities

Pipelining the requests
  • Faster memory system without increased energy consumption

Advanced Memory Banks I
  - Less Energy and Contention
  + More Area

Advanced Memory Banks II
  - Less Energy and Contention
  + More Area

Terminology for Memory Systems
  • Trad(i, j): Traditional
  • S(i, j): Segmented
  • IS(i, j): Interleaved Segmented
  • ISP(i, j): Interleaved Segmented Pipelined
  • i: Degree of interleaving
  • j: Number of sub-banks per interleaving way
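
The notation on this slide maps naturally onto a small record type. The sketch below is illustrative only: `BankConfig` and `parse_config` are hypothetical names, and the parser accepts just the four scheme names defined on this slide.

```python
import re
from dataclasses import dataclass

# Scheme names exactly as listed on the terminology slide.
SCHEMES = {
    "Trad": "Traditional",
    "S": "Segmented",
    "IS": "Interleaved Segmented",
    "ISP": "Interleaved Segmented Pipelined",
}

_PATTERN = re.compile(r"^\s*([A-Za-z]+)\s*\(\s*(\d+)\s*,\s*(\d+)\s*\)\s*$")


@dataclass(frozen=True)
class BankConfig:
    scheme: str             # key of SCHEMES
    interleaving: int       # i: degree of interleaving
    sub_banks_per_way: int  # j: sub-banks per interleaving way

    @property
    def total_sub_banks(self) -> int:
        # i ways, each with j sub-banks
        return self.interleaving * self.sub_banks_per_way


def parse_config(text: str) -> BankConfig:
    """Parse a string such as 'ISP(2, 8)' (hypothetical helper)."""
    match = _PATTERN.match(text)
    if match is None or match.group(1) not in SCHEMES:
        raise ValueError(f"unrecognized configuration: {text!r}")
    return BankConfig(match.group(1), int(match.group(2)), int(match.group(3)))
```

For example, `parse_config("IS(2, 4)")` yields a configuration with 2 interleaving ways and 8 sub-banks in total.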

Energy and Area Issues
  • More advanced configurations:
      • More Area
      • Less Energy

Evaluation Environment
  • Fixed parameters:
      • 2-issue processor @ 800 MHz
      • Prefetch
      • Cache, RB, Bank latencies (1, 6, 12 cycles)
  • Variable parameters:
      • Cache sizes (256 B, 1 KB, 8 KB, 16 KB)
      • Memory Banks: Trad(1, 4), SP(1, 4), IS(2, 4), ISP(2, 4), IS(2, 8), ISP(2, 8)
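
The variable parameters above define a small design space. The sketch below simply enumerates it; whether the authors simulated every combination is an assumption, since the slides only list the two parameter sets.

```python
from itertools import product

# Variable parameters as listed on the slide.
CACHE_SIZES_BYTES = [256, 1024, 8 * 1024, 16 * 1024]
BANK_CONFIGS = [
    "Trad(1, 4)", "SP(1, 4)",
    "IS(2, 4)", "ISP(2, 4)",
    "IS(2, 8)", "ISP(2, 8)",
]


def design_points():
    """All (cache size, bank configuration) pairs, assuming a full cross product."""
    return list(product(CACHE_SIZES_BYTES, BANK_CONFIGS))
```

Under the full cross-product assumption this gives 4 x 6 = 24 design points per application.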

Applications

Performance: Memory Banks
  • Small performance improvement in advanced configurations with 1 KByte cache

Performance: Cache Effect
  • Modest cache size is required for performance

Energy-Delay Product
  • Big improvement in energy-delay product with more advanced memory configurations
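
The energy-delay product is the energy consumed multiplied by the execution time, so lower is better. The sketch below shows how configurations could be ranked by it; the function names and the measurement dictionary are hypothetical, and no values from the study are reproduced here.

```python
from typing import Dict, Tuple


def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """Energy multiplied by execution time; lower is better."""
    if energy_joules < 0 or delay_seconds < 0:
        raise ValueError("energy and delay must be non-negative")
    return energy_joules * delay_seconds


def best_by_edp(measurements: Dict[str, Tuple[float, float]]) -> str:
    """Return the configuration name with the lowest energy-delay product.

    `measurements` maps a configuration name to (energy, delay); the caller
    supplies these numbers, for example from a simulator.
    """
    if not measurements:
        raise ValueError("no measurements given")
    return min(
        measurements,
        key=lambda name: energy_delay_product(*measurements[name]),
    )
```

Because the metric multiplies the two quantities, a configuration that is slightly slower can still win if it saves proportionally more energy.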

Energy-Delay Product: Cache
  • An 8 KByte cache has the best energy-delay product

Conclusions
  • Modest size cache is enough (8 KBytes)
      • Improves performance
      • Reduces energy consumption
  • Segmentation S(1, 4)
      • Reduces energy consumption
  • When area is available: use interleaving
      • IS(2, 4) increases the area by 14%

Backup Slides

Area-Delay Product: Memory Banks
  • SP(1, 4) has the best area utilization

Area-Delay Product: Cache
  • 8 KBytes is the sweet spot for area-delay product

Power Consumption: Cache
  • Power is a bad metric, only useful as a constraint

Memory Access Timing

Area Requirements

Small Area Memory Banks
  + Energy
  + Spatial Locality
  + Localities

Advanced Memory Banks
  • Less Energy and Contention
  • More area
  • Even more area