
Memory Hierarchies in Intelligent Memories: Energy/Performance Design
Wei Huang, Jose Renau, Seung-Moon Yoo and Josep Torrellas
University of Illinois at Urbana-Champaign

Motivation
• Advances in technology:
  • Processor and memory integration
  • Many processors on a chip
• How to design for high performance
• Energy consumption is a big concern
• Problems in cooling system
Workshop on Scalable Shared Memory Multiprocessors - June 10, 2000

Goals of this work
• Evaluate trade-offs in memory hierarchy
  • Energy consumption
  • Performance
  • Area requirements
• Detailed energy consumption analysis

Findings of the Work
• Modest cache size is necessary
• Easy modifications in memory reduce energy consumption

The FlexRAM Architecture
Yi Kang, Wei Huang, Seung-Moon Yoo, Diana Keen, Zhenzhou Ge, Vinh Lam, Pratap Pattanaik and Josep Torrellas - ICCD 99

Chip Architecture
• 64 nodes, each one includes:
  • 2-issue processor @ 800 MHz
  • 1 MByte DRAM (12 clk)
  • Row Buffers (6 clk)
  • Cache (1 clk)
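
The three latencies on this slide can be combined into a rough average access time per node. The following is a minimal sketch, assuming a lookup order of cache, then row buffer, then DRAM bank, and assuming each listed latency is the total cost of an access served at that level. The function names and the hit rates passed to them are hypothetical placeholders, not figures from the talk.

```python
CLOCK_HZ = 800e6  # processor clock from the slide (800 MHz)

# Latencies in processor cycles, as listed on the slide.
CACHE_CYCLES = 1
ROW_BUFFER_CYCLES = 6
DRAM_CYCLES = 12


def average_access_cycles(cache_hit_rate: float, row_buffer_hit_rate: float) -> float:
    """Average cycles per memory access.

    Assumption (not from the slides): each latency is the full cost of an
    access served at that level. row_buffer_hit_rate is the fraction of
    cache misses that are served from a row buffer.
    """
    for rate in (cache_hit_rate, row_buffer_hit_rate):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("hit rates must lie in [0, 1]")
    miss = 1.0 - cache_hit_rate
    return (
        cache_hit_rate * CACHE_CYCLES
        + miss * row_buffer_hit_rate * ROW_BUFFER_CYCLES
        + miss * (1.0 - row_buffer_hit_rate) * DRAM_CYCLES
    )


def cycles_to_ns(cycles: float) -> float:
    """Convert processor cycles to nanoseconds at the 800 MHz clock."""
    return cycles / CLOCK_HZ * 1e9
```

At 800 MHz one cycle is 1.25 ns, so a full 12-cycle DRAM access corresponds to 15 ns under these assumptions.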

How a Memory Bank Works
• 4 Memory sub-banks, each 256 KBytes
• 5 Row Buffers, each 1 KByte
• 1 Data Buffer, 256 bits
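
The bank geometry above implies a few derived quantities that are easy to check by arithmetic. This is a small sketch, assuming 1 KByte = 1024 bytes and, for the row count only, that one DRAM row is exactly as wide as one row buffer. Neither assumption is stated on the slide, and the function names are illustrative.

```python
KIB = 1024  # assumption: "KByte" on the slide means 1024 bytes

# Values taken from the slide.
SUB_BANKS = 4
SUB_BANK_BYTES = 256 * KIB
ROW_BUFFERS = 5
ROW_BUFFER_BYTES = 1 * KIB
DATA_BUFFER_BITS = 256


def bank_capacity_bytes() -> int:
    """Total DRAM capacity of one memory bank."""
    return SUB_BANKS * SUB_BANK_BYTES


def data_buffer_transfers_per_row() -> int:
    """Number of 256-bit data-buffer transfers needed to drain one row buffer."""
    return ROW_BUFFER_BYTES * 8 // DATA_BUFFER_BITS


def rows_per_sub_bank() -> int:
    """Rows per sub-bank, assuming a row is as wide as a row buffer."""
    return SUB_BANK_BYTES // ROW_BUFFER_BYTES
```

Four 256 KByte sub-banks give 1 MByte, which matches the per-node DRAM size on the Chip Architecture slide.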

Small Area Memory Banks
+More Energy
+Less Spatial Locality
+More Localities

Pipelining the requests
• Faster memory system without increased energy consumption

Advanced Memory Banks I
-Less Energy and Contention
+More Area

Advanced Memory Banks II
-Less Energy and Contention
+More Area

Terminology for Memory Systems
• Trad(i, j): Traditional
• S(i, j): Segmented
• IS(i, j): Interleaved Segmented
• ISP(i, j): Interleaved Segmented Pipelined
  i: Degree of interleaving
  j: Number of sub-banks per interleaving way
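
The Scheme(i, j) labels can be handled as structured values rather than strings. The sketch below is one possible representation, not something taken from the talk: `BankConfig`, `parse_bank_config`, and the `total_sub_banks` property are hypothetical names. The total follows from the definitions above, since j sub-banks in each of i interleaving ways gives i × j sub-banks.

```python
import re
from dataclasses import dataclass

# Scheme abbreviations exactly as defined on the terminology slide.
SCHEMES = {
    "Trad": "Traditional",
    "S": "Segmented",
    "IS": "Interleaved Segmented",
    "ISP": "Interleaved Segmented Pipelined",
}

# Matches labels such as "ISP(2, 8)"; whitespace is optional.
_LABEL = re.compile(r"^\s*([A-Za-z]+)\s*\(\s*(\d+)\s*,\s*(\d+)\s*\)\s*$")


@dataclass(frozen=True)
class BankConfig:
    scheme: str             # abbreviation, e.g. "IS"
    interleaving: int       # i: degree of interleaving
    sub_banks_per_way: int  # j: sub-banks per interleaving way

    @property
    def total_sub_banks(self) -> int:
        """i ways times j sub-banks per way."""
        return self.interleaving * self.sub_banks_per_way


def parse_bank_config(label: str) -> BankConfig:
    """Parse a 'Scheme(i, j)' label into a BankConfig."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a Scheme(i, j) label: {label!r}")
    scheme, i, j = match.group(1), int(match.group(2)), int(match.group(3))
    if scheme not in SCHEMES:
        raise ValueError(f"unknown scheme: {scheme!r}")
    return BankConfig(scheme, i, j)
```

For example, `parse_bank_config("ISP(2, 8)").total_sub_banks` evaluates to 16.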

Energy and Area Issues
• More advanced configurations:
  • More Area
  • Less Energy

Evaluation Environment
• Fixed parameters:
  • 2-issue processor @ 800 MHz
  • Prefetch
  • Cache, RB, Bank latencies (1, 6, 12 cycles)
• Variable parameters:
  • Cache sizes (256 B, 1 KB, 8 KB, 16 KB)
  • Memory Banks: Trad(1, 4), SP(1, 4), IS(2, 4), ISP(2, 4), IS(2, 8), ISP(2, 8)
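
The variable parameters define a small design space. The sketch below enumerates it, assuming every cache size was paired with every memory bank organization (a full cross product, which the slide does not state explicitly). The dictionary keys and the `design_points` function are illustrative names only; the bank labels are copied verbatim from the slide.

```python
from itertools import product

# Fixed parameters from the slide.
FIXED = {
    "issue_width": 2,
    "clock_mhz": 800,
    "prefetch": True,
    "cache_latency_cycles": 1,
    "row_buffer_latency_cycles": 6,
    "bank_latency_cycles": 12,
}

# Variable parameters from the slide.
CACHE_SIZES_BYTES = [256, 1024, 8 * 1024, 16 * 1024]
MEMORY_BANKS = [
    "Trad(1, 4)", "SP(1, 4)", "IS(2, 4)",
    "ISP(2, 4)", "IS(2, 8)", "ISP(2, 8)",
]


def design_points() -> list:
    """All (cache size, memory bank) pairs merged with the fixed parameters."""
    return [
        {"cache_bytes": cache, "memory_bank": bank, **FIXED}
        for cache, bank in product(CACHE_SIZES_BYTES, MEMORY_BANKS)
    ]
```

Under the cross-product assumption, four cache sizes and six bank organizations give 24 design points.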

Applications

Performance: Memory Banks
• Small performance improvement in advanced configurations with 1 KByte cache

Performance: Cache Effect
• Modest cache size is required for performance

Energy-Delay Product
• Big improvement in energy-delay product with more advanced memory configurations
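
The energy-delay product multiplies the energy a run consumes by its execution time, so lower values are better and a design is rewarded only if it saves energy without a matching slowdown. The sketch below shows the computation and a normalization against a baseline. The function names are illustrative, and any numbers passed in are hypothetical inputs, not measurements from the talk.

```python
def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay product: energy consumed times execution time."""
    if energy_joules < 0 or delay_seconds < 0:
        raise ValueError("energy and delay must be non-negative")
    return energy_joules * delay_seconds


def normalized_edp(energy: float, delay: float,
                   base_energy: float, base_delay: float) -> float:
    """EDP relative to a baseline configuration (values below 1.0 are better)."""
    baseline = energy_delay_product(base_energy, base_delay)
    if baseline == 0:
        raise ValueError("baseline EDP must be non-zero")
    return energy_delay_product(energy, delay) / baseline
```

As an illustration with made-up inputs, a configuration that halves energy at equal delay has a normalized value of 0.5 relative to its baseline.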

Energy-Delay Product: Cache
• An 8 KByte cache has the best energy-delay product

Conclusions
• Modest size cache is enough (8 KBytes)
  • Improves performance
  • Reduces energy consumption
• Segmentation S(1, 4)
  • Reduces energy consumption
• When area is available: use interleaving
  • IS(2, 4) increases the area by 14%

Backup Slides

Area-Delay Product: Memory Banks
• SP(1, 4) has the best area utilization

Area-Delay Product: Cache
• 8 KBytes is the sweet spot for area-delay product

Power Consumption: Cache
• Power is a bad metric, only useful as a constraint

Memory Access Timing

Area Requirements

Small Area Memory Banks
+Energy
+Spatial Locality
+Localities

Advanced Memory Banks
Less Energy and Contention
More area
Even more area