
On-chip MRAM as a High-Bandwidth, Low-Latency Replacement for DRAM Physical Memories
Rajagopalan Desikan, Charles R. Lefurgy, Stephen W. Keckler, and Doug Burger
Computer Architecture and Technology Lab
University of Texas at Austin
02/21/2003 CART 1

Motivation
• Latency to off-chip memory is hundreds of cycles
• Off-chip memory bandwidth is becoming a performance-limiting factor
• MRAM: an emerging memory technology with high bandwidth and low latency
• Goal of our work: to determine whether the performance advantage of MRAM in high-performance computing justifies further investment and research

Outline
• MRAM Memory Description
• MRAM Memory Hierarchy
• Results
• Conclusions

MRAM Cell
• Magnetoresistive random access memory (MRAM) uses the magnetic tunnel junction (MTJ) to store information
• An MRAM cell is composed of a diode and an MTJ stack
• The MTJ stack consists of two ferromagnetic layers separated by a thin dielectric barrier
• The polarization of one layer is fixed; the other layer is used for information storage
[Figure: MRAM cell between a bit line and a word line, with the read/write current flowing through a diode and the MTJ stack; stack layer labels include Pt, Co/Fe, Ni/Fe, Al2O3, Mn/Fe, and W]
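
As a rough illustration of how the fixed and free layers encode a bit, the toy model below treats the MTJ as a two-state resistor. When the free layer is polarized parallel to the fixed layer the junction has low resistance, and when it is antiparallel the resistance is high. The class name `MTJCell`, the resistance and TMR values, and the convention that antiparallel stores a 1 are hypothetical assumptions for illustration, not parameters from this work.

```python
from enum import Enum


class Orientation(Enum):
    """Free-layer polarization relative to the fixed layer."""
    PARALLEL = 0
    ANTIPARALLEL = 1


class MTJCell:
    """Hypothetical two-state MTJ model (illustrative values only)."""

    def __init__(self, r_parallel_ohm: float, tmr_ratio: float) -> None:
        # TMR ratio = (R_antiparallel - R_parallel) / R_parallel
        self.r_parallel = r_parallel_ohm
        self.r_antiparallel = r_parallel_ohm * (1.0 + tmr_ratio)
        self.free_layer = Orientation.PARALLEL

    def write(self, bit: int) -> None:
        # Assumed convention: antiparallel stores 1, parallel stores 0.
        self.free_layer = Orientation.ANTIPARALLEL if bit else Orientation.PARALLEL

    def resistance(self) -> float:
        if self.free_layer is Orientation.ANTIPARALLEL:
            return self.r_antiparallel
        return self.r_parallel

    def read(self) -> int:
        # Sense by comparing against a reference midway between the two states.
        reference = (self.r_parallel + self.r_antiparallel) / 2.0
        return 1 if self.resistance() > reference else 0


cell = MTJCell(r_parallel_ohm=1000.0, tmr_ratio=0.3)
cell.write(1)
print(cell.read(), cell.resistance())
```

The midpoint reference is only one way to sense the state. It works here because the two resistance levels are well separated by the assumed TMR ratio.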

MRAM Bank Design
• MRAM cells are located at the intersection of each word line and bit line
• Reads: current sources are connected to the bit lines and the selected word line is pulled low
• Writes: the polarity of the current in the bit lines decides the value stored
• MRAM banks are accessed using vias
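
A behavioral sketch of the bank organization might look like the following. There is one cell per (word line, bit line) intersection, a write selects a word line and uses the sign of each bit-line current to set the stored value, and a read selects one word line and senses every bit line. The class `CrossPointBank` and the sign convention (positive current stores 1) are hypothetical. The model ignores all electrical detail.

```python
from __future__ import annotations


class CrossPointBank:
    """Hypothetical behavioral model of an MRAM bank: one cell per
    (word line, bit line) intersection."""

    def __init__(self, word_lines: int, bit_lines: int) -> None:
        self.bit_lines = bit_lines
        self.cells = [[0] * bit_lines for _ in range(word_lines)]

    def write(self, word_line: int, bit_currents: list[float]) -> None:
        # One signed current per bit line; its polarity decides the stored value.
        # Assumed convention: positive -> 1, negative -> 0, zero -> unchanged.
        if len(bit_currents) != self.bit_lines:
            raise ValueError("need one current per bit line")
        row = self.cells[word_line]
        for b, current in enumerate(bit_currents):
            if current > 0:
                row[b] = 1
            elif current < 0:
                row[b] = 0

    def read(self, word_line: int) -> list[int]:
        # Electrically, pulling the selected word line low lets each bit line's
        # current source sense the cell at that intersection; this model
        # simply returns a copy of the selected row.
        return list(self.cells[word_line])


bank = CrossPointBank(word_lines=4, bit_lines=8)
bank.write(2, [1, -1, 1, 1, -1, -1, 0, 1])
print(bank.read(2))
```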

MRAM Bank Modeling
• Modified CACTI 3.0 to develop an area and timing tool for modeling MRAM banks
• Independently accessible banks, each composed of subbanks
• Important features
  – Active area consumed
  – Delay due to vertical wires
  – MRAM capacity for a given die size and cell size
  – Support for multiple layers with sharing
• SIA 2001 roadmap at 90 nm technology
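
One of the tool's outputs, MRAM capacity for a given die size and cell size, reduces at first order to area arithmetic. The sketch below relies on several assumptions:
• A hypothetical array efficiency, meaning the fraction of die area covered by cells.
• A cell size expressed in F² at the 90 nm node.
• Each additional MRAM layer adds the same number of cells.

It ignores the peripheral sharing and via overheads that the CACTI-based tool models. The function name and all example numbers, including the 8 F² cell, are illustrative only.

```python
def mram_capacity_bits(die_mm2: int, cell_f2: int, feature_nm: int = 90,
                       array_efficiency_pct: int = 50, layers: int = 1) -> int:
    """Hypothetical first-order capacity estimate using integer arithmetic."""
    die_nm2 = die_mm2 * 10**12                      # 1 mm^2 = 10^12 nm^2
    cell_nm2 = cell_f2 * feature_nm ** 2            # cell area = (size in F^2) * F^2
    usable_nm2 = die_nm2 * array_efficiency_pct // 100
    return (usable_nm2 // cell_nm2) * layers        # whole cells per layer, times layers


bits = mram_capacity_bits(die_mm2=100, cell_f2=8)
print(f"{bits / 8 / 2**20:.1f} MiB per layer")
```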

Chip-Level Architecture

MRAM Design Issues
• Number of banks
  – More banks: lower latency and higher concurrency, but higher network traversal time and higher miss rates
• Cache line size
  – Larger line size: more spatial locality, but higher latency
• Page placement policy
  – Random
  – Round-robin
  – Least loaded
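
The random and round-robin policies can be sketched as interchangeable bank choosers. The sketch assumes a first-touch scheme, in which a page is assigned a bank the first time it is referenced and keeps it afterwards. The class and function names are hypothetical, and the least-loaded policy, which needs per-bank state, is omitted here.

```python
import itertools
import random
from typing import Optional


class RoundRobinPlacement:
    """Assign each newly touched page to the next bank in order."""

    def __init__(self, num_banks: int) -> None:
        self._banks = itertools.cycle(range(num_banks))

    def choose_bank(self) -> int:
        return next(self._banks)


class RandomPlacement:
    """Assign each newly touched page to a uniformly random bank."""

    def __init__(self, num_banks: int, seed: Optional[int] = None) -> None:
        self._num_banks = num_banks
        self._rng = random.Random(seed)

    def choose_bank(self) -> int:
        return self._rng.randrange(self._num_banks)


def bank_for_page(page_table: dict, policy, page: int) -> int:
    """First-touch mapping (an assumption): a page keeps the bank it was given."""
    if page not in page_table:
        page_table[page] = policy.choose_bank()
    return page_table[page]


table: dict = {}
rr = RoundRobinPlacement(num_banks=4)
print([bank_for_page(table, rr, p) for p in (10, 11, 12, 13, 14, 10)])
```

Round-robin spreads pages evenly regardless of how heavily each page is used. Random placement approximates the same balance statistically without any shared counter.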

Methodology
• Simulated processor
  – Alpha 21264 pipeline modified for 8-wide issue
  – 3.8 GHz (10 FO4 inverter delays per stage)
• Base SDRAM system
  – Distributed L2 cache
• Base MRAM system
  – Distributed MRAM banks and a reduced-capacity distributed L2 cache
• Benchmarks
  – Memory-intensive SPEC CPU2000, scientific, and speech applications
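
The clock target follows from the pipeline depth. At 3.8 GHz the cycle time is about 263 ps, so with 10 FO4 delays per stage one FO4 inverter delay is about 26 ps. The small calculation below assumes, for simplicity, that the whole clock period, including latch overhead, is counted in the 10 FO4. The function names are hypothetical.

```python
def cycle_time_ps(freq_ghz: float) -> float:
    # A 1 GHz clock has a 1000 ps period.
    return 1000.0 / freq_ghz


def fo4_delay_ps(freq_ghz: float, fo4_per_stage: int = 10) -> float:
    # Assumes the full cycle is divided evenly into fo4_per_stage FO4 delays.
    return cycle_time_ps(freq_ghz) / fo4_per_stage


print(f"cycle = {cycle_time_ps(3.8):.1f} ps, FO4 = {fo4_delay_ps(3.8):.1f} ps")
```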

Page Placement Policy
[Figure: IPC for 100 banks with different page placement policies]
Least-loaded cost = (L2 hit rate * L2 hit latency) + (L2 miss rate * MRAM bank latency) + current network latency to the bank
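
The least-loaded cost can be sketched as follows. The sketch assumes three things:
• Each bank tracks its own L2 hit rate and latencies.
• The L2 miss rate is one minus the hit rate.
• A new page is placed in the bank with the lowest cost.

`BankState`, `least_loaded_bank`, and the example numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BankState:
    """Hypothetical per-bank statistics; latencies in processor cycles."""
    l2_hit_rate: float        # fraction of accesses that hit in this bank's L2
    l2_hit_latency: float
    mram_bank_latency: float
    network_latency: float    # current network latency to reach this bank

    def cost(self) -> float:
        # Assumes L2 miss rate = 1 - L2 hit rate.
        l2_miss_rate = 1.0 - self.l2_hit_rate
        return (self.l2_hit_rate * self.l2_hit_latency
                + l2_miss_rate * self.mram_bank_latency
                + self.network_latency)


def least_loaded_bank(banks: Sequence[BankState]) -> int:
    """Index of the lowest-cost bank, where the new page would be placed."""
    return min(range(len(banks)), key=lambda i: banks[i].cost())


banks = [
    BankState(0.9, 10, 40, 5),   # cost ~ 9 + 4 + 5 = 18
    BankState(0.5, 10, 40, 2),   # cost = 5 + 20 + 2 = 27
    BankState(0.8, 10, 40, 12),  # cost ~ 8 + 8 + 12 = 28
]
print(least_loaded_bank(banks))
```

Because the network term reflects current load, a nearby bank with a poor hit rate can still lose to a farther bank whose L2 filters most accesses.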

MRAM Sensitivity
[Figure: MRAM latency sensitivity; MRAM latency axis values 20, 30, 40, 60; SDRAM latency: 30 ns]
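
To relate the 30 ns SDRAM latency to the 3.8 GHz processor from the methodology, the conversion below turns nanoseconds into cycles. At that clock, the 30 ns latency alone amounts to 114 processor cycles. The function name is hypothetical, and rounding up to whole cycles is an assumption.

```python
def latency_cycles(latency_ns: int, freq_mhz: int) -> int:
    """Convert a latency in ns to processor cycles, rounding up (an assumption)."""
    # cycles = latency_ns * 1e-9 s * freq_mhz * 1e6 Hz = latency_ns * freq_mhz / 1000;
    # negating, floor-dividing, and negating again gives an exact integer ceiling.
    return -((-latency_ns * freq_mhz) // 1000)


print(latency_cycles(30, 3800))  # SDRAM latency at 3.8 GHz -> 114
```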

Conclusions
• Developed an architectural model for exploiting an emerging memory technology, MRAM
• Analyzed how the different components of our MRAM system contribute to performance
• The MRAM system performs 15% better than a conventional SDRAM system