DRAM Tutorial: 18-447 Lecture, Vivek Seshadri


DRAM Module and Chip Vivek Seshadri – Thesis Proposal 2


Goals: cost, latency, bandwidth, parallelism, power, energy

DRAM Chip
[Figure: a bank inside a DRAM chip. Labels: Row Decoder, Cell Array, Array of Sense Amplifiers, Cell Array, Bank I/O]

Sense Amplifier
[Figure: sense amplifier circuit. Labels: top, bottom, enable, inverter]

Sense Amplifier: Two Stable States
[Figure: the two stable states of the sense amplifier. In each state one terminal sits at VDD and the other at 0; one state represents logical "1" and the opposite state logical "0".]

Sense Amplifier Operation
[Figure: the terminals start at voltages VT and VB with VT > VB; once enabled, the amplifier drives the top terminal to VDD (1) and the bottom terminal to 0.]

DRAM Cell: Capacitor
• Empty state: logical "0"
• Fully charged state: logical "1"
1. Small: cannot drive circuits
2. Reading destroys the state

Capacitor to Sense Amplifier
[Figure: a cell capacitor connected to a sense amplifier. Recoverable labels: VDD, 0, 1]

DRAM Cell Operation
[Figure: both sense amplifier terminals start at ½VDD; connecting a charged cell raises one side to ½VDD + δ; the sense amplifier then amplifies the difference, driving that side to VDD (1) and the other to 0.]
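The read sequence in this figure can be illustrated numerically. The following is a minimal sketch under idealized assumptions: the normalized supply voltage, the capacitance values, and all function names are hypothetical and do not come from the lecture.

```python
VDD = 1.0  # normalized supply voltage (assumption)

def charge_share(cell_v, bitline_v=VDD / 2, c_cell=1.0, c_bitline=10.0):
    """Bitline voltage after the cell capacitor is connected to it.

    The capacitance values are illustrative; only their ratio matters.
    """
    return (c_cell * cell_v + c_bitline * bitline_v) / (c_cell + c_bitline)

def sense_amplify(bitline_v, reference_v=VDD / 2):
    """Idealized sense amplifier: settles in one of its two stable states."""
    return VDD if bitline_v > reference_v else 0.0

def read_cell(cell_v):
    """Return (perturbed bitline voltage, fully amplified voltage)."""
    perturbed = charge_share(cell_v)     # 1/2 VDD plus or minus a small delta
    restored = sense_amplify(perturbed)  # amplified to VDD or 0
    return perturbed, restored
```

With these values, a fully charged cell nudges the bitline slightly above ½VDD and an empty cell nudges it slightly below; the amplifier turns that small deviation into a full-swing value, which also restores the charge that reading disturbed.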

DRAM Subarray: Building Block for DRAM Chip
[Figure: a subarray. Labels: Row Decoder, Cell Array, Array of Sense Amplifiers (Row Buffer, 8 Kb), Cell Array]

DRAM Bank
[Figure: a bank built from subarrays. Labels: Row Decoder, Cell Array, Array of Sense Amplifiers (8 Kb), Bank I/O (64 b); signals: Address, Data]

DRAM Chip
[Figure: a DRAM chip with multiple banks, each with its own Row Decoder, Cell Arrays, Arrays of Sense Amplifiers, and Bank I/O. The banks connect through a shared internal bus to the memory channel (8 bits).]

DRAM Operation
1. ACTIVATE row
2. READ/WRITE column
3. PRECHARGE
[Figure labels: Row Decoder, Cell Array, Array of Sense Amplifiers, Bank I/O, Row Address, Column Address, Data]
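The three-command sequence on this slide can be sketched as a toy software model. The `Bank` class and its method names are hypothetical, chosen only for illustration; timing, refresh, and electrical behavior are ignored.

```python
class Bank:
    """Toy DRAM bank: rows of cells plus a single row buffer."""

    def __init__(self, num_rows, row_size):
        self.rows = [[0] * row_size for _ in range(num_rows)]
        self.row_buffer = None  # contents held by the sense amplifiers
        self.open_row = None    # index of the activated row, if any

    def activate(self, row):
        """ACTIVATE: bring one row into the sense amplifiers."""
        if self.open_row is not None:
            raise RuntimeError("PRECHARGE required before another ACTIVATE")
        self.row_buffer = list(self.rows[row])
        self.open_row = row

    def read(self, column):
        """READ: return one column of the activated row."""
        self._require_open_row()
        return self.row_buffer[column]

    def write(self, column, value):
        """WRITE: update one column of the activated row."""
        self._require_open_row()
        self.row_buffer[column] = value
        # The cells stay connected to the amplifiers while the row is open.
        self.rows[self.open_row][column] = value

    def precharge(self):
        """PRECHARGE: close the row so another one can be activated."""
        self.row_buffer = None
        self.open_row = None

    def _require_open_row(self):
        if self.open_row is None:
            raise RuntimeError("ACTIVATE a row first")
```

A typical access is then `bank.activate(r)`, one or more `bank.read(c)` or `bank.write(c, v)` calls, and finally `bank.precharge()`.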

RowClone: Fast and Energy-Efficient In-DRAM Bulk Data Copy and Initialization
Vivek Seshadri, Y. Kim, C. Fallin, D. Lee, R. Ausavarungnirun, G. Pekhimenko, Y. Luo, O. Mutlu, P. B. Gibbons, M. A. Kozuch, T. C. Mowry

Memory Channel: Bottleneck
[Figure labels: Core, Cache, MC, Channel, Memory]
• High energy
• Limited bandwidth

Goal: Reduce Memory Bandwidth Demand
[Figure labels: Core, Cache, MC, Channel, Memory]
• Reduce unnecessary data movement

Bulk Data Copy and Initialization
• Bulk data copy: src → dst
• Bulk data initialization: val → dst


Bulk Copy and Initialization: Applications
• Forking
• Zero initialization (e.g., security)
• Checkpointing
• VM cloning
• Deduplication
• Page migration
• Many more

Shortcomings of Existing Approach
[Figure labels: Core, Cache, MC, Channel, src, dst]
• High energy (3600 nJ to copy 4 KB)
• High latency (1046 ns to copy 4 KB)
• Interference

Our Approach: In-DRAM Copy with Low Cost
[Figure: src is copied to dst inside memory (marked "?"). Labels: Core, Cache, MC, Channel]
• Crossed out (X): high energy, interference, high latency

RowClone: In-DRAM Copy

Two Key Observations
1. Any operation on one sense amplifier can be easily performed in bulk
2. Many DRAM cells share the same sense amplifier
[Figure label: Row Decoder]

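Bulk Copy in DRAM: RowClone
[Figure: the source cell moves the sense amplifier input from ½VDD to ½VDD + δ, and the amplifier drives it to VDD (1) with the other side at 0; a second cell connected to the same sense amplifier then takes on that value.]
• Data gets copied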
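The copy shown here can be mimicked in a toy model. This sketch assumes the copy is triggered by activating the destination row while the sense amplifiers still hold the source row's data, with no precharge in between; the `Subarray` class and `rowclone_copy` function are hypothetical names for illustration, not an interface from the lecture.

```python
class Subarray:
    """Toy subarray: rows of cells that share one array of sense amplifiers."""

    def __init__(self, num_rows, row_size):
        self.rows = [[0] * row_size for _ in range(num_rows)]
        self.sense_amps = None  # None models the precharged state

    def activate(self, row):
        if self.sense_amps is None:
            # Precharged amplifiers: the cells determine the amplifier state.
            self.sense_amps = list(self.rows[row])
        else:
            # Amplifiers already hold data: they overwrite the connected cells.
            self.rows[row] = list(self.sense_amps)

    def precharge(self):
        self.sense_amps = None


def rowclone_copy(subarray, src, dst):
    """Copy a whole row within one subarray using two back-to-back activations."""
    subarray.activate(src)  # source row is latched by the sense amplifiers
    subarray.activate(dst)  # destination cells take the amplifier values
    subarray.precharge()    # return to the precharged state
```

Because every sense amplifier in the array acts at once, the whole row is copied in a single step and no data crosses the memory channel.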

Fast Parallel Mode: Benefits
Bulk data copy (4 KB across a module):
• Latency: 11X (1046 ns to 90 ns)
• Energy: 74X (3600 nJ to 40 nJ)
• No bandwidth consumption
• Very few changes to the DRAM chip

Fast Parallel Mode: Constraints
• Location constraint: source and destination in same subarray
• Size constraint: entire row gets copied (no partial copy)
1. Can still accelerate many existing primitives (copy-on-write, bulk zeroing)
2. Alternate mechanism to copy data across banks (pipelined serial mode, lower benefits than Fast Parallel)
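The two constraints can be read as a small decision procedure. The sketch below is illustrative only: the `RowLocation` type, the mode names, and the fallback to an ordinary processor copy are assumptions, and the slide does not say how a copy between different subarrays of the same bank is handled.

```python
from typing import NamedTuple


class RowLocation(NamedTuple):
    """Hypothetical decoded location of a DRAM row."""
    bank: int
    subarray: int
    row: int


def choose_copy_mode(src, dst, nbytes, row_bytes):
    """Pick a copy mechanism for one request and return its name."""
    if nbytes != row_bytes:
        # Size constraint: only an entire row can be copied in DRAM.
        return "cpu"
    if src.bank == dst.bank and src.subarray == dst.subarray:
        # Location constraint satisfied.
        return "fast_parallel"
    if src.bank != dst.bank:
        # Across banks: the alternate mechanism, with lower benefits.
        return "pipelined_serial"
    # Same bank, different subarray: not covered on the slide (assumption).
    return "cpu"
```

For example, two rows in the same subarray with a full-row request select `"fast_parallel"`, while the same request between two banks selects `"pipelined_serial"`.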

End-to-end System Design
• Software interface: memcpy and meminit instructions
• Managing cache coherence: use existing DMA support!
• Maximizing use of Fast Parallel Mode: smart OS page allocation
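One way to picture how a memcpy-style request interacts with the whole-row size constraint is to split it into row-aligned pieces. This is a hedged sketch, not the design from the lecture: `split_copy`, the piece labels, and the use of flat byte addresses are all assumptions, and whether the aligned pieces can actually use Fast Parallel Mode still depends on where the OS placed the pages.

```python
def split_copy(src, dst, nbytes, row_bytes):
    """Split a copy into (kind, src_addr, dst_addr, length) pieces.

    kind is "in_dram" for whole, row-aligned spans and "cpu" for the rest.
    """
    if nbytes <= 0:
        return []
    if src % row_bytes != dst % row_bytes:
        # Source and destination rows never line up: nothing is eligible.
        return [("cpu", src, dst, nbytes)]

    head = min((-src) % row_bytes, nbytes)             # up to the first row boundary
    full = ((nbytes - head) // row_bytes) * row_bytes  # whole rows in the middle
    tail = nbytes - head - full                        # leftover partial row

    pieces = []
    if head:
        pieces.append(("cpu", src, dst, head))
    if full:
        pieces.append(("in_dram", src + head, dst + head, full))
    if tail:
        pieces.append(("cpu", src + head + full, dst + head + full, tail))
    return pieces
```

A page-aligned, page-sized copy comes back as a single `"in_dram"` piece, which is why aligned bulk operations such as copy-on-write and bulk zeroing are natural candidates.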

Applications Summary
[Figure: bar chart of the fraction of memory traffic (y-axis, 0 to 1) broken down into Zero, Copy, Write, and Read, for the workloads bootup, compile, forkbench, mcached, mysql, and shell]

Results Summary
[Figure: bar chart of IPC improvement and memory energy reduction compared to baseline (y-axis, 0% to 70%), for the workloads bootup, compile, forkbench, mcached, mysql, and shell]