Fast Block Copy in DRAM
• Motivation
  • Exploit the wide bandwidth within the DRAM chip
• Idea
  • Use the DRAM refresh period to do block copy
• Methodology
  • Add logic in DRAM
  • Extend the ISA of the simulator
• Conclusion
  • The improvement depends on the system’s memory behavior
For 15740 | Fast Block Copy in DRAM

Three Block Copy Modes
• Aligned Row Copy
• Unaligned Row Copy
• Subrow Copy
[Figure: source and destination regions for each copy mode]
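
One way to read the three modes is as a classification of a copy request by its length and alignment. The slide does not define the modes, so the sketch below is an assumption throughout: the 1024-byte row size, the names `ROW_BYTES`, `copy_mode`, and `classify_copy`, and the rule itself (shorter than a row means subrow; otherwise aligned only if both addresses start on a row boundary) are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed DRAM row size in bytes; the slides do not state one here. */
#define ROW_BYTES 1024u

/* Hypothetical labels for the three modes named on the slide. */
typedef enum {
    COPY_ALIGNED_ROW,
    COPY_UNALIGNED_ROW,
    COPY_SUBROW
} copy_mode;

/* Classify a copy request under the assumed rule:
 *  - fewer bytes than one row            -> subrow copy
 *  - src and dst both on a row boundary  -> aligned row copy
 *  - anything else                       -> unaligned row copy */
copy_mode classify_copy(uintptr_t src, uintptr_t dst, size_t len)
{
    if (len < ROW_BYTES)
        return COPY_SUBROW;
    if (src % ROW_BYTES == 0 && dst % ROW_BYTES == 0)
        return COPY_ALIGNED_ROW;
    return COPY_UNALIGNED_ROW;
}
```

Under this reading, a library routine could dispatch on the result to choose the cheapest in-DRAM path for each request.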

DRAM Block Diagram - Aligned Row Copy

Simulation
• Simulator - SimpleScalar 2.0
• Add new instruction – blkcp

    DEFINST(BLKCP, 0x2e,
            "blkcp", "t,o(b)",
            WrPort, F_MEM|F_LOAD|F_STORE|F_DISP,
            DCGPR(BS), DNA, DGPR(RT), DGPR(BS), DNA,
            ({
              int index;
              for (index=0; index<1024; index++)
                WRITE_BYTE(READ_SIGNED_BYTE(GPR(BS)+OFS+index), GPR(RT)+index);
            }))

• Rewrite memcpy() and bcopy() using blkcp
• Rebuild benchmarks using new library routines
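
The slide does not show the rewritten library routines. As a rough sketch of what a memcpy() built on blkcp might look like, the C below emulates the instruction in software with a hypothetical helper `blkcp_emulated` that copies 1024 bytes (the loop bound in the DEFINST above), uses it for every whole block, and falls back to a byte loop for the tail. The names `BLKCP_BYTES`, `blkcp_emulated`, and `blkcp_memcpy` are assumptions for illustration, not the authors' code; as with memcpy(), the regions are assumed not to overlap.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes moved by one blkcp, taken from the 1024-iteration loop in the DEFINST. */
#define BLKCP_BYTES 1024

/* Hypothetical software stand-in for the simulated blkcp instruction:
 * copies exactly BLKCP_BYTES bytes from src to dst, one byte at a time. */
static void blkcp_emulated(uint8_t *dst, const uint8_t *src)
{
    for (size_t i = 0; i < BLKCP_BYTES; i++)
        dst[i] = src[i];
}

/* memcpy-like routine: whole blocks go through blkcp, the remainder
 * (fewer than BLKCP_BYTES bytes) is copied byte by byte. */
void *blkcp_memcpy(void *dst, const void *src, size_t n)
{
    uint8_t *d = dst;
    const uint8_t *s = src;

    while (n >= BLKCP_BYTES) {
        blkcp_emulated(d, s);
        d += BLKCP_BYTES;
        s += BLKCP_BYTES;
        n -= BLKCP_BYTES;
    }
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

A bcopy() wrapper would only swap the argument order, for example `blkcp_memcpy(dst, src, n)` called from `bcopy(src, dst, n)`, provided the regions do not overlap.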

Experiment 1 – Mem Copy
The ideal effect of block copy on the memory system (aligned row copy; x-axis: block size in blkcp)
• Best improvement achieved at medium block sizes.

Experiment 2 – File Read
Performance improvement on the file system
• Unaligned (Fig. a):
  • The same effect as in a block-copy-intensive system
• Aligned (Fig. b):
  • Best improvement achieved when block size = 4 B & 8 B
  • Caused by different alignment of fread()’s internal buffer

Experiment 3 – Perl (SPECint95)
[Figure: number of memcpy() calls with respect to different block sizes; triangle points represent simulation errors]
• Limited performance improvement, caused by the limited amount of memory block copying

Conclusion
• Good for block-copy-intensive systems
• Limitations
  • Does not support multiple memory banks
  • Hardware cost and overhead
  • Only user-mode behavior simulated; only two routines changed
• Future work
  • Simulate both kernel and user modes
  • Use more realistic benchmarks
  • Compare with data prefetching and non-blocking caches