FB-DIMM technology
Dezső Sima, Fall 2007 (Ver. 1.0)
Sima Dezső, 2007
Motivations to introduce FB-DIMMs in servers/workstations
Shortcomings of the stub-bus topology used with conventional DRAM architectures [2]
Stub-bus topology
• The data lines of the memory controller are electrically connected to the data lines of every DRAM device on the bus (memory channel).
• Impedance discontinuities affect signal integrity [2].
• Memory channels may have 8 DIMMs with 8 DRAM devices plus one ECC device per DIMM (i.e. 72 devices/channel).
• Heavy signal loading due to the large number of devices, together with the impedance discontinuities on the bus, limits the number of DRAM devices that can be connected to the channel; the higher the data rate, the tighter the limit.
Figure: Scaling number of channels with memory hubs [7]. Two ranks of DRAM devices per DIMM are assumed. With a single rank per DIMM the number of DIMMs per channel may be doubled, but the declining trend shown in the figure remains the same.
For higher DRAM speeds fewer DRAM devices can be connected per memory channel [2]. Stub-bus channel capacity (device density x number of devices) has hit its ceiling [2], but increasing server performance doubles memory capacity demand about every two years [2].
Source: Jacob, Memory Systems, 2007
Increasing the number of memory channels
Each DDR2 memory channel requires 240 pins.
FB-DIMM technology (1)
Principle of operation
• introduce packet-based serial transmission (as in the PCI-E, SATA and SAS buses)
• introduce full buffering (registered DIMMs buffer only addresses)
• CRC error checking (cyclic redundancy check)
FB-DIMM technology (2) Figure: FB-DIMM memory architecture [4]
Figure: Maximum supported FB-DIMM configuration [6] (6 channels/8 DIMMs)
FB-DIMM technology (3)
Implementation details (1)
• Serial transmission between the North Bridge and the DIMMs (each bit needs a pair of wires)
• Number of serial links
  • 14 read lanes (2 wires each)
  • 10 write lanes (2 wires each)
• Clocked at 6 x the double-pumped data rate, e.g. for a DDR2-667 DRAM the clock rate is 6 x 667 MHz ≈ 4 GHz
• Every 12 cycles (that is, every two memory cycles) constitute a packet.
• Read packets (frames, bursts): 168 bits (12 x 14 bits)
  • 144 data bits (equals the number of bits produced by a 72-bit wide DDR2 module (64 data bits + 8 ECC bits) in two memory cycles)
  • 24 CRC bits
• Write packets (frames, bursts): 120 bits (12 x 10 bits)
  • 98 payload bits
  • 22 CRC bits
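The frame sizes above follow directly from the lane counts and the 12-cycle frame length. The short Python sketch below only redoes that arithmetic; the function and constant names are illustrative and do not come from any FB-DIMM specification.

```python
# Figures taken from the slide: lanes, cycles per frame, link clock multiplier.
READ_LANES = 14
WRITE_LANES = 10
CYCLES_PER_FRAME = 12
LINK_CLOCK_MULTIPLIER = 6


def frame_bits(lanes, cycles=CYCLES_PER_FRAME):
    """Bits carried by one frame: one bit per lane per link cycle."""
    return lanes * cycles


def link_rate_mhz(dram_data_rate_mhz):
    """Serial link rate for a given DRAM data rate (6 x the data rate)."""
    return LINK_CLOCK_MULTIPLIER * dram_data_rate_mhz


read_bits = frame_bits(READ_LANES)    # compare with 144 data + 24 CRC bits
write_bits = frame_bits(WRITE_LANES)  # compare with 98 payload + 22 CRC bits

print("read frame bits: ", read_bits, "expected", 144 + 24)
print("write frame bits:", write_bits, "expected", 98 + 22)
print("link rate for a 667 MHz data rate:", link_rate_mhz(667), "MHz")
```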
FB-DIMM technology (4)
Implementation details (2)
98 payload bits:
• 2 frame type bits,
• 24 bits of command,
• 72 bits for data and commands, according to the frame type, e.g. 72 bits of data, 36 bits of data + one command, or two commands.
Commands
• row select, precharge, refresh, read, write etc.
• all commands include a 3-bit FB-DIMM module address to select one of 8 modules.
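As a rough illustration of how the 98 payload bits add up, the following Python sketch packs the three fields into one integer and unpacks them again. The field widths are those listed above; the bit order (frame type, then command, then data) and all names are assumptions made for this example only and are not taken from the FB-DIMM protocol definition.

```python
# Field widths from the slide.
FRAME_TYPE_BITS = 2
COMMAND_BITS = 24
DATA_OR_COMMAND_BITS = 72
MODULE_ADDRESS_BITS = 3

PAYLOAD_BITS = FRAME_TYPE_BITS + COMMAND_BITS + DATA_OR_COMMAND_BITS


def _check(value, bits, name):
    """Reject values that do not fit into the given field width."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name} does not fit into {bits} bits")


def pack_payload(frame_type, command, data):
    """Pack the fields as type | command | data (hypothetical bit order)."""
    _check(frame_type, FRAME_TYPE_BITS, "frame_type")
    _check(command, COMMAND_BITS, "command")
    _check(data, DATA_OR_COMMAND_BITS, "data")
    return (
        (frame_type << (COMMAND_BITS + DATA_OR_COMMAND_BITS))
        | (command << DATA_OR_COMMAND_BITS)
        | data
    )


def unpack_payload(payload):
    """Inverse of pack_payload under the same assumed bit order."""
    data = payload & ((1 << DATA_OR_COMMAND_BITS) - 1)
    command = (payload >> DATA_OR_COMMAND_BITS) & ((1 << COMMAND_BITS) - 1)
    frame_type = payload >> (COMMAND_BITS + DATA_OR_COMMAND_BITS)
    return frame_type, command, data


print("payload bits:", PAYLOAD_BITS)
print("addressable modules:", 1 << MODULE_ADDRESS_BITS)
print(unpack_payload(pack_payload(1, 0xABCDEF, 0x123)))
```

The 3-bit module address gives 2^3 = 8 selectable modules, which matches the limit of 8 DIMMs per channel.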
FB-DIMM technology (5)
Implementation details (3)
Read bandwidth:
• One FB-DIMM channel transfers in one frame (that is, in 12 cycles) 128 data bits + 16 ECC bits.
• One frame lasts 2 memory cycles.
• One DDR2 DIMM channel transfers in 2 memory cycles 2 x 72 bits (2 x 64-bit data + 2 x 8-bit ECC).
• The read bandwidth of an FB-DIMM channel therefore equals the bandwidth of a DDR2 channel.
Write bandwidth:
• The write bandwidth of an FB-DIMM channel is up to 0.5 x the read bandwidth.
• But FB-DIMMs allow simultaneous read and write operation.
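The bandwidth comparison can be checked numerically. The Python sketch below uses the frame parameters given earlier in the deck (6 x link clock, 12 cycles per frame) and the 128 data bits per read frame. That a write frame carries 64 data bits (72 bits including ECC) is an assumption made here to reproduce the stated 0.5 x ratio; all names are illustrative.

```python
LINK_CLOCK_MULTIPLIER = 6
CYCLES_PER_FRAME = 12
READ_DATA_BITS_PER_FRAME = 128   # from the slide (ECC excluded)
WRITE_DATA_BITS_PER_FRAME = 64   # assumption: 72 write bits = 64 data + 8 ECC


def frames_per_microsecond(dram_data_rate_mt_s):
    """Frames per microsecond, i.e. millions of frames per second."""
    return LINK_CLOCK_MULTIPLIER * dram_data_rate_mt_s / CYCLES_PER_FRAME


def fbdimm_read_mb_s(dram_data_rate_mt_s):
    return frames_per_microsecond(dram_data_rate_mt_s) * READ_DATA_BITS_PER_FRAME / 8


def fbdimm_write_mb_s(dram_data_rate_mt_s):
    return frames_per_microsecond(dram_data_rate_mt_s) * WRITE_DATA_BITS_PER_FRAME / 8


def ddr2_channel_mb_s(dram_data_rate_mt_s):
    """A 64-bit DDR2 channel moves 8 data bytes per transfer."""
    return dram_data_rate_mt_s * 64 / 8


rate = 667  # DDR2-667
print("FB-DIMM read :", fbdimm_read_mb_s(rate), "MB/s")
print("FB-DIMM write:", fbdimm_write_mb_s(rate), "MB/s")
print("DDR2 channel :", ddr2_channel_mb_s(rate), "MB/s")
```

Because reads and writes use separate lanes, the peak combined throughput under these assumptions is the sum of the read and write figures.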
FB-DIMM technology (6)
FB-DIMM data buffer (Advanced Memory Buffer, AMB)
Manages the read/write operations of the module.
Source: PCStats
• FB-DIMM-4300 (DDR2-533 SDRAM): clock speed 133 MHz, data rate 532 MHz, throughput 4300 MB/s
• PC2-5300 (DDR2-667 SDRAM): clock speed 167 MHz, data rate 667 MHz, throughput 5300 MB/s
• PC2-6400 (DDR2-800 SDRAM): clock speed 200 MHz, data rate 800 MHz, throughput 6400 MB/s
Figure: Different implementations of FB-DIMMs
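The module ratings listed above can be reproduced approximately from the clock speed. The sketch below assumes four transfers per memory clock (the DDR2 scheme described later in the deck) and a 64-bit (8-byte) data path; the module names in the list are just labels for this example, and the nominal ratings are rounded to the nearest 100 MB/s.

```python
TRANSFERS_PER_CLOCK = 4   # DDR2: four data words per memory cell cycle
BYTES_PER_TRANSFER = 8    # 64 data bits, ECC not counted


def data_rate_mhz(clock_mhz):
    return TRANSFERS_PER_CLOCK * clock_mhz


def throughput_mb_s(rate_mhz):
    return rate_mhz * BYTES_PER_TRANSFER


# (label, clock speed in MHz, listed data rate in MHz)
modules = [("DDR2-533", 133, 532), ("DDR2-667", 167, 667), ("DDR2-800", 200, 800)]

for label, clock, listed_rate in modules:
    exact = throughput_mb_s(listed_rate)
    print(label, "computed data rate:", data_rate_mhz(clock), "MHz,",
          "throughput:", exact, "MB/s, rounded:", round(exact, -2), "MB/s")
```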
Figure: Block diagram of the AMB [3] (There are two Command/Address buses (C/A) to limit loads of 9 to 36 DRAMs)
FB-DIMM technology (7)
Necessary routing to connect the north bridge to the DIMM socket
b) In case of an FB-DIMM (69 pins)
a) In case of a DDR2 DIMM (240 pins)
A 2-layer PCB is needed (but a 3rd layer is used for power lines)
A 3-layer PCB is needed
Figure: PCB routing [4]
FB-DIMM technology (8) Figure: Latency and bandwidth figures of different DRAM technologies for a mix of SPEC applications [5]
FB-DIMM technology (9)
Pros and cons of FB-DIMMs
Advantages of FB-DIMMs vs DDR2 and DDR3 DIMMs
• more memory channels (up to 6), hence higher total bandwidth
• more DIMM modules (up to 8) per channel, hence higher memory capacity (up to 192 GB)
• fewer wires, hence simplified PCB routing
• simultaneous read/write operation in a channel
Disadvantages of FB-DIMMs vs DDR2 and DDR3 DIMMs
• higher latency and lower bandwidth figures for 4 to 8 DIMM modules
• higher cost
• higher dissipation (typical dissipation figures: DDR2: about 5 W, AMB: about 5 W, DDR2 FB-DIMM: about 10 W)
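The capacity and dissipation figures above combine as simple products. The Python sketch below works them through; the per-DIMM capacity is not stated on the slide but derived from the 192 GB maximum, and the total dissipation is merely the typical per-module figure multiplied by the number of slots, so both should be read as back-of-the-envelope values.

```python
MAX_CHANNELS = 6
MAX_DIMMS_PER_CHANNEL = 8
MAX_CAPACITY_GB = 192
DDR2_WATTS = 5   # typical figure for the DRAM devices, from the slide
AMB_WATTS = 5    # typical figure for the AMB, from the slide


def dimm_slots():
    return MAX_CHANNELS * MAX_DIMMS_PER_CHANNEL


def implied_gb_per_dimm():
    """Per-DIMM capacity implied by the 192 GB maximum (derived, not quoted)."""
    return MAX_CAPACITY_GB / dimm_slots()


def fully_populated_watts():
    """Rough total dissipation if every slot holds a typical FB-DIMM."""
    return dimm_slots() * (DDR2_WATTS + AMB_WATTS)


print("DIMM slots:", dimm_slots())
print("implied capacity per DIMM:", implied_gb_per_dimm(), "GB")
print("dissipation, fully populated:", fully_populated_watts(), "W")
```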
Latency
Latency is potentially the more troubling issue. Intel addressed it by not storing and then retransmitting the signals: the data travels along a special fast pass-through channel in the buffer itself. This avoids much of the latency that a store-and-forward architecture would induce.
Figure: FB-DIMM heat sinks (heat spreaders)
FB-DIMM technology (10)
Market penetration of the FB-DIMM technology
• 5/2006: Intel adopts it in its Bensley platform (5000) for DPs
• 8/2007: Sun introduces it in the Niagara II
• 9/2006: AMD takes it off its roadmap
• 9/2007: Intel uses it in the Caneland platform (7000) for MPs
• 2007: Major memory manufacturers intend to develop DDR3 DIMMs instead of DDR3-based FB-DIMMs
Standardisation
• 3/2007: JESD205, DDR2 SDRAM Fully Buffered DIMM (FBDIMM) Design Specification (DDR2-533, DDR2-667, DDR2-800; x72 ECC; 240 pin; 256 Mb, 512 Mb, 1 Gb, 2 Gb, 4 Gb devices)
• 1/2007: JESD206, FBDIMM Architecture and Protocol
FB-DIMM technology (11)
DDR2 vs DDR (SDRAM)
The key difference between DDR and DDR2 is that the DDR2 data bus is clocked at twice the speed of the memory cells, so four data words can be transferred in each memory cell cycle without speeding up the memory cells themselves.
Figure: Clocking schemes of the SDR, DDR and DDR2 SDRAM technologies [1]
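The relation between memory cell clock, bus clock and data rate can be written out explicitly. In the sketch below the number of words per memory cell cycle is 2 for DDR and 4 for DDR2, as stated above; the value 1 for SDR is the usual single-data-rate assumption, and the value 8 for DDR3 follows the prefetch depth quoted on the next slide. The dictionary and function names are illustrative.

```python
# Data words transferred per memory cell (core) cycle.
WORDS_PER_CORE_CYCLE = {"SDR": 1, "DDR": 2, "DDR2": 4, "DDR3": 8}


def data_rate_mt_s(technology, core_clock_mhz):
    """Transfers per microsecond on the data bus."""
    return WORDS_PER_CORE_CYCLE[technology] * core_clock_mhz


def bus_clock_mhz(technology, core_clock_mhz):
    """Bus clock: SDR transfers once per bus cycle, the DDR family twice."""
    transfers_per_bus_cycle = 1 if technology == "SDR" else 2
    return data_rate_mt_s(technology, core_clock_mhz) / transfers_per_bus_cycle


for tech in ("SDR", "DDR", "DDR2", "DDR3"):
    print(tech, "core 200 MHz -> bus", bus_clock_mhz(tech, 200),
          "MHz, data rate", data_rate_mt_s(tech, 200), "MT/s")
```

With a 200 MHz core, DDR reaches 400 MT/s on a 200 MHz bus, while DDR2 reaches 800 MT/s on a 400 MHz bus, which is the doubling of the bus clock described above.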
DDR2's bus frequency is boosted by electrical interface improvements, on-die termination, prefetch buffers and off-chip drivers. However, latency is greatly increased as a trade-off.
The DDR2 prefetch buffer is 4 bits deep, whereas it is 2 bits deep for DDR (and 8 bits deep for DDR3).
While DDR SDRAM has typical read latencies of between 2 and 3 bus cycles, early DDR2 may have read latencies between 4 and 6 cycles.
Although introduced in Q2 2003 at 200/266 MHz, DDR2 could initially not be competitive because of its high latency figures. As lower-latency parts became available by the end of 2004, DDR2 became widespread.

Memory            Timings    Latency    Bandwidth in dual-channel mode
DDR400 SDRAM      2.5-3-3    12.5 ns    6.4 GB/sec
DDR400 SDRAM      2-3-2      10 ns      6.4 GB/sec
DDR533 SDRAM      3-4-4      11.2 ns    8.5 GB/sec
DDR533 SDRAM      2.5-3-3    9.4 ns     8.5 GB/sec
DDR2-533 SDRAM    5-5-5      18.8 ns    8.5 GB/sec
DDR2-533 SDRAM    4-4-4      15 ns      8.5 GB/sec
DDR2-533 SDRAM    3-3-3      11.2 ns    8.5 GB/sec
DDR2-600 SDRAM    5-5-5      16.6 ns    9.6 GB/sec
DDR2-600 SDRAM    4-4-4      13.3 ns    9.6 GB/sec
Table: Burst timing, latency and bandwidth figures of DDR and DDR2 DRAM technologies [1]
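The latency and bandwidth columns of the table can be recomputed from the data rate and the first timing value (CAS latency). The Python sketch below assumes that the bus clock is half the data rate, that latency is the CAS latency multiplied by the bus cycle time, and that each channel is 64 bits (8 bytes) wide; these assumptions reproduce the table to within its rounding, but they are an interpretation, not something stated in [1].

```python
# (data rate in MT/s, CAS latency in cycles, table latency in ns, table GB/s)
TABLE = [
    (400, 2.5, 12.5, 6.4),
    (400, 2.0, 10.0, 6.4),
    (533, 3.0, 11.2, 8.5),
    (533, 2.5, 9.4, 8.5),
    (533, 5.0, 18.8, 8.5),
    (533, 4.0, 15.0, 8.5),
    (533, 3.0, 11.2, 8.5),
    (600, 5.0, 16.6, 9.6),
    (600, 4.0, 13.3, 9.6),
]


def cas_latency_ns(data_rate_mt_s, cas_cycles):
    """CAS latency in ns; the bus clock is assumed to be half the data rate."""
    bus_clock_mhz = data_rate_mt_s / 2
    return cas_cycles * 1000.0 / bus_clock_mhz


def dual_channel_gb_s(data_rate_mt_s):
    """Two 8-byte-wide channels transferring at the given data rate."""
    return data_rate_mt_s * 8 * 2 / 1000.0


for rate, cl, table_ns, table_gb in TABLE:
    print(f"{rate} MT/s CL{cl}: {cas_latency_ns(rate, cl):.2f} ns "
          f"(table {table_ns}), {dual_channel_gb_s(rate):.2f} GB/s (table {table_gb})")
```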
CAS latency (Column Address Strobe latency, CL): the time delay (in number of clock cycles) between the moment a memory chip is accessed for data and the moment the first data bit becomes available.
For instance, after accessing a 400 MHz CL3 device, the first bit arrives in 3 x 2.5 ns = 7.5 ns.
Early DDR2-533 SDRAM modules available at the time of the announcement of the i925 and i915 chipsets (6/2004) had 4-4-4 timings (CAS Latency - RAS to CAS Delay - RAS Precharge Time).
FB-DIMM technology ()
Power savings are achieved primarily through a drop in operating voltage (1.8 V compared to DDR's 2.5 V).
DDR2 DIMMs have 240 pins instead of the 184 pins used by DDR DIMMs.
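To get a feel for the effect of the voltage drop, the sketch below applies the common first-order rule that dynamic CMOS power scales with the square of the supply voltage. Holding frequency and switched capacitance equal is an assumption of this example, not a claim from the slide, so the result is only an indicative ratio.

```python
DDR_VOLTAGE = 2.5    # volts, from the slide
DDR2_VOLTAGE = 1.8   # volts, from the slide


def relative_dynamic_power(v_new, v_old):
    """Ratio of dynamic power at v_new to that at v_old, assuming P ~ V^2."""
    return (v_new / v_old) ** 2


ratio = relative_dynamic_power(DDR2_VOLTAGE, DDR_VOLTAGE)
print(f"DDR2 dynamic power relative to DDR (same frequency assumed): {ratio:.2f}")
```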
DDR3
Official JEDEC Specifications
                  DDR2               DDR3
Rated Speed       400-800 Mbps       800-1600 Mbps
Vdd/Vddq          1.8 V +/- 0.1 V    1.5 V +/- 0.075 V
Internal Banks    4                  8
Termination       Limited            All DQ signals
Topology          Conventional T     Fly-by
Driver Control    OCD Calibration    Self Calibration with ZQ
Thermal Sensor    No                 Yes (Optional)
Source: Anandtech
Appeared mid 2007, e.g. in Intel's P35 (Bearlake). Source: Wiki
5. 2. Speed gap between processor and memory (1 a) Figure 5. 1 a: DRAM types
5. 2. Speed gap between processor and memory (1 b) Figure 5. 1 b: Latency of DRAM chips
5. 2. Speed gap between processor and memory (1 c) Figure 5. 1 c: System-level memory latency in x86-based PCs
5. 2. Speed gap between processor and memory (1 d) Figure 5. 1 d: Latency of DRAM chips (in clock cycles)
5. 2. Speed gap between processor and memory (2) Figure 5. 2: Relative transfer rate of memories (D: dual channel)
References
[1]: Gavrichenkov I., "DDR2 vs. DDR: Revenge Gained," Xbit Laboratories, 12/17/2004, http://www.xbitlabs.com/articles/memory/display/ddr2-ddr.html
[2]: Vogt P., "Fully Buffered DIMM (FB-DIMM) Server Memory Architecture," Febr. 18, 2004, Intel Developer Forum, http://www.idt.com/content/OSA_S008_FB-DIMM-Arch.pdf
[3]: McTague M. & David H., "Fully Buffered DIMM (FB-DIMM) Design Considerations," Febr. 18, 2004, Intel Developer Forum, http://www.idt.com/content/OSA-S009.pdf
[4]: Haas J. & Vogt P., "Fully Buffered DIMM Technology Moves Enterprise Platforms to the Next Level," Technology Intel Magazine, March 2005, pp. 1-7
[5]: Ganesh B., Jaleel A., Wang D., Jacob B., "Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling," Proc. HPCA 2007
[6]: "Introducing FB-DIMM Memory: Birth of Serial RAM?," PCStats, Dec. 23, 2005, http://www.pcstats.com/articleview.cfm?articleid=1812&page=1
[7]: Haas J. & Vogt P., "Fully-Buffered DIMM Technology Moves Enterprise Platforms to the Next Level," Technology Intel Magazine, http://www.intel.com/technology/magazine/computing/fully-buffered-dimm-0305.htm