CMOS Memories Brad Quinton (based on slides from R. Saleh) Dept. of ECE University of British Columbia R. Saleh / B. Quinton 1
Overview
• Memories are used in almost all existing chips, and they represent a large part of the semiconductor market today.
• In many applications, the most important characteristic of a memory is its price per bit. That is, the user wants to spend the least amount of money to store the required data. Reducing the cost of a bit means making it smaller, which in turn means reducing the number of components it requires. Ideally, each bit should contain only one transistor. While this gives very dense storage, it leads to a number of challenging circuit design issues.
• In this lecture, we start with a review of the SRAM. Then we look at CAMs, DRAMs, ROMs, EPROMs, EEPROMs, and finally the dominant Flash memories.
Readings: Chapter 9 of HJS (see Chapter 8 for SRAM review material)
Lecture Outline
1. SRAM
2. CAM
3. DRAM
4. ROM
5. EPROM / EEPROM
6. Flash
SRAM
Applications
1. Embedded RAM for ASICs and SoCs
2. Configuration bits for FPGAs
3. Caches in most CPUs
4. Trace buffers in debug applications
• In general, SRAM is used to replace arrays of flip-flops or latches to increase storage density and save die area. As always, there are trade-offs:
  – Single word access - one address at a time
  – Lower performance - not as fast as flip-flop-based designs
  – Requires BIST - there is no "scan chain"
  – Higher leakage current
High-level View: Memory (figure)
Overall Structure of 64 Kb SRAM (figure: 256 x 256 cell array, with 2^n = 256 and 2^m = 256 for n = m = 8 address bits; row decoder driving the word lines; column pull-ups on the bitlines; column mux and column decoder; sense amplifier with sense enable; write driver with write enable; read-write control; address input, data in, data out. Successive builds of the slide highlight the write path and the read path.)
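The address split implied by the 256 x 256 organization can be sketched in a few lines. This is only an illustration: the assignment of the upper 8 address bits to the row decoder and the lower 8 bits to the column decoder is an assumption (the slide does not say which half goes where), and the names `split_address` and `one_hot` are hypothetical.

```python
ROW_BITS = 8  # 2**8 = 256 word lines
COL_BITS = 8  # 2**8 = 256 bitline columns


def split_address(addr):
    """Split a 16-bit address into (row, column) indices.

    Assumption: upper bits select the row, lower bits select the column.
    """
    if not 0 <= addr < (1 << (ROW_BITS + COL_BITS)):
        raise ValueError("address out of range for a 64 Kb array")
    row = addr >> COL_BITS
    col = addr & ((1 << COL_BITS) - 1)
    return row, col


def one_hot(index, width):
    """Decoder output: exactly one of `width` lines is asserted."""
    return [1 if i == index else 0 for i in range(width)]


row, col = split_address(0x1234)
word_lines = one_hot(row, 1 << ROW_BITS)      # drives one word line high
column_select = one_hot(col, 1 << COL_BITS)   # steers one column to the sense amp
```

Only one word line and one column are active per access, which is the "single word access" trade-off listed for SRAM.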
Cross-Coupled Inverters (figure: two cross-coupled inverters holding a static state)
Cross-Coupled Inverters
• The cross-coupled inverter structure is fairly intuitive... it looks a lot like a flip-flop or latch.
• However, if you look closely, the same transistors and wires are used for both read and write operations.
• This makes things tricky. We could add logic to the cells to distinguish reads from writes, but that would be expensive in terms of area.
• Instead, we adjust the relative sizing of the transistors to ensure that reads and writes work as expected.
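6T SRAM Cell (figure: pull-up transistors M5 and M6 to Vdd, pull-down transistors M1 and M2, and access transistors M3 and M4 connecting the storage nodes q and q_bar to the bitlines b and b_bar under control of the wordline. Normal design, shown for one half of the cell: pull-up transistor width Wp, pull-down transistor width Wd, access transistor width Wa.)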
Reading a Cell (figure: 6T cell storing q = 0 and q_bar = 1, bitline capacitances Cbit, cell current Icell; timing diagram of wl, b, b_bar, q and q_bar)
• The bitlines are pre-charged before the read
• The wordline goes high
• M1 is on (conducting) and M2 is off
• b goes low because M1 is on; the difference ΔV between b and b_bar triggers the sense amplifier
• There is a risk the bit will "flip"
Transistor Ratio Required for Read
• To ensure that the bit does not "flip" during a read, the voltage rise on q (the node storing 0) must be limited.
• This can be done (see textbook) by ensuring that we have the following transistor width ratio:
$$\frac{W_1}{W_3} \geq 1.5$$
Writing a Cell (figure: 6T cell storing q = 0 and q_bar = 1, with the bitlines driven to Vdd and Gnd; timing diagram of wl, b, b_bar, q and q_bar)
• The wordline goes high
• The bitline b_bar is forced low
• The voltage on q_bar drops
• q_bar must be forced to cross the threshold voltage for the cell to flip
Transistor Ratio Required for Write
• To ensure that the bit does "flip" during a write, q_bar must be pulled low.
• This can be done (see textbook) by ensuring that we have the following transistor width ratio:
$$\frac{W_4}{W_6} \geq 1.5$$
Overall Transistor Widths
• Both sides of the circuit must be balanced, therefore:
$$W_4 = W_3, \qquad W_6 = W_5, \qquad W_2 = W_1$$
• So, given the minimum transistor width, we have a starting point for our design.
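As a worked example of that starting point, the sketch below chains the read and write ratios with the balance conditions. It rests on assumptions not stated on the slides: that both ratios are lower bounds of 1.5 met with equality, and that the pull-up transistors (M5, M6) are the minimum-width devices. The function name `size_6t_cell` is hypothetical.

```python
def size_6t_cell(w_min, read_ratio=1.5, write_ratio=1.5):
    """Return widths for M1..M6 given the minimum transistor width.

    Assumptions (illustrative only):
      - pull-ups M5/M6 are minimum width
      - W4/W6 == write_ratio and W1/W3 == read_ratio (bounds met with equality)
      - balance: W4 = W3, W6 = W5, W2 = W1
    """
    w_pull_up = w_min                       # W5 = W6
    w_access = write_ratio * w_pull_up      # W3 = W4, from W4/W6
    w_pull_down = read_ratio * w_access     # W1 = W2, from W1/W3
    return {
        "M1": w_pull_down, "M2": w_pull_down,
        "M3": w_access, "M4": w_access,
        "M5": w_pull_up, "M6": w_pull_up,
    }


widths = size_6t_cell(1.0)
# Pull-down : access : pull-up comes out as 2.25 : 1.5 : 1
```

Under these assumptions the pull-down devices end up 2.25 times the minimum width, which shows how the two stability constraints multiply rather than add.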
Layout of SRAM Cell (figure: cell layout showing Vdd, bitline b, storage nodes q and q_bar, and contacts)
CAMs
Applications
• CAMs are often used in cache memories.
• Instead of storing and accessing data in a cache through a static address, we would like to be able to store it anywhere in SRAM and then retrieve it later when needed.
• We store the data with a keyword based on the application.
• The lookup is done with a tag that is matched against the keyword stored in memory and associated with the data.
• The key design issue is to minimize the time required to access the data that matches the tag.
• The matching is done simultaneously against all stored tags to reduce the read time, so it consumes a lot of power. That is why CAMs are considered power hungry.
• NOTE: CAMs are really only useful if you *need* single-cycle latency, since you can always emulate a CAM with multiple stages of RAM lookups.
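The lookup behavior described above can be modelled in software. The sketch below is a behavioral model only, not a circuit description: the class name `CamModel` and its methods are hypothetical, and the loop over rows stands in for comparisons that the hardware performs on all rows at once.

```python
class CamModel:
    """Behavioral sketch of a CAM: data is stored with a keyword and
    retrieved by presenting a tag that is compared with every keyword."""

    def __init__(self):
        self.rows = []  # each row holds (keyword, data)

    def write(self, keyword, data):
        self.rows.append((keyword, data))

    def matchlines(self, tag):
        # One match result per row. In hardware all rows are compared
        # simultaneously, which is why the lookup is fast but power hungry.
        return [keyword == tag for keyword, _ in self.rows]

    def lookup(self, tag):
        for hit, (_, data) in zip(self.matchlines(tag), self.rows):
            if hit:
                return data
        return None  # no stored keyword matches the tag


cam = CamModel()
cam.write(0x1F, "A")
cam.write(0x2A, "B")
hit_data = cam.lookup(0x2A)
miss_data = cam.lookup(0x33)
```

A software loop like this takes time proportional to the number of rows, which is the latency a hardware CAM avoids by spending area and power on per-row comparators.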
Associative Memory (figure)
Overall Architecture of CAM Array (figure: row decoder driving the wordlines WL; CAM array with matchlines and taglines next to a 256 x 256 SRAM array; dummy replica row; column decode and MUX; CAM write I/O and TAG drive; SRAM read/write I/O; signals Addr[8:0], Data[63:0], Tag[31:0], index[2:0] and a valid bit)
Overall Structure of CAM Lookup (figure: rows WL 1 to WL n of 6T SRAM cells, each row with its own precharged matchline, Matchline 1 to Matchline n; bit 1 to bit N of the TAG run down the columns)
CAM Cell (figure: SRAM cell on wordline WL with bitlines b and b_bar, plus transistors M7 to M10 connected to the matchline; cell dimensions marked 80 λ and 50 λ)
DRAM
Applications
1. RAM for desktops, laptops, servers, etc.
2. RAM for embedded systems: routers, switches, set-top boxes, etc.
• DRAMs are extremely dense (only 1 transistor/bit!) but they have a number of limitations:
  – They require a specialized fabrication process, so they cannot (easily) be mixed with regular CMOS logic - DRAM will not be "embedded" in processors anytime soon...
  – DRAM bits are dynamic (i.e. they fade away...) - they must be refreshed or they become invalid!
  – DRAM is sensitive to "soft errors" caused by alpha particles - servers usually require ECC (Error Correcting Codes)
One Transistor DRAM
• Minimal cell
  – Transistor is really only an access device
  – Storage device is a capacitor
• Write operation is similar to SRAM
  – Place data value on bitline
  – Raise wordline
  – Value on bitline is now on cell (boosted)
    • Issue with VT drops
• Reading is more complex
  – Precharge bitline
  – Raise wordline
    • Charge sharing occurs
  – Read the voltage on bitline
  – Operation destroys value in cell
Simple way to design capacitor: use an extra "poly-plate" layer (figure)
Area is everything for DRAM
• Area is the number one concern of DRAM designers. Everyone wants more RAM in the same area.
• The number of transistors is already minimal. (You can't get lower than one!)
• The trick then is to design a smaller capacitor.
Capacitor Structures for DRAMs (figure: trench capacitor and stacked capacitor)
DRAM Fabrication Diverges from Regular CMOS
• The more tricks you use to make smaller and better capacitors, the more you stray from "normal" CMOS.
• Because of this, DRAMs are almost always fabricated separately, even though it would be extremely useful to integrate DRAM in an SoC.
Issues in 1-T DRAMs
• Leakage
  – Leakage rate sets the refresh rate (there is no re-generation)
    • With large memory, refresh time must be large
    • Want refresh to take only a few percent of the access cycles
  – Must keep all leakage sources very small (subthreshold)
• Stored Charge
  – Want as much charge as possible
  – Large C and large V, need to get full Vdd into cell
• Readout
  – Is through charge sharing
  – Limit Cbit so it is about 10 x Ccell
  – Need to sense small signals
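The "few percent" refresh goal can be checked with a back-of-the-envelope estimate. The sketch below assumes that every row is refreshed once per retention period and that refreshing one row occupies one access cycle. The function name `refresh_overhead` and the example numbers (8192 rows, 50 ns cycle, 64 ms retention) are illustrative assumptions, not values from the lecture.

```python
def refresh_overhead(num_rows, row_cycle_s, retention_s):
    """Fraction of access cycles spent on refresh.

    Assumes one row is refreshed per cycle and each row must be
    refreshed once within the retention time set by leakage.
    """
    return num_rows * row_cycle_s / retention_s


# Hypothetical example: 8192 rows, 50 ns per row cycle, 64 ms retention
overhead = refresh_overhead(8192, 50e-9, 64e-3)
print(f"refresh overhead: {overhead:.2%}")
```

The estimate scales linearly with the number of rows and inversely with retention time, which is why both memory size and leakage show up in the refresh budget.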
Reading the Cell
Hard problem:
• Small single-ended signal
• Assume bitlines precharged to Vdd
  – Voltage either stays the same
    • Cell stored a '1'
  – Or voltage drops about 200 mV
  – Precise voltage drop depends on Ccell/Cbit
Another issue:
• The values of all the cells on the wordline are destroyed
• You need to read ALL the cells on the wordline on each access
• Then you have to write them back into the cells
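The dependence of the voltage drop on the capacitor ratio follows from charge conservation when the access transistor connects the cell to the bitline. The derivation below is a sketch under ideal assumptions (no leakage, full levels stored in the cell); the symbols $V_{cell}$ and $V_{final}$ are introduced here for illustration.

```latex
\begin{align}
V_{final} &= \frac{C_{cell} V_{cell} + C_{bit} V_{DD}}{C_{cell} + C_{bit}} \\
\Delta V &= V_{DD} - V_{final}
          = \frac{C_{cell}}{C_{cell} + C_{bit}} \left( V_{DD} - V_{cell} \right)
\end{align}
```

For a stored '1' ($V_{cell} = V_{DD}$) the drop is zero. For a stored '0' ($V_{cell} = 0$) with $C_{bit} = 10\,C_{cell}$, the drop is $V_{DD}/11$, so a drop of about 200 mV corresponds to a supply of roughly 2.2 V under these assumptions.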
Reading the Cell (figure: two columns, each with a sense amp between bitline halves D and D_bar of capacitance 10C; rows R1 to R128 on one side and R129 to R256 on the other, each cell of capacitance C; dummy cells of capacitance ½C on rows Rdummy1 and Rdummy2)
• Single-ended voltage detection is difficult, so a dummy cell is used on the opposite side of the cell being read.
• This provides a differential voltage to the sense amp.
Latch-based Sense Amplifier (figure: cross-coupled inverters between D and D_bar, built from transistors M1 to M6 between VDD and ground and gated by SenseEnable)
• When a differential voltage develops on D and D_bar, the SenseEnable line is turned on
• This activates the pull-up and pull-down
• Cross-coupled inverters use regenerative behavior to restore full logic levels
• These values are written back into the cell
• Dummy cell voltage is set to Vdd/2
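The read, sense and write-back sequence can be sketched as a behavioral model. The code below is illustrative only: it assumes ideal charge sharing, a dummy cell with the same capacitance as a real cell charged to Vdd/2, bitlines precharged to Vdd, and example values (Vdd = 2.2 V, Cbit = 10 x Ccell) that are not taken from the lecture. The function names are hypothetical.

```python
def share(v_cell, v_precharge, c_cell, c_bit):
    """Bitline voltage after charge sharing with a cell (charge conservation)."""
    return (c_cell * v_cell + c_bit * v_precharge) / (c_cell + c_bit)


def read_and_restore(v_cell, vdd=2.2, c_cell=1.0, c_bit=10.0):
    """Return (bit read, voltage written back into the cell)."""
    d = share(v_cell, vdd, c_cell, c_bit)          # half-bitline with the real cell
    d_bar = share(vdd / 2, vdd, c_cell, c_bit)     # half-bitline with the dummy cell
    bit = 1 if d > d_bar else 0                    # latch resolves the sign of D - D_bar
    restored = vdd if bit else 0.0                 # regeneration restores a full level
    return bit, restored


print(read_and_restore(2.2))   # cell holding a full '1'
print(read_and_restore(0.0))   # cell holding a '0'
print(read_and_restore(1.8))   # a '1' that has partly leaked away
```

The third call shows why the write-back matters: a partly leaked '1' still produces a positive differential, and the sense amplifier returns the cell to a full level.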
ROM
Applications
1. Initial boot code in embedded designs
2. Alternative implementation of some transform / coding algorithms
3. Power-on self-test code
• The basic ROM structure forms the basis for EPROM and Flash...
Read-Only Memories (figure: array of rows (word lines) and columns (bit lines) feeding a sense circuit)
• Store values in memory at design time
• Large storage - typically 1T/bit
• Operation:
  – Word line goes high
  – Single bit line goes low or stays high depending on stored value
  – Sense amp reads value
• Design alternatives:
  – NOR array
  – NAND array
  – Mixed NOR/NAND
• Presence of a transistor implies a stored "0"
• Absence of a transistor implies a stored "1"
NOR Array (figure: word lines Wi, Wi+1, ... crossing bit lines Bj, Bj+1, Bj+2, Bj+3, each bit line feeding a sense circuit)
• Columns of the NOR array form a large NOR gate
• Sense circuit input may be a simple pull-up or a more complicated sense amplifier
• Program by making transistors stay in the off state even when the word line goes high:
  – source or drain contact
  – presence or absence of diffusion region
  – enhancement implant, i.e., VT > 5 V
• If the transistor is present, the bitline will be pulled down
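The NOR array read can be modelled as follows. This is a logic-level sketch only: the function name `nor_rom_read` and the nested-list layout are hypothetical, and "present" here means a transistor that can still turn on (that is, it was not disabled by one of the programming options above).

```python
def nor_rom_read(transistor_present, row):
    """Read one word from a NOR ROM.

    transistor_present[r][c] is True where a working pull-down
    transistor connects bit line c to ground under word line r.
    """
    n_rows = len(transistor_present)
    word_lines = [r == row for r in range(n_rows)]  # only the selected row goes high
    bits = []
    for c in range(len(transistor_present[0])):
        # Each column is a wide NOR gate: any active pull-down discharges it.
        pulled_down = any(
            word_lines[r] and transistor_present[r][c] for r in range(n_rows)
        )
        bits.append(0 if pulled_down else 1)  # present -> "0", absent -> "1"
    return bits


rom = [
    [True, False, True, True],
    [False, False, True, False],
]
word0 = nor_rom_read(rom, 0)
word1 = nor_rom_read(rom, 1)
```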
NAND Array (figure: series transistor stacks on bit lines Bj, Bj+1, Bj+2 between VDD and ground, gated by word lines W0, ..., Wi, Wi+1, with a column select used to reduce power)
• Program by making transistor always on (shorted)
• Can program with depletion implant (VT < 0.0)
• No ground lines through core; only poly rows and diffused columns
• Densest array possible since there are no contacts
• Word lines are high by default; one switches low
• Can be very slow due to long series resistance paths; only useful for small memories
• If the transistor is absent, the path to ground will not be complete
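A logic-level sketch of the NAND array read, for comparison with the NOR case. It follows the bullets above (word lines high by default, the selected one low, programmed cells always on) and adds one assumption for illustration: a bit line that finds a complete path to ground reads as 0. The function name `nand_rom_read` and the data layout are hypothetical.

```python
def nand_rom_read(always_on, row):
    """Read one word from a NAND ROM.

    always_on[r][c] is True where the series transistor under word line r
    on bit line c was programmed to conduct regardless of its gate.
    """
    n_rows = len(always_on)
    word_lines = [r != row for r in range(n_rows)]  # high by default, selected row low
    bits = []
    for c in range(len(always_on[0])):
        # The series stack conducts only if every transistor in it conducts.
        path_to_ground = all(
            always_on[r][c] or word_lines[r] for r in range(n_rows)
        )
        bits.append(0 if path_to_ground else 1)
    return bits


rom = [
    [True, False],
    [False, True],
]
word0 = nand_rom_read(rom, 0)
word1 = nand_rom_read(rom, 1)
```

Because every unselected transistor sits in series with the selected one, the read current passes through the whole stack, which is the long series resistance path that limits speed.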
EPROM / EEPROM
Applications
1. Initial boot code or BIOS
2. Configuration storage for FPGAs
3. Software storage for embedded systems
4. Code storage for DRM systems
• EEPROM is quickly being replaced by Flash in most systems since Flash is much more convenient.
EPROM Structure and Operation (figure: transistor with two gate layers labelled 1 and 2, and two threshold voltages VT0 and VT1)
• This is the key to the EPROM structure.
Operation of Floating Gate Device (figure: capacitive divider with voltages V1 and V2 and capacitors C1 and C2)
$$V_{1,new} = V_{1,old} + \frac{C_2}{C_1 + C_2}\,\Delta V_2$$
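A numeric example may help with the capacitive coupling relation, read here as V1,new = V1,old + C2 / (C1 + C2) x ΔV2. The sketch below simply evaluates it; the function name and the example values (a 12 V step on V2, C2 three times C1) are assumptions chosen for illustration.

```python
def floating_gate_voltage(v1_old, delta_v2, c1, c2):
    """New floating-gate voltage after V2 changes by delta_v2.

    Only the fraction C2 / (C1 + C2) of the step couples onto V1.
    """
    return v1_old + c2 / (c1 + c2) * delta_v2


# Hypothetical values: V1 starts at 0 V, V2 steps by 12 V, C2 = 3 * C1
v1_new = floating_gate_voltage(0.0, 12.0, c1=1.0, c2=3.0)
print(v1_new)
```

The larger C2 is relative to C1, the more of the V2 step reaches V1.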
EPROM Write/Erase (figure: two device cross-sections with N+ regions in a P-substrate. (a) Write process, hot carrier injection: terminals labelled GND, Vpp and Vd, with electrons injected toward the gate. (b) Erase process, UV light: terminals labelled GND and Vs, with electrons leaving under UV light.)
• Write: the applied voltage is >> VDD. The result is that the threshold voltage is now greater than VDD.
• Erase: UV light makes the SiO2 slightly conductive.
EEPROM
• What if you don't have access to the device to shine light on it?
  – Customer setups, remote situations, sealed units...
• What we want is to be able to erase electrically.
• There is a solution: add a transistor per bit.
Conventional EEPROM (FLOTOX) (figure: cell cross-section with control gate 2 over floating gate 1, FN tunneling of electrons, and a select transistor gated by WL connecting to the bit line BL; N+ regions in a p substrate)
• Two transistors per cell
• Selective erase
• Relatively low area efficiency
• Write/Erase done by Fowler-Nordheim (FN) tunneling
  – FLOTOX = FLOating gate Tunneling OXide
Write/Erase Operation (figure: cell array with gate lines G0 and G1, word lines WL0 and WL1, bit lines BL0 to BL3 and a common source, biased with combinations of 0 V, VDD and 12 V)
(a) Program (lower VT of FLOTOX device)
(b) Erase (raise VT of whole row)
Read Operation (figure: cell array with gate lines G0 and G1 at VDD, word lines WL0 and WL1, bit lines BL0 to BL3 and a common source; cells marked low VT or high VT)
• Set gate control voltage to VDD on FLOTOX devices
• Connect source to Gnd
• When the word line goes high, each selected cell will either pull the BL low or leave it high depending on the state of the programmed cell
• BUT: 2T cell is too expensive in terms of area
Flash
Applications
1. Cell phones
2. iPods
3. USB keys
4. Hard drives
5. BIOS
6. FPGA configuration bits for some specialized applications
• Flash requires only 1 transistor per bit...
• Flash supports single bit writes, with block erase. However, this is not a problem in systems with other storage, since you can read the entire block before you erase it and write it back with the changes that you want.
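The read, erase and write-back sequence from the last bullet can be sketched as follows. This is a behavioral model under stated assumptions: an erased bit is taken to read as 1 and a single-bit write can only change a 1 to a 0 (the slide does not specify the polarity). The names `FlashBlock` and `rewrite_block` are hypothetical.

```python
class FlashBlock:
    """Minimal model of one flash block: bit-wise write, block-wide erase."""

    def __init__(self, size):
        self.bits = [1] * size  # assumption: erased state reads as 1

    def erase(self):
        self.bits = [1] * len(self.bits)  # erase always affects the whole block

    def program_bit(self, index):
        self.bits[index] = 0  # a single-bit write can only go 1 -> 0


def rewrite_block(block, changes):
    """Update arbitrary bits using other storage as a temporary buffer."""
    buffer = list(block.bits)             # 1. read the entire block
    for index, value in changes.items():  # 2. apply the wanted changes
        buffer[index] = value
    block.erase()                         # 3. erase the block
    for index, value in enumerate(buffer):
        if value == 0:
            block.program_bit(index)      # 4. write the data back


block = FlashBlock(8)
block.program_bit(2)
rewrite_block(block, {2: 1, 5: 0})  # bit 2 back to 1 needs an erase
```

Setting bit 2 back to 1 is the case that forces the erase; changing bit 5 alone could have been done with a single-bit write.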
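NOR Flash Memory Architecture (figure: cell array with word lines driven by the row decoder from the address, bit lines feeding the column decoder and sense amplifier to produce the data, and a source line with a source switch; cell cross-section with gate, N+ source and N+ drain)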
Write/Erase Operation (figure: array bias conditions. Write: hot carriers, biased with Vpp and Vd(w), other lines at Gnd. Erase: FN tunneling, biased with Vs, other lines at Gnd.)
Flash Read Operation (figure: array biased with Vread, Vd and Gnd)
• Apply Vd to selected bit line, Vread to word line, Gnd to source connection
• Write-VT > Vread > Erase-VT
• Sense drain current using sense amplifier
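The read condition Write-VT > Vread > Erase-VT can be illustrated with a simple threshold model. The sketch below treats a cell as conducting whenever the word line voltage exceeds its threshold, ignoring all other device behavior. The function names and the voltages used (6 V, 1 V, 3.3 V) are illustrative assumptions, not values from the lecture.

```python
def read_window_ok(write_vt, erase_vt, v_read):
    """Check that Vread sits between the two programmed thresholds."""
    return write_vt > v_read > erase_vt


def cell_conducts(v_read, v_t):
    """Simplified model: drain current flows only if Vread exceeds VT."""
    return v_read > v_t


# Hypothetical thresholds and read voltage
WRITE_VT, ERASE_VT, V_READ = 6.0, 1.0, 3.3

window = read_window_ok(WRITE_VT, ERASE_VT, V_READ)
written_cell_current = cell_conducts(V_READ, WRITE_VT)  # no current sensed
erased_cell_current = cell_conducts(V_READ, ERASE_VT)   # current sensed
```

With Vread inside the window, the presence or absence of drain current at the sense amplifier distinguishes a written cell from an erased one.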
Summary
• SRAM - Simple, static, easy to use, but with relatively low density
• CAM - Special-purpose SRAM-like configuration for latency-sensitive applications
• DRAM - Very high density, but complex to implement and use
• ROM - NOR or NAND configurations depending on size/speed
• EPROM, E2PROM - Maintain state without a power source
• Flash - 1T storage that maintains state without a power source, bit-wise write, block erase
End.