Memory Organization (Mano, Chapter 12)
Amirkabir University of Technology, Computer Engineering & Information Technology Department
Memory block diagram
• Memory unit: 2^K words, N bits per word
• Inputs: K-bit address lines, N-bit data input (for write), Read/Write, Chip Enable
• Output: N-bit data output (for read)
• Example: 2 MB memory, byte-addressable
– N = 8 (because of byte-addressability; 1 word = 8 bits)
– K = 21 (2 MB = 2^21 bytes)
Memory block diagram: 128 x 8 RAM
• Chip Select 1 (CS1), Chip Select 2 (CS2)
• Read (RD), Write (WR)
• 7-bit address (AD7)
• 8-bit data bus
Connecting memory to the processor
• CPU chip: register file, ALU, bus interface
• Memory bus
• Memory interface
• Main memory modules
RAM
Most of a computer's main memory is built from RAM. Two types of RAM are commonly used in computers:
• DRAM: Dynamic Random Access Memory
– High density, low power, cheap, slow
– Dynamic: needs to be "refreshed" regularly
• SRAM: Static Random Access Memory
– Low density, high power, expensive, fast
– Static: content will last "forever" (until power is lost)
DRAM memory structure (cell with a row select line and a bit line)
• Write:
– 1. Drive bit line
– 2. Select row
• Read:
– 1. Precharge bit line to Vdd/2
– 2. Select row
– 3. Cell and bit line share charges (very small voltage changes on the bit line)
– 4. Sense (fancy sense amp): can detect changes of ~10^6 electrons
– 5. Write: restore the value
• Refresh:
– 1. Just do a dummy read to every cell.
SRAM memory structure
A memory cell is usually built from 6 transistors (6-transistor SRAM cell with a word (row select) line and two complementary bit lines, bit and bit-bar).
• Write:
– 1. Drive bit lines (bit = 1, bit-bar = 0)
– 2. Select row
• Read:
– 1. Precharge bit and bit-bar to Vdd or Vdd/2 => make sure they are equal!
– 2. Select row
– 3. Cell pulls one line low
– 4. Sense amplifier on column detects the difference between bit and bit-bar
Describing a memory
The capacity of a memory is stated as: # addresses x word size
Example:
Memory     # of addr    # of data lines   # of addr lines   # of total bytes
1M x 8     1,048,576    8                 20                1 MB
2M x 4     2,097,152    4                 21                1 MB
1K x 4     1,024        4                 10                512 B
4M x 32    4,194,304    32                22                16 MB
16K x 64   16,384       64                14                128 KB
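The rows of the capacity table can be re-derived with a few lines of Python. This is a minimal sketch; the helper name `describe` is hypothetical and not part of the slides. It assumes the number of addresses is a power of two.

```python
def describe(addresses: int, word_bits: int):
    """Return (address lines, data lines, total bytes) for an 'addresses x word_bits' memory."""
    # (addresses - 1).bit_length() equals log2(addresses) when addresses is a power of two.
    addr_lines = (addresses - 1).bit_length()
    data_lines = word_bits
    total_bytes = addresses * word_bits // 8
    return addr_lines, data_lines, total_bytes

K = 1024
M = 1024 * 1024

for name, addresses, word_bits in [
    ("1M x 8", M, 8),
    ("2M x 4", 2 * M, 4),
    ("1K x 4", K, 4),
    ("4M x 32", 4 * M, 32),
    ("16K x 64", 16 * K, 64),
]:
    print(name, describe(addresses, word_bits))
```

The same helper covers the 2 MB byte-addressable example: `describe(2 * M, 8)` yields 21 address lines.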
Memory addressing: 4 x 8 memory
• A 2-to-4 decoder with inputs A1, A0 and CS (Chip Select) drives row lines 0 to 3
• Each row holds eight 1-bit cells, connected to data lines D7 to D0
• Example: A0 = 1, A1 = 0, Chip Select = 1 gives access address 0x1, which selects memory location number 1
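The row selection in the 4 x 8 memory can be sketched in Python as follows. The function name `decode_2to4` and the stored byte values are hypothetical; the sketch assumes an active-high chip select that disables every row line when it is 0.

```python
def decode_2to4(a1: int, a0: int, cs: int):
    """Return the four row-select lines of a 2-to-4 decoder (one-hot when cs is 1)."""
    if not cs:
        return [0, 0, 0, 0]          # chip not selected: no row is enabled
    selected = (a1 << 1) | a0        # A1 is the high-order address bit
    return [1 if row == selected else 0 for row in range(4)]

def read_word(memory, a1: int, a0: int, cs: int):
    """Return the 8-bit word on D7..D0, or None when no row is selected."""
    lines = decode_2to4(a1, a0, cs)
    if 1 not in lines:
        return None
    return memory[lines.index(1)]

# Hypothetical contents of the four 8-bit locations.
memory = [0x0A, 0xB6, 0x41, 0xFC]

print(decode_2to4(a1=0, a0=1, cs=1))
print(read_word(memory, a1=0, a0=1, cs=1))
```

With A1 = 0, A0 = 1 and CS = 1, only row line 1 is active, which matches the slide's access to address 0x1.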
Use 2 decoders: 8 x 4 memory
• 2-to-4 row decoder: inputs A2, A1 and CS (Chip Select); row lines 0 to 3
• 1-to-2 column decoder: input A0 and CS; column lines 0 and 1
• Each row holds eight 1-bit cells (two 4-bit words)
• Data lines D0 to D3, driven through a tristate buffer (read)
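A short Python sketch of how a 3-bit address splits between the two decoders, assuming A2 and A1 feed the row decoder and A0 feeds the column decoder as in the figure. The name `split_address` is hypothetical.

```python
def split_address(address: int):
    """Split a 3-bit address A2 A1 A0 into (row, column) for an 8 x 4 memory."""
    if not 0 <= address < 8:
        raise ValueError("address must fit in 3 bits")
    row = address >> 1      # A2 A1 go to the 2-to-4 row decoder
    column = address & 1    # A0 goes to the 1-to-2 column decoder
    return row, column

for address in range(8):
    print(format(address, "03b"), split_address(address))
```

Splitting the address this way keeps each decoder small: a 2-to-4 and a 1-to-2 decoder replace a single 3-to-8 decoder.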
Read/Write memory: 8 x 4 memory
• 2-to-4 row decoder (A2, A1, CS) and 1-to-2 column decoder (A0, CS)
• Data lines D0 to D3
• The figure is shown in two states: Rd/Wr = 0 with Chip Select = 0, and Rd/Wr = 1 with Chip Select = 1
Memory model
With 32 address bits, up to 4 GB (2^32) memory locations can be addressed.
• Flat memory model, from lower to higher memory address:
– 0x00000000: 0x0A
– 0x00000001: 0xB6
– 0x00000002: 0x41
– 0x00000003: 0xFC
– ...
– 0xFFFFFFFF: 0x0D
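As a quick check of the 4 GB figure, the following Python sketch counts the locations reachable with 32 address bits and models the flat address space as a sparse dictionary. The byte values are those read from the slide; `load_byte` and the choice to read unlisted locations as 0 are assumptions made for illustration.

```python
ADDRESS_BITS = 32
LOCATIONS = 2 ** ADDRESS_BITS    # number of addressable byte locations
GIB = 2 ** 30

# Sparse stand-in for the flat address space: address -> byte value.
memory = {
    0x00000000: 0x0A,
    0x00000001: 0xB6,
    0x00000002: 0x41,
    0x00000003: 0xFC,
    0xFFFFFFFF: 0x0D,
}

def load_byte(address: int) -> int:
    """Read one byte; locations not listed read as 0 in this sketch."""
    if not 0 <= address < LOCATIONS:
        raise ValueError("address outside the 32-bit address space")
    return memory.get(address, 0)

print(LOCATIONS, LOCATIONS // GIB)
print(hex(load_byte(0x00000002)))
```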
32 x 8 ROM
• 5 address inputs A4 to A0 feed a 5-to-32 decoder with output lines 0 to 31
• 8 data outputs D7 to D0; each output line drawn in the figure represents 32 wires
• Each connection goes through a fuse, which can be implemented as a diode or a pass transistor
Programming the 32 x 8 ROM
(Figure: truth table of address inputs A4 to A0 against outputs D7 to D0, and the corresponding fuse pattern on outputs 0 to 31 of the 5-to-32 decoder.)
An application example of ROM
• Lookup table: design a table that computes the function F(X) = X^2

X (decimal)   X (binary)   F(X) (decimal)   F(X) (binary)
0             000          0                000000
1             001          1                000001
2             010          4                000100
3             011          9                001001
4             100          16               010000
5             101          25               011001
6             110          36               100100
7             111          49               110001
Square lookup table using ROM
• Inputs X2, X1, X0 feed a 3-to-8 decoder with output lines 0 to 7
• Outputs F5 to F0 give F(X) = X^2:
– 000 → 000000, 001 → 000001, 010 → 000100, 011 → 001001
– 100 → 010000, 101 → 011001, 110 → 100100, 111 → 110001
• F1 is always 0, so it is not used; F0 = X0, so it can be taken directly from the input
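The ROM lookup can be mimicked in Python with a precomputed list indexed by the 3-bit input. This is only a sketch of the idea; the names `ROM` and `square` are hypothetical. It also checks the two observations that let the hardware drop output columns: F1 is always 0 and F0 equals X0.

```python
# ROM contents: word at address X holds X squared (6 bits are enough for 7 * 7 = 49).
ROM = [x * x for x in range(8)]

def square(x: int) -> int:
    """Look up X^2 for a 3-bit input instead of computing it."""
    if not 0 <= x < 8:
        raise ValueError("input must fit in 3 bits")
    return ROM[x]

for x in range(8):
    print(format(x, "03b"), "->", format(square(x), "06b"))

# Properties that shrink the ROM: bit F1 is never set, and bit F0 copies X0.
f1_always_zero = all((ROM[x] >> 1) & 1 == 0 for x in range(8))
f0_equals_x0 = all((ROM[x] & 1) == (x & 1) for x in range(8))
print(f1_always_zero, f0_equals_x0)
```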
An example of the memory hierarchy
Toward L0: smaller, faster, and costlier (per byte) storage devices. Toward L5: larger, slower, and cheaper (per byte) storage devices.
• L0: registers. CPU registers hold words retrieved from L1 cache.
• L1: on-chip L1 cache (SRAM). L1 cache holds cache lines retrieved from the L2 cache memory.
• L2: off-chip L2 cache (SRAM). L2 cache holds cache lines retrieved from main memory.
• L3: main memory (DRAM). Main memory holds disk blocks retrieved from local disks.
• L4: local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers.
• L5: remote secondary storage (distributed file systems, Web servers).
Hit Rate and Miss Penalty
• Hit rate
– Ratio of hits to attempted accesses
– Greater than 0.9 is essential for high performance
• Miss penalty
– Extra time needed to bring the desired information into the cache
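The effect of the two quantities on performance can be illustrated with the usual average access time estimate: every access pays the cache hit time, and misses pay the miss penalty on top. The formula is a standard textbook relation rather than something stated on the slide, and the timing numbers below are hypothetical.

```python
def hit_rate(hits: int, accesses: int) -> float:
    """Ratio of hits to attempted accesses."""
    return hits / accesses

def average_access_time(hit_time_ns: float, rate: float, miss_penalty_ns: float) -> float:
    """Hit time plus the miss penalty weighted by the miss rate."""
    return hit_time_ns + (1.0 - rate) * miss_penalty_ns

# Hypothetical numbers: 1.5 ns hit time, 200 ns miss penalty.
for hits in (80, 90, 99):
    rate = hit_rate(hits, 100)
    print(rate, average_access_time(1.5, rate, 200.0))
```

Because the miss penalty is large compared with the hit time, even a few percent of misses dominate the average, which is why a hit rate above 0.9 matters.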
Cache Operation (Read)
• Read hit
– Item is in the cache
• Read miss
– Block is copied from main memory to the cache
– Either the word is sent to the processor after the copy, or
– the word is sent to the processor as soon as it is available (load-through)
Cache Operation (Write): Write Hit
• Write-through
– Cache and main memory are updated simultaneously
• Write-back
– Only the cache is updated
– A dirty (modified) bit marks the block
– The block is written back when it is removed from the cache
Cache Operation (Write): Write Miss
• Write-through
– Main memory is written directly
• Write-back
– The block is first brought into the cache
– Then the desired word in the cache is overwritten
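The difference between the two write policies can be sketched with a toy cache in Python. This is an illustrative model under simplifying assumptions, not a description of real hardware: blocks are one word wide, the cache is unbounded, and eviction is triggered by hand. The class name `ToyCache` and its methods are hypothetical.

```python
class ToyCache:
    """Toy cache with one-word blocks, keyed directly by address."""

    def __init__(self, memory: dict, write_back: bool):
        self.memory = memory          # main memory: address -> value
        self.write_back = write_back  # True: write-back, False: write-through
        self.lines = {}               # address -> [value, dirty]

    def read(self, address):
        if address not in self.lines:                 # read miss: copy block in
            self.lines[address] = [self.memory.get(address, 0), False]
        return self.lines[address][0]

    def write(self, address, value):
        if self.write_back:
            if address not in self.lines:             # write miss: bring block in first
                self.lines[address] = [self.memory.get(address, 0), False]
            self.lines[address] = [value, True]       # overwrite word, set dirty bit
        else:
            self.memory[address] = value              # main memory is always written
            if address in self.lines:                 # write hit: update cache as well
                self.lines[address][0] = value

    def evict(self, address):
        value, dirty = self.lines.pop(address)
        if dirty:                                     # write modified block back
            self.memory[address] = value

memory = {0: 1}
cache = ToyCache(memory, write_back=True)
cache.write(0, 5)
print(memory[0])    # main memory still holds the old value
cache.evict(0)
print(memory[0])    # updated only when the dirty block is removed
```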
Direct Mapping
• In the cache, the tag information is stored in addition to the data.
• Main memory (octal address: data): 00000: 1220, 00777: 2340, 01000: 3450, 01777: 4560, 02000: 5670, 27777: 6710
• Cache (index: tag, data): 000: tag 00, data 1220; 777: tag 27, data 6710
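Accessing data in direct mapping
• Address (32 bits): 22-bit tag, 10-bit index
• The index selects one of the 1024 cache entries (0 to 1023), each holding Valid, Tag, and Data
• The tag stored in the selected entry is compared (=) with the address tag to produce Hit; the data goes to the CPU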
Writing data into the cache
• The lowest k bits of the address specify a cache block.
• The upper (m - k) address bits are stored in the block's tag field.
• The data from main memory is stored in the block's data field.
• The valid bit is set to 1.
• In the figure: address (32 bits) = 22-bit tag + 10-bit index, so m = 32 and k = 10
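The lookup and fill steps for the 22-bit tag and 10-bit index layout can be sketched in Python as follows. It assumes one word per block (no block offset bits), as the 22 + 10 split of the 32-bit address suggests; the class and method names are hypothetical.

```python
INDEX_BITS = 10
NUM_BLOCKS = 1 << INDEX_BITS          # 1024 cache entries

def split(address: int):
    """Split a 32-bit address into (tag, index)."""
    index = address & (NUM_BLOCKS - 1)    # lowest k = 10 bits
    tag = address >> INDEX_BITS           # upper m - k = 22 bits
    return tag, index

class DirectMappedCache:
    def __init__(self):
        self.valid = [False] * NUM_BLOCKS
        self.tag = [0] * NUM_BLOCKS
        self.data = [None] * NUM_BLOCKS

    def lookup(self, address: int):
        """Return (hit, data); a hit needs a valid entry whose tag matches."""
        tag, index = split(address)
        hit = self.valid[index] and self.tag[index] == tag
        return hit, (self.data[index] if hit else None)

    def fill(self, address: int, value):
        """Store a block fetched from main memory and mark it valid."""
        tag, index = split(address)
        self.valid[index] = True
        self.tag[index] = tag
        self.data[index] = value

cache = DirectMappedCache()
a = (5 << INDEX_BITS) | 3     # tag 5, index 3
b = (6 << INDEX_BITS) | 3     # tag 6, same index 3
cache.fill(a, "block A")
print(cache.lookup(a))        # hit
print(cache.lookup(b))        # miss: same index, different tag
```

Two addresses that share an index compete for the same entry, which is the main weakness of direct mapping.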
2-way set-associative mapping
• Each cache index holds two (tag, data) pairs:
– Index 000: tag 00, data 1220; tag 02, data 5670
– Index 777: tag 02, data 6710; tag 00, data 2340
• A cache that has k blocks per set is referred to as a k-way set-associative cache.
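(Figure: two-way set-associative lookup. The cache index selects one entry in each way; every entry holds Valid, Cache Tag, and Cache Data (cache block). The address tag (Adr Tag) is compared with both stored tags. The compare outputs Sel1 and Sel0 drive a mux that selects the cache block, and they are ORed to produce Hit.)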
Example: 4-way set-associative cache
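A k-way set-associative lookup can be sketched in Python like this. The replacement policy (first-in, first-out within a set) is an assumption chosen for brevity, since the slides do not fix one here, and the class name is hypothetical. The demo uses a 9-bit index (octal 000 to 777) and octal data values in the style of the 2-way example.

```python
class SetAssociativeCache:
    """k-way set-associative cache; each set keeps up to `ways` (tag, data) pairs."""

    def __init__(self, num_sets: int, ways: int):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [[] for _ in range(num_sets)]   # oldest entry first

    def _split(self, address: int):
        return address // self.num_sets, address % self.num_sets   # (tag, index)

    def lookup(self, address: int):
        """Compare the address tag with every tag in the selected set."""
        tag, index = self._split(address)
        for stored_tag, data in self.sets[index]:
            if stored_tag == tag:
                return True, data
        return False, None

    def fill(self, address: int, data):
        tag, index = self._split(address)
        entries = self.sets[index]
        entries[:] = [(t, d) for t, d in entries if t != tag]   # drop a stale copy
        if len(entries) == self.ways:
            entries.pop(0)                # assumed FIFO replacement
        entries.append((tag, data))

cache = SetAssociativeCache(num_sets=0o1000, ways=2)
cache.fill(0o00000, 0o1220)   # tag 00, index 000
cache.fill(0o02000, 0o5670)   # tag 02, index 000: shares the set
print(cache.lookup(0o00000), cache.lookup(0o02000))
```

Unlike the direct-mapped case, two blocks with the same index can be cached at once; a third block with that index forces a replacement.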
Memory hierarchy in modern processors
Intel Pentium 4, 2.2 GHz processor

Component   Access speed (time for data to be returned)   Size of component
Registers   1 cycle = 0.5 nanoseconds                      32 registers
L1 Cache    3 cycles = 1.5 nanoseconds                     Separate data and instruction caches: 8 Kbytes each
L2 Cache    20 cycles = 10 nanoseconds                     256 Kbytes, 8-way set associative
L3 Cache    30 cycles = 15 nanoseconds                     512 Kbytes, 8-way set associative
Memory      400 cycles = 200 nanoseconds                   16 Gigabytes
The associated hardware (associative memory)
• Argument register (A)
• Key register (K)
• Match register
• Associative memory array and logic: m words, n bits per word
• Signals: Input, Read, Write, Output
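Associative mapping
• The CPU address is placed in the argument register
• The cache stores the address together with the data (as shown in the figure): address 00, data 1220; address 02, data 6710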
Associative Mapping
• A main memory block can be placed into any cache block
• Contention occurs only when the cache is full
• A replacement algorithm is required to choose the block to replace
• Tags are searched associatively, in parallel
• Most flexible
• Costly
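A fully associative cache with a replacement algorithm can be sketched in Python as follows. Least-recently-used (LRU) replacement is an assumption picked for illustration; the slide only says that some replacement algorithm is required. The dictionary lookup stands in for the parallel tag search, and the class name is hypothetical.

```python
from collections import OrderedDict

class AssociativeCache:
    """Fully associative cache: any block may occupy any cache entry."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines = OrderedDict()    # address -> data, least recently used first

    def read(self, address: int, memory: dict):
        """Return (hit, data); on a miss the block is copied in from memory."""
        if address in self.lines:
            self.lines.move_to_end(address)       # mark as most recently used
            return True, self.lines[address]
        if len(self.lines) == self.capacity:      # contention only when full
            self.lines.popitem(last=False)        # evict the LRU block (assumed policy)
        self.lines[address] = memory[address]
        return False, self.lines[address]

# Octal addresses and data in the style of the direct-mapping example.
memory = {0o00000: 0o1220, 0o00777: 0o2340, 0o01000: 0o3450}
cache = AssociativeCache(capacity=2)
for address in (0o00000, 0o00777, 0o00000, 0o01000, 0o00777):
    print(oct(address), cache.read(address, memory))
```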
Instructions versus Data
• Modern system designs frequently use a pair of separate cache memories, one for storing processor instructions and another for storing the program's data
• Why is this a good idea?
AMD Opteron – caches, etc
Cache memory, L1 cache, L2 cache
This is a Pentium from 1993! Now processor cores are even smaller.
Questions?