CSL 718: Memory Hierarchy, Cache Memories

- Slides: 26
CSL 718: Memory Hierarchy, Cache Memories
6th Feb, 2006
Anshul Kumar, CSE IITD

Memory technologies
• Semiconductor (random access)
  – Registers
  – SRAM
  – DRAM
  – FLASH
• Magnetic (random + sequential access)
  – FDD
  – HDD
• Optical (random + sequential access)
  – CD
  – DVD

Hierarchical structure
[Figure: CPU followed by a chain of memory levels]
• Memory nearest the CPU: fastest speed, smallest size, highest cost / bit
• Memory farthest from the CPU: slowest speed, biggest size, lowest cost / bit

System Configuration (e-bay price: Rs. 37,500)
• Processor: Intel P4 3.2 GHz (800 FSB) 1024 k CPU with Hyper Threading
• CPU Fan: P4 Heavy Duty Cooling Fan With Heat Sink
• Motherboard: D915G express chipset 800 FSB (up to 3.6 GHz support)
• Memory: 1 GB DDR 400 PC3200 DUAL CHANNEL RAM
• Video Card: GeForce FX 6200 256 MB 16x PCI-e video with TV out
• Hard drive: 160 GB 7200 RPM UDMA-150 SATA
• CD drive: 52x32x52x16x CDRW + DVD ROM drive
• Floppy drive: Sony 1.44 MB 3.5" drive
• Sound: AC97 6 ch 5.1 Full duplex digital sound, stereo speakers
• Network: 10/100 RJ45 onboard network (Ethernet, cable or DSL)
• Modem: 56k v92 modem
• Ports: Six USB 2.0 ports, 1 serial, 1 parallel, 1 microphone jack
• Case: Black iBOX 522 Mid Tower 400 W power supply (front USB)
• Keyboard: Black PS2 Windows Keyboard
• Mouse: Black PS2 Scroll Mouse
• Monitor: 17" SAMSUNG 793S MONITOR

Main Memory for Pentium IV: DDR (double data rate) DRAM
Size      Interface   Price
128 MB    PC-333      Rs. 599
256 MB    PC-333      Rs. 1,299
1 GB      PC-333      Rs. 4,999
1 GB      PC-400      Rs. 5,299

Disk drives: Seagate Barracuda 7200 RPM
• Capacity: 40 GB; 80 GB; 120 GB; 160 GB; 200 GB; 250 GB; 300 GB; 400 GB
• Price: Rs. 2,999; Rs. 3,499; Rs. 4,799; Rs. 5,500; Rs. 6,999; Rs. 9,900; Rs. 14,950

Data transfer between levels
[Figure: a processor access to a level is either a hit or a miss; data is transferred between adjacent levels]
• Unit of transfer = block

Principle of locality
• Temporal Locality
  – references repeated in time
• Spatial Locality
  – references repeated in space
  – Special case: Sequential Locality

Memory Hierarchy Analysis
• Memory $M_i$: $M_1, M_2, \ldots, M_n$
• Capacity $s_i$: $s_1 < s_2 < \cdots < s_n$
• Unit cost $c_i$: $c_1 > c_2 > \cdots > c_n$
• Total cost $C_{total}$: $\sum_i c_i s_i$
• Access time $t_i$: $\tau_1 + \tau_2 + \cdots + \tau_i$ ($\tau_i$ at level $i$), with $\tau_1 < \tau_2 < \cdots < \tau_n$
• Hit ratios $h_i(s_i)$: $h_1 < h_2 < \cdots < h_n = 1$
• Effective time $T_{eff}$: $\sum_i m_i h_i t_i = \sum_i m_i \tau_i$
• Miss before level $i$, $m_i$: $(1 - h_1)(1 - h_2) \cdots (1 - h_{i-1})$

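The two expressions for the effective time on this slide can be checked numerically. The sketch below is only an illustration: the function name `effective_time` and the three-level hit ratios and per-level times are made-up values, not figures from the lecture. It evaluates both sums and shows that they agree when the last level always hits.

```python
from math import isclose

def effective_time(hit_ratios, level_times):
    """Return (sum of m_i*h_i*t_i, sum of m_i*tau_i) for a memory hierarchy.

    hit_ratios  -- h_1..h_n, with h_n == 1 (the last level always hits)
    level_times -- tau_1..tau_n, the time spent at each individual level
    """
    by_access_time = 0.0   # sum_i m_i * h_i * t_i
    by_level_time = 0.0    # sum_i m_i * tau_i
    access_time = 0.0      # t_i = tau_1 + ... + tau_i
    miss_before = 1.0      # m_1 = 1 (empty product)
    for h, tau in zip(hit_ratios, level_times):
        access_time += tau
        by_access_time += miss_before * h * access_time
        by_level_time += miss_before * tau
        miss_before *= 1.0 - h   # m_{i+1} = m_i * (1 - h_i)
    return by_access_time, by_level_time

# Hypothetical three-level hierarchy (values chosen for illustration only).
first, second = effective_time([0.9, 0.99, 1.0], [1, 10, 100])
print(first, second)
```

Both forms give the same value because $m_i h_i = m_i - m_{i+1}$, so the first sum telescopes into the second.
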
Cache Types
• Instruction | Data | Unified
  – Split vs. Unified: split allows specializing each part; unified allows best use of the capacity
• On-chip | Off-chip
  – on-chip: fast but small
  – off-chip: large but slow
• Single level | Multi level

Cache Policies
• Placement: what gets placed where?
• Read: when? from where?
• Load: order of bytes/words?
• Fetch: when to fetch new block?
• Replacement: which one?
• Write: when? to where?

Block placement strategies
[Figure: three cache layouts, each with Tag and Data fields and the region that must be searched]
• Direct mapped: block # 0 to 7; only one block is checked
• Set associative: set # 0 to 3; the blocks of one set are searched
• Fully associative: all blocks are searched

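As a rough illustration of the three placement strategies, the sketch below lists the cache frames where a given memory block may be placed. It assumes the usual modulo mapping (set = block address mod number of sets), which the slide's figure does not spell out; the function name `candidate_frames` and the 8-frame cache size are illustrative choices.

```python
def candidate_frames(block_address, num_frames=8, ways=1):
    """Return the cache frames that may hold the given memory block.

    ways == 1           -> direct mapped (exactly one candidate)
    1 < ways < frames   -> set associative (one set of `ways` candidates)
    ways == num_frames  -> fully associative (every frame is a candidate)
    Assumes modulo mapping of block addresses onto sets.
    """
    num_sets = num_frames // ways
    set_index = block_address % num_sets
    return [set_index * ways + way for way in range(ways)]

print(candidate_frames(12, ways=1))  # direct mapped
print(candidate_frames(12, ways=2))  # 2-way set associative, 4 sets
print(candidate_frames(12, ways=8))  # fully associative
```

More candidates per block mean more tags to search on each access, which is the trade-off the figure's "Search" labels point at.
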
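Organization/placement policy
[Figure: nesting of cache organization units]
• Cache: made up of sets (Set 1, ...)
• Set: Sector 1, Sector 2, ..., Sector SE, plus LRU information
• Sector: Tag, Block 1, Block 2, ..., Block B
• Block: V, D, S bits, AU 1, AU 2, ..., AU A
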
Addressing Cache
Address fields:
• Sector Name: compared to tags
• Set Index: selects set
• Block: selects block
• Displacement: selects AU
Early select: access data after tag matching
Late select: access data while tag matching

Cache organization example
[Figure: example cache organization; labels: Sets, Sector, Block, Tag, V, D, AU, with items numbered 1 to 8]

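Cache access mechanism
[Figure: direct-mapped cache lookup for a 32-bit address (bits 31 to 0)]
• Address fields: tag (18 bits), index (12 bits), byte offset (2 bits)
• Index selects one of 4096 entries (0 to 4095), each holding v, tag (18 bits) and data (32 bits)
• The stored tag is compared (=) with the address tag to produce Hit; the entry's 32-bit data is the output Data
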
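The lookup in this figure can be sketched in software. The following is a minimal model, not the hardware itself: it uses the field widths from the slide (18-bit tag, 12-bit index, 2-bit byte offset), while the names `split_address` and `DirectMappedCache` and the `fill` helper are illustrative assumptions.

```python
TAG_BITS, INDEX_BITS, BYTE_OFFSET_BITS = 18, 12, 2

def split_address(address):
    """Split a 32-bit address into (tag, index, byte_offset)."""
    byte_offset = address & ((1 << BYTE_OFFSET_BITS) - 1)
    index = (address >> BYTE_OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = address >> (BYTE_OFFSET_BITS + INDEX_BITS)
    return tag, index, byte_offset

class DirectMappedCache:
    """4096 one-word entries, each with a valid bit, a tag and a data word."""

    def __init__(self):
        entries = 1 << INDEX_BITS
        self.valid = [False] * entries
        self.tags = [0] * entries
        self.data = [0] * entries

    def read(self, address):
        """Return (hit, word); word is None on a miss."""
        tag, index, _ = split_address(address)
        hit = self.valid[index] and self.tags[index] == tag
        return hit, (self.data[index] if hit else None)

    def fill(self, address, word):
        """Install a word fetched from memory (hypothetical helper)."""
        tag, index, _ = split_address(address)
        self.valid[index], self.tags[index], self.data[index] = True, tag, word

cache = DirectMappedCache()
print(cache.read(0x12345678))   # miss: entry not yet valid
cache.fill(0x12345678, 99)
print(cache.read(0x12345678))   # hit
```

Two addresses that differ only in their tag bits map to the same entry, so filling one evicts the other.
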
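Cache with 4 word blocks
[Figure: direct-mapped cache lookup for a 32-bit address (bits 31 to 0)]
• Address fields: tag (18 bits), index (10 bits), block offset (2 bits), byte offset (2 bits)
• Index selects one of 1024 entries (0 to 1023), each holding v, tag (18 bits) and a data block of four 32-bit words
• The stored tag is compared (=) with the address tag to produce Hit; a Mux controlled by the block offset selects the 32-bit Data word
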
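4-way set associative cache
[Figure: 4-way set-associative cache lookup for a 32-bit address (bits 31 to 0)]
• Address fields: tag (20 bits), index (8 bits), block offset (2 bits), byte offset (2 bits)
• Index selects one of 256 sets (0 to 255); each set has four ways, each holding v, tag (20 bits) and data (128 bits)
• Each way has its own comparator (=) on the 20-bit tag and its own Mux that picks a 32-bit word from the 128-bit block using the block offset
• A final Mux selects the word from the matching way; outputs are Hit and Data
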
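A software sketch of this lookup may help in following the figure. It uses the slide's field widths (20-bit tag, 8-bit index, 2-bit block offset, 2-bit byte offset) and four ways; the class name, the `fill` helper and the choice to pass the target way explicitly are assumptions made for illustration, since the slide does not fix a replacement policy. Hardware compares all four tags in parallel, whereas this model simply loops over the ways.

```python
INDEX_BITS, BLOCK_OFFSET_BITS, BYTE_OFFSET_BITS = 8, 2, 2
WAYS, WORDS_PER_BLOCK = 4, 4

def split_address(address):
    """Split a 32-bit address into (tag, index, block_offset, byte_offset)."""
    byte_offset = address & ((1 << BYTE_OFFSET_BITS) - 1)
    address >>= BYTE_OFFSET_BITS
    block_offset = address & ((1 << BLOCK_OFFSET_BITS) - 1)
    address >>= BLOCK_OFFSET_BITS
    index = address & ((1 << INDEX_BITS) - 1)
    tag = address >> INDEX_BITS          # remaining 20 bits
    return tag, index, block_offset, byte_offset

class SetAssociativeCache:
    """256 sets x 4 ways; each way stores (valid, tag, block of 4 words)."""

    def __init__(self):
        empty = (False, 0, [0] * WORDS_PER_BLOCK)
        self.sets = [[empty] * WAYS for _ in range(1 << INDEX_BITS)]

    def read(self, address):
        """Return (hit, word); the word is chosen by the block offset."""
        tag, index, block_offset, _ = split_address(address)
        for valid, way_tag, block in self.sets[index]:
            if valid and way_tag == tag:
                return True, block[block_offset]
        return False, None

    def fill(self, address, way, words):
        """Install a 4-word block in the given way (hypothetical helper)."""
        tag, index, _, _ = split_address(address)
        self.sets[index][way] = (True, tag, list(words))

cache = SetAssociativeCache()
cache.fill(0x12345678, way=2, words=[10, 11, 12, 13])
print(cache.read(0x12345678))
```

Unlike the direct-mapped case, two blocks with the same index but different tags can be resident at once, as long as they occupy different ways.
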
Read policies
• Sequential or concurrent
  – sequential: initiate memory access only after detecting a miss
  – concurrent: initiate memory access along with cache access in anticipation of a miss
• With or without forwarding
  – without forwarding: give data to CPU after filling the missing block in cache
  – with forwarding: forward data to CPU as it gets filled in cache

Read Policies
[Figure: timing diagrams for the four policies; a cache access takes 1 time unit and a memory access takes T]
• Sequential Simple: $T_{eff} = (1 - p_m) \cdot 1 + p_m (T + 2)$
• Concurrent Simple: $T_{eff} = (1 - p_m) \cdot 1 + p_m (T + 1)$
• Sequential Forward: $T_{eff} = (1 - p_m) \cdot 1 + p_m (T + 1)$
• Concurrent Forward: $T_{eff} = (1 - p_m) \cdot 1 + p_m T$

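To compare the read policies numerically, the sketch below evaluates $T_{eff} = (1 - p_m) \cdot 1 + p_m \cdot (\text{miss time})$ for each one. It assumes the miss times T+2 (sequential simple), T+1 (concurrent simple and sequential forward) and T (concurrent forward), which is one reading of the slide's timing diagrams; the function name and the sample values of $p_m$ and T are hypothetical.

```python
def effective_access_time(miss_probability, memory_time, policy):
    """Effective access time with a 1-unit cache hit time.

    The per-policy miss times below are an assumed reading of the
    timing diagrams, not values stated outright in the text.
    """
    miss_time = {
        "sequential-simple": memory_time + 2,   # cache, memory, then deliver
        "concurrent-simple": memory_time + 1,   # memory overlaps cache access
        "sequential-forward": memory_time + 1,  # data forwarded while filling
        "concurrent-forward": memory_time,      # both savings combined
    }[policy]
    return (1 - miss_probability) * 1 + miss_probability * miss_time

# Hypothetical numbers: 5% miss probability, memory access of 20 units.
for name in ("sequential-simple", "concurrent-simple",
             "sequential-forward", "concurrent-forward"):
    print(name, effective_access_time(0.05, 20, name))
```

With a small miss probability the policies differ by only a few hundredths of a time unit per access, so the gain has to be weighed against the extra memory traffic that concurrent access generates on hits.
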
Load policies
[Figure: a block of 4 AUs, with a cache miss on AU 1]
• Block Load
• Load Forward
• Fetch Bypass (wrap around load)

Fetch Policies
• Fetch on miss (demand fetching)
• Software prefetching
• Hardware prefetching

Fetch Policies
• Demand fetching
  – fetch only when required (miss)
• Hardware prefetching
  – automatically prefetch next block
• Software prefetching
  – programmer decides to prefetch
  – questions: how much ahead (prefetch distance), how often

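The difference between demand fetching and next-block hardware prefetching can be seen with a toy miss counter. This is a simplified sketch under stated assumptions: the cache is unbounded, prefetched blocks arrive instantly, and a prefetch is triggered only on a miss. The function name `count_misses` and the sequential trace are illustrative.

```python
def count_misses(block_trace, prefetch_next=False):
    """Count misses for a trace of block numbers in an unbounded cache.

    With prefetch_next=True, every miss on block b also brings in
    block b + 1 (a simple prefetch-on-miss scheme, assumed here).
    """
    cache = set()
    misses = 0
    for block in block_trace:
        if block not in cache:
            misses += 1
            cache.add(block)
            if prefetch_next:
                cache.add(block + 1)
    return misses

trace = list(range(8))                          # purely sequential accesses
print(count_misses(trace))                      # demand fetching
print(count_misses(trace, prefetch_next=True))  # next-block prefetching
```

On a sequential trace this scheme halves the misses; on a trace with no spatial locality it would only add memory traffic.
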
Software Control of Cache
• Software visible cache
  – mode selection (WT, WB etc)
  – block flush
  – block invalidate
  – block prefetch

Replacement Policies
• Least Recently Used (LRU)
• Least Frequently Used (LFU)
• First In First Out (FIFO)
• Random

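A small simulation can show how two of these policies differ on the same reference trace. The sketch below models a fully associative cache with LRU or FIFO replacement; the function name `simulate`, the capacity of 3 blocks and the trace are hypothetical, and LFU and Random are left out for brevity.

```python
from collections import OrderedDict

def simulate(trace, capacity, policy):
    """Return the miss count for a fully associative cache.

    policy is "LRU" or "FIFO". The OrderedDict keeps blocks in eviction
    order: the first entry is the next victim.
    """
    cache = OrderedDict()
    misses = 0
    for block in trace:
        if block in cache:
            if policy == "LRU":
                cache.move_to_end(block)   # a hit refreshes recency
            continue                       # FIFO ignores hits
        misses += 1
        if len(cache) >= capacity:
            cache.popitem(last=False)      # evict the oldest entry
        cache[block] = True
    return misses

trace = [1, 2, 3, 1, 4, 1, 2]
print(simulate(trace, 3, "LRU"), simulate(trace, 3, "FIFO"))
```

Here LRU keeps the frequently reused block 1 resident, while FIFO evicts it simply because it was loaded first.
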
Write Policies
• Write Hit
  – Write Back
  – Write Through
• Write Miss
  – Write Back
  – Write Through (with or without Write Allocate)
Buffers are used in all cases to hide latencies
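The effect of the write policy on memory traffic can be illustrated with a simple counter. This sketch assumes a direct-mapped cache with write allocate on every miss, counts one memory write per write-through store and one per dirty block written back, and flushes dirty blocks at the end; buffers are not modelled. The function name `memory_writes` and the trace are hypothetical.

```python
def memory_writes(write_trace, num_lines, policy):
    """Count writes that reach memory for a trace of written block numbers.

    policy is "write-through" or "write-back". Write allocate is assumed
    for both, and dirty blocks are flushed once the trace ends.
    """
    lines = {}      # line index -> (resident block, dirty bit)
    writes = 0
    for block in write_trace:
        index = block % num_lines
        resident = lines.get(index)
        if policy == "write-through":
            writes += 1                      # every store goes to memory
            lines[index] = (block, False)
        else:
            if resident and resident[0] != block and resident[1]:
                writes += 1                  # write back the dirty victim
            lines[index] = (block, True)     # store only marks the line dirty
    if policy == "write-back":
        writes += sum(1 for _, dirty in lines.values() if dirty)
    return writes

trace = [0, 0, 0, 4, 4, 0]                   # blocks 0 and 4 share one line
print(memory_writes(trace, 4, "write-through"))
print(memory_writes(trace, 4, "write-back"))
```

Write back saves traffic whenever a block is written several times before it is evicted, at the cost of tracking a dirty bit per block.
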