L17: Caches II, CSE 351, Spring 2020

L17: Caches II, CSE 351, Spring 2020
Instructor: Ruth Anderson
Teaching Assistants: Alex Olshanskyy, Rehaan Bhimani, Callum Walker, Chin Yeoh, Diya Joy, Eric Fan, Edan Sneh, Jonathan Chen, Jeffery Tian, Millicent Li, Melissa Birchfield, Porter Jones, Joseph Schafer, Connie Wang, Eddy (Tianyi) Zhou

Administrivia
• Unit Summary #2 due Friday (5/08)
• Lab 3 due Wednesday (5/13)
• You must log on with your @uw Google account to access!!
  • Google doc for 11:30 Lecture: https://tinyurl.com/351-05-06A
  • Google doc for 2:30 Lecture: https://tinyurl.com/351-05-06B

An Example Memory Hierarchy
[Figure: pyramid of storage levels. Smaller, faster, and costlier per byte toward the top; larger, slower, and cheaper per byte toward the bottom. Approximate access times:]
• registers: <1 ns
• on-chip L1 cache (SRAM): 1 ns
• off-chip L2 cache (SRAM): 5-10 ns
• main memory (DRAM): 100 ns
• local secondary storage (local disks): SSD 150,000 ns; disk 10,000,000 ns (10 ms)
• remote secondary storage (distributed file systems, web servers): 1-150 ms
Human-scale equivalents shown alongside, from the top of the hierarchy down: 5-10 s, 1-2 min, 15-30 min, 31 days, 66 months = 5.5 years, 1-15 years.

Memory Hierarchies
• Some fundamental and enduring properties of hardware and software systems:
  • Faster storage technologies almost always cost more per byte and have lower capacity
  • The gaps between memory technology speeds are widening
    • True for: registers ↔ cache, cache ↔ DRAM, DRAM ↔ disk, etc.
  • Well-written programs tend to exhibit good locality
• These properties complement each other beautifully
  • They suggest an approach for organizing memory and storage systems known as a memory hierarchy
    • For each level k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1

An Example Memory Hierarchy
[Figure: memory hierarchy pyramid. Smaller, faster, costlier per byte toward the top; larger, slower, cheaper per byte toward the bottom. Each level caches data from the level below it:]
• registers: CPU registers hold words retrieved from L1 cache
• on-chip L1 cache (SRAM): L1 cache holds cache lines retrieved from L2 cache
• off-chip L2 cache (SRAM): L2 cache holds cache lines retrieved from main memory
• main memory (DRAM): main memory holds disk blocks retrieved from local disks
• local secondary storage (local disks): local disks hold files retrieved from disks on remote network servers
• remote secondary storage (distributed file systems, web servers)

An Example Memory Hierarchy
[Figure: memory hierarchy pyramid. Smaller, faster, costlier per byte toward the top; larger, slower, cheaper per byte toward the bottom. Annotated with who manages each level:]
• registers: explicitly program-controlled (e.g. refer to exactly %rax, %rbx)
• on-chip L1 cache (SRAM), off-chip L2 cache (SRAM), main memory (DRAM): program sees “memory”; hardware manages caching transparently
• local secondary storage (local disks)
• remote secondary storage (distributed file systems, web servers)

Intel Core i7 Cache Hierarchy
[Figure: processor package containing Core 0 through Core 3. Each core has its own registers, L1 d-cache, L1 i-cache, and L2 unified cache. An L3 unified cache is shared by all cores and sits between the cores and main memory.]
• Block size: 64 bytes for all caches
• L1 i-cache and d-cache: 32 KiB, 8-way, Access: 4 cycles
• L2 unified cache: 256 KiB, 8-way, Access: 11 cycles
• L3 unified cache: 8 MiB, 16-way, Access: 30-40 cycles
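
As a rough check on these figures, the number of blocks in a cache is its capacity divided by the block size, and the number of sets is that count divided by the associativity. The sketch below only works through this arithmetic for the sizes listed on the slide; the helper name `cache_geometry` is made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB
BLOCK_SIZE = 64  # bytes; the slide gives 64 bytes for all caches


def cache_geometry(capacity_bytes, ways, block_size=BLOCK_SIZE):
    """Return (number of blocks, number of sets) for a set-associative cache."""
    blocks = capacity_bytes // block_size
    sets = blocks // ways
    return blocks, sets


# Capacities and associativities as listed on the slide.
for name, capacity, ways in [
    ("L1 (i-cache or d-cache)", 32 * KIB, 8),
    ("L2 unified", 256 * KIB, 8),
    ("L3 unified", 8 * MIB, 16),
]:
    blocks, sets = cache_geometry(capacity, ways)
    print(f"{name}: {blocks} blocks, {sets} sets")
```

For example, the 32 KiB, 8-way L1 cache works out to 512 blocks arranged in 64 sets.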

Making memory accesses fast!
• Cache basics
• Principle of locality
• Memory hierarchies
• Cache organization
  • Direct-mapped (sets; index + tag)
  • Associativity (ways)
  • Replacement policy
  • Handling writes
• Program optimizations that consider caches

Cache Organization (1)
Note: The textbook uses “B” for block size

Cache Organization (1)
Note: The textbook uses “b” for offset bits
[Figure: an address split into two fields, Block Number (high-order bits) followed by Block Offset (low-order bits)]
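
The split into block number and block offset can be sketched in a few lines. This is a minimal illustration, assuming the block size is a power of two (an assumption not stated in the surviving slide text) so that the offset is simply the low-order bits of the address; the function name `split_address` and the sample address are hypothetical.

```python
def split_address(addr, block_size):
    """Split a byte address into (block number, block offset).

    Assumes block_size is a power of two, so the offset occupies the
    low-order log2(block_size) bits of the address.
    """
    if block_size <= 0 or block_size & (block_size - 1) != 0:
        raise ValueError("block_size must be a power of two")
    offset_bits = block_size.bit_length() - 1   # log2(block_size)
    block_number = addr >> offset_bits          # high-order bits
    block_offset = addr & (block_size - 1)      # low-order bits
    return block_number, block_offset


# Sample address 0x15 (binary 10101) with 4-byte blocks:
# block number is binary 101 (5), offset is binary 01 (1).
print(split_address(0x15, 4))
```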

Polling Question [Cache II-a]

Cache Organization (2)

Review: Hash Tables for Fast Lookup
• Insert: 5, 27, 34, 102, 119
• Apply hash function to map data to “buckets”
[Figure: ten buckets numbered 0 through 9]
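
To make the review concrete, here is a small sketch that inserts the five keys from the slide into ten buckets. The slide does not say which hash function is used; `key % 10` is assumed here because the buckets are numbered 0 through 9.

```python
NUM_BUCKETS = 10  # buckets 0 through 9, as drawn on the slide


def bucket_for(key):
    # Assumed hash function: key modulo the number of buckets.
    return key % NUM_BUCKETS


buckets = [[] for _ in range(NUM_BUCKETS)]
for key in [5, 27, 34, 102, 119]:
    buckets[bucket_for(key)].append(key)

for index, contents in enumerate(buckets):
    print(index, contents)
```

With this hash function each key lands in the bucket matching its last decimal digit, so none of the five keys collide.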

Place Data in Cache by Hashing
[Figure: memory holding 16 blocks, block numbers 0000 through 1111, each with its block data; a cache with 4 slots, indices 00 through 11, each holding block data]

Place Data in Cache by Hashing
[Figure: memory holding 16 blocks, block numbers 0000 through 1111, each with its block data; a cache with 4 slots, indices 00 through 11, each holding block data]
• Map to cache index from block number
  • Lets adjacent blocks fit in cache simultaneously!
    • Consecutive blocks go in consecutive cache indices
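
One mapping with the property described above is to take the block number modulo the number of cache slots, which for a power-of-two cache is the same as keeping the low-order bits of the block number. The sketch below assumes that mapping and the 16-block memory and 4-slot cache from the figure.

```python
NUM_MEMORY_BLOCKS = 16  # block numbers 0000 through 1111
NUM_CACHE_BLOCKS = 4    # cache indices 00 through 11


def cache_index(block_num):
    # Assumed mapping: low-order bits of the block number.
    return block_num % NUM_CACHE_BLOCKS


for block_num in range(NUM_MEMORY_BLOCKS):
    print(f"block {block_num:04b} -> index {cache_index(block_num):02b}")
```

Any four consecutive block numbers map to four different indices, which is why adjacent blocks can sit in the cache at the same time.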

Practice Question

Place Data in Cache by Hashing
[Figure: memory holding 16 blocks, block numbers 0000 through 1111, each with its block data; a cache with 4 slots, indices 00 through 11, each holding block data]
• Collision!
  • This might confuse the cache later when we access the data
  • Solution?

Tags Differentiate Blocks in Same Index
[Figure: memory holding 16 blocks, block numbers 0000 through 1111; a cache with 4 slots, indices 00 through 11, where each slot now stores a tag (e.g. 00, 01) alongside its block data]
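
A minimal sketch of how a tag can tell apart blocks that share an index, assuming the index is the low-order two bits of the 4-bit block number and the tag is the remaining high-order bits. The function name `index_and_tag` is hypothetical.

```python
INDEX_BITS = 2  # 4 cache slots, indices 00 through 11


def index_and_tag(block_num):
    """Split a block number into (cache index, tag)."""
    index = block_num & ((1 << INDEX_BITS) - 1)  # low-order bits
    tag = block_num >> INDEX_BITS                # remaining high-order bits
    return index, tag


# All four blocks below collide on index 10, but each has a distinct tag.
for block_num in [0b0010, 0b0110, 0b1010, 0b1110]:
    index, tag = index_and_tag(block_num)
    print(f"block {block_num:04b}: index {index:02b}, tag {tag:02b}")
```

Storing the tag next to the block data lets the cache tell which of the colliding blocks currently occupies a slot.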

Checking for a Requested Address
[Figure: address breakdown showing the Block Number field]
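
The check for a requested address can be sketched as a tiny direct-mapped cache model. This is an illustrative sketch rather than the slide's own procedure: the class name, the 4-slot and 4-byte-block parameters, and the sample addresses are all assumptions, and only tags are tracked (no data).

```python
class DirectMappedCache:
    """Tracks only which block occupies each slot, not the data itself."""

    def __init__(self, num_blocks, block_size):
        self.num_blocks = num_blocks
        self.block_size = block_size
        self.tags = [None] * num_blocks  # None marks an empty slot

    def access(self, addr):
        block_num = addr // self.block_size   # drop the block offset
        index = block_num % self.num_blocks   # where to look
        tag = block_num // self.num_blocks    # which block should be there
        if self.tags[index] == tag:
            return "hit"
        self.tags[index] = tag                # load the block, evicting any old one
        return "miss"


cache = DirectMappedCache(num_blocks=4, block_size=4)
results = [(addr, cache.access(addr)) for addr in [0, 1, 16, 0]]
print(results)
```

In this run, addresses 0 and 16 both map to index 0 with different tags, so loading the block for address 16 evicts the block for address 0 and the final access to 0 misses again.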

Cache Puzzle [Cache II-b]
Vote at http://pollev.com/rea
• Based on the following behavior, which of the following block sizes is NOT possible for our cache?
  • Cache starts empty, also known as a cold cache
  • Access (addr: hit/miss) stream:
    • (14: miss), (15: hit), (16: miss)
A. 4 bytes
B. 8 bytes
C. 16 bytes
D. 32 bytes
E. We’re lost…
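
One way to reason about the puzzle is to replay the access stream for each candidate block size and compare against the observed hits and misses. The sketch below assumes a cold cache and that nothing is evicted during these three accesses, which holds here because at most two distinct blocks are touched and the only possible hit immediately follows the access that loads its block. The function name `simulate` is hypothetical.

```python
def simulate(block_size, addresses):
    """Replay accesses on a cold cache, assuming no evictions occur."""
    loaded_blocks = set()
    outcomes = []
    for addr in addresses:
        block_num = addr // block_size
        outcomes.append("hit" if block_num in loaded_blocks else "miss")
        loaded_blocks.add(block_num)
    return outcomes


observed = ["miss", "hit", "miss"]
for block_size in [4, 8, 16, 32]:
    outcomes = simulate(block_size, [14, 15, 16])
    verdict = "consistent" if outcomes == observed else "NOT consistent"
    print(f"{block_size:2d} bytes: {outcomes} -> {verdict}")
```

Under these assumptions only the 32-byte block size disagrees with the observed stream, since addresses 14, 15, and 16 would then all fall in block 0 and the access to 16 would hit.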