CSI 312 CCE 325 Computer Architecture Chapter 5











































































CSI 312 / CCE 325 – Computer Architecture
Chapter 5: Memory Hierarchy
Part of the slides in this chapter were prepared by Dr. Majd Sakr. AUST 2004.
Many thanks go to him for his contributions to this course.
CSI/CCE - Computer Architecture

Exploiting Memory Hierarchy
° Users want large and fast memories!
° SRAM access times are 0.5–5 ns, at a very high cost.
° DRAM access times are 50–70 ns, at a high cost (though lower per GB than SRAM).
° Disk access times are 5 to 20 million ns, at a cost of $0.50 to $2 per GB*.
* These prices are outdated, but they still give a sense of scale.

Memory Hierarchy
[Figure: levels of the memory hierarchy; size and access time increase from the upper levels to the lower levels]

Principle of Locality
§ Programs access a small proportion of their address space at any time
§ Temporal locality
§ Items accessed recently are likely to be accessed again soon
§ e.g., instructions in a loop, induction variables
§ Spatial locality
§ Items near those accessed recently are likely to be accessed soon
§ e.g., sequential instruction access, array data

Cache (1)
Our initial focus: 2 levels of memory (upper, lower)
block: minimum unit of data
hit: data requested is found in the upper level
miss: data requested is not found in the upper level

Cache (2)
§ Two issues:
§ How do we know if a data item is in the cache?
§ If it is, how do we find it?
§ Our first example:
§ block size is one word of data
§ "direct mapped"
For each item of data at the lower level, there is exactly one location in the cache where it might be; that is, lots of items at the lower level share locations in the upper level.

Cache Design
§ How do we organize the cache?
§ Where does each memory address map to?
§ (Remember that the cache is a subset of memory, so multiple memory addresses map to the same cache location.)
§ How do we know which elements are in the cache?
§ How do we quickly locate them?

Direct Mapped Cache
° Mapping: cache index = (block address) modulo (number of blocks in the cache)

Direct-Mapped Cache (1/2)
° In a direct-mapped cache, each memory address is associated with one possible block within the cache
• Therefore, we only need to look in a single location in the cache for the data, if it exists in the cache
• Block is the unit of transfer between cache and memory

Direct-Mapped Cache (2/2)
[Figure: memory addresses 0 to F mapped onto a 4-byte direct-mapped cache with cache indices 0 to 3]
° Cache location 0 can be occupied by data from:
• Memory location 0, 4, 8, ...
• 4 blocks => any memory location that is a multiple of 4
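The modulo mapping in the figure can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `cache_index` is hypothetical, and the 16 memory locations and 4 cache blocks are taken from the figure.

```python
def cache_index(block_address: int, num_blocks: int) -> int:
    """Direct-mapped placement: block address modulo the number of cache blocks."""
    return block_address % num_blocks


NUM_BLOCKS = 4  # the 4-entry cache from the figure

# Memory locations 0x0..0xF that compete for cache location 0.
sharing_index_0 = [addr for addr in range(16) if cache_index(addr, NUM_BLOCKS) == 0]
print(sharing_index_0)  # [0, 4, 8, 12]
```

Every multiple of 4 lands on cache location 0, which is why a tag is needed to tell these locations apart.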

Issues with Direct-Mapped
° Since multiple memory addresses map to the same cache index, how do we tell which one is in there?
° What if we have a block size > 1 byte?
° Answer: divide the memory address into three fields:
ttttttttt iiiii oooo
• Tag (t): to check if we have the correct block
• Index (i): to select the block (HEIGHT)
• Offset (o): byte offset within the block (WIDTH)

Direct-Mapped Cache Terminology
° All fields are read as unsigned integers.
° Index: specifies the cache index (which “row” of the cache we should look in)
° Offset: once we’ve found the correct block, the offset specifies which byte within the block we want
° Tag: the remaining bits after offset and index are determined; these are used to distinguish between all the memory addresses that map to the same location

Caching Terminology
° When we try to read memory, 3 things can happen:
1. cache hit: the cache block is valid and contains the proper address, so read the desired word
2. cache miss: nothing is in the cache in the appropriate block, so fetch from memory
3. cache miss, block replacement: the wrong data is in the cache at the appropriate block, so discard it and fetch the desired data from memory (the cache always holds a copy)

Direct-Mapped Cache Example (1/3)
° Suppose we have 16 KB of data in a direct-mapped cache with 4-word blocks
° Determine the size of the tag, index and offset fields if we are using a 32-bit architecture
° Offset
• need to specify the correct byte within a block
• block contains 4 words (in MIPS, words are aligned to multiples of 4 bytes) = 16 bytes = 2^4 bytes
• need 4 bits to specify the correct byte

Direct-Mapped Cache Example (2/3)
° Index: (index into an “array of blocks”)
• need to specify the correct row in the cache
• cache contains 16 KB = 2^14 bytes
• block contains 2^4 bytes (4 words)
• # blocks/cache = (bytes/cache) / (bytes/block) = (2^14 bytes/cache) / (2^4 bytes/block) = 2^10 blocks/cache
• need 10 bits to specify this many rows

Direct-Mapped Cache Example (3/3)
° Tag: use the remaining bits as tag
• tag length = address length - offset - index = 32 - 4 - 10 bits = 18 bits
• so the tag is the leftmost 18 bits of the memory address
° Why not the full 32-bit address as tag?
• All bytes within a block share the same address apart from the 4 offset bits, so the offset need not be stored in the tag
• The index must be the same for every address within a block, so it is redundant in the tag check and can be left off to save memory (here 10 bits)
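The three-part calculation above can be reproduced with a short Python sketch. The function name `field_widths` is hypothetical, and the sketch assumes the cache size and block size are powers of two, as in the example.

```python
def field_widths(cache_bytes: int, block_bytes: int, address_bits: int = 32):
    """Return (tag, index, offset) widths in bits for a direct-mapped cache.

    Assumes cache_bytes and block_bytes are powers of two.
    """
    offset_bits = (block_bytes - 1).bit_length()  # log2(bytes per block)
    num_blocks = cache_bytes // block_bytes       # rows in the cache
    index_bits = (num_blocks - 1).bit_length()    # log2(number of blocks)
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


# 16 KB of data, 4-word (16-byte) blocks, 32-bit addresses
print(field_widths(16 * 1024, 16))  # (18, 10, 4)
```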

TIO: a great cache mnemonic
Address fields: Tag | Index | Offset
2^(H+W) = 2^H * 2^W, with H index bits (HEIGHT) and W offset bits (WIDTH)
AREA (cache size, B) = HEIGHT (# of blocks) * WIDTH (size of one block, B/block)

Accessing data in a direct mapped cache
° Ex.: 16 KB of data, direct-mapped, 4-word blocks
° Read 4 addresses
1. 0x00000014
2. 0x0000001C
3. 0x00000034
4. 0x00008014
° Memory values (only the cache/memory level of the hierarchy):
Address (hex)   Value of word
00000010        a
00000014        b
00000018        c
0000001C        d
...
00000030        e
00000034        f
00000038        g
0000003C        h
...
00008010        i
00008014        j
00008018        k
0000801C        l
...

Accessing data in a direct mapped cache
° 4 addresses:
• 0x00000014, 0x0000001C, 0x00000034, 0x00008014
° 4 addresses divided (for convenience) into Tag, Index, Byte Offset fields
Address       Tag (18 bits)        Index (10 bits)  Offset (4 bits)
0x00000014    000000000000000000   0000000001       0100
0x0000001C    000000000000000000   0000000001       1100
0x00000034    000000000000000000   0000000011       0100
0x00008014    000000000000000010   0000000001       0100
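The field split can be sketched with shifts and masks. This is an illustration only; `split_address` is a hypothetical helper, and the 4-bit offset and 10-bit index come from the 16 KB, 16 B-block example.

```python
OFFSET_BITS = 4   # 16-byte blocks
INDEX_BITS = 10   # 1024 blocks


def split_address(addr: int):
    """Split a 32-bit address into (tag, index, offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


for addr in (0x00000014, 0x0000001C, 0x00000034, 0x00008014):
    tag, index, offset = split_address(addr)
    print(f"0x{addr:08X}: tag={tag} index={index} offset=0x{offset:X}")
```

The first and last addresses share index 1 but differ in tag (0 versus 2), which is what causes the replacement later in the walkthrough.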

16 KB Direct Mapped Cache, 16 B blocks
° Valid bit: determines whether anything is stored in that row (when the computer is initially turned on, all entries are invalid)
Index  Valid  Tag  0x0-3  0x4-7  0x8-b  0xc-f
0      0
1      0
2      0
3      0
4      0
5      0
6      0
7      0
...
1022   0
1023   0

1. Read 0x00000014
° Tag field = 000000000000000000, Index field = 0000000001, Offset = 0100
° So we read block 1 (0000000001)
° No valid data
° So load that data into the cache, setting tag and valid
° Read from the cache at the offset, return word b
Cache after the read (all other rows still have Valid = 0):
Index  Valid  Tag  0x0-3  0x4-7  0x8-b  0xc-f
1      1      0    a      b      c      d

2. Read 0x0000001C
° Tag field = 000000000000000000, Index field = 0000000001, Offset = 1100
° Index is valid
° Index valid, tag matches
° Index valid, tag matches, return d
Cache after the read (unchanged; all other rows still have Valid = 0):
Index  Valid  Tag  0x0-3  0x4-7  0x8-b  0xc-f
1      1      0    a      b      c      d

3. Read 0x00000034
° Tag field = 000000000000000000, Index field = 0000000011, Offset = 0100
° So read block 3
° No valid data
° Load that cache block, return word f
Cache after the read (all other rows still have Valid = 0):
Index  Valid  Tag  0x0-3  0x4-7  0x8-b  0xc-f
1      1      0    a      b      c      d
3      1      0    e      f      g      h

4. Read 0x00008014
° Tag field = 000000000000000010, Index field = 0000000001, Offset = 0100
° So read cache block 1; data is valid
° Cache block 1 tag does not match (0 != 2)
° Miss, so replace block 1 with new data and tag
° And return word j
Cache after the read (all other rows still have Valid = 0):
Index  Valid  Tag  0x0-3  0x4-7  0x8-b  0xc-f
1      1      2    i      j      k      l
3      1      0    e      f      g      h
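The four reads can be replayed with a small simulator. This is a minimal sketch under stated assumptions, not the course's reference implementation: the class `DirectMappedCache`, the `MEMORY` dictionary, and the outcome strings are hypothetical names, and memory holds only the twelve words listed in the example.

```python
OFFSET_BITS, INDEX_BITS = 4, 10      # 16-byte blocks, 1024 rows
BLOCK_BYTES = 1 << OFFSET_BITS
WORD_BYTES = 4

# Word values from the example's memory table (address -> value).
MEMORY = dict(zip(
    [0x10, 0x14, 0x18, 0x1C, 0x30, 0x34, 0x38, 0x3C,
     0x8010, 0x8014, 0x8018, 0x801C],
    "abcdefghijkl",
))


class DirectMappedCache:
    def __init__(self):
        # Every row starts invalid, as when the computer is first turned on.
        self.rows = [{"valid": False, "tag": None, "data": None}
                     for _ in range(1 << INDEX_BITS)]

    def read(self, addr):
        offset = addr & (BLOCK_BYTES - 1)
        index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
        tag = addr >> (OFFSET_BITS + INDEX_BITS)
        row = self.rows[index]
        if row["valid"] and row["tag"] == tag:
            outcome = "hit"
        else:
            outcome = "miss, replace" if row["valid"] else "miss"
            base = addr - offset  # first byte of the block
            row["data"] = [MEMORY.get(base + WORD_BYTES * i)
                           for i in range(BLOCK_BYTES // WORD_BYTES)]
            row["valid"], row["tag"] = True, tag
        return outcome, row["data"][offset // WORD_BYTES]


cache = DirectMappedCache()
results = [cache.read(a) for a in (0x14, 0x1C, 0x34, 0x8014)]
print(results)
# [('miss', 'b'), ('hit', 'd'), ('miss', 'f'), ('miss, replace', 'j')]
```

The outcomes match the walkthrough: the fourth read maps to row 1 like the first two, but its tag (2) differs from the stored tag (0), so the block is replaced.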

Do an example yourself. What happens?
° Choose from:
Cache: Hit, Miss, Miss with replace
Values returned: a, b, c, d, e, ..., k, l
° Read address 0x00000030? (Index = 0000000011, Offset = 0000)
° Read address 0x0000001c? (Index = 0000000001, Offset = 1100)
Cache (all other rows have Valid = 0):
Index  Valid  Tag  0x0-3  0x4-7  0x8-b  0xc-f
1      1      2    i      j      k      l
3      1      0    e      f      g      h

Answers
° 0x00000030: a hit
Index = 3, tag matches, Offset = 0, value = e
° 0x0000001c: a miss
Index = 1, tag mismatch, so replace from memory, Offset = 0xc, value = d
° Since these are reads, the values must equal the memory values whether or not they are cached:
• 0x00000030 = e
• 0x0000001c = d
Memory: 0x00000010-0x0000001c = a b c d; 0x00000030-0x0000003c = e f g h; 0x00008010-0x0000801c = i j k l

Open your Eyes…
° Drawbacks of a larger block size
• Larger block size means a larger miss penalty
- on a miss, it takes longer to load a new block from the next level
• If the block size is too big relative to the cache size, then there are too few blocks
- Result: miss rate goes up
° In general, minimize Average Access Time = Hit Time x Hit Rate + Miss Penalty x Miss Rate

Review…
° Hit Time = time to find and retrieve data from the current-level cache
° Miss Penalty = average time to retrieve data on a current-level miss (includes the possibility of misses on successive levels of the memory hierarchy)
° Hit Rate = % of requests that are found in the current-level cache
° Miss Rate = 1 - Hit Rate

Review…
° Average Memory Access Time = Hit Time x Hit Rate + Miss Penalty x Miss Rate
° Every access pays the Hit Time, whether it hits or misses. Counting Miss Penalty as the extra time spent on a miss, the Hit Rate factor drops out:
° Average Memory Access Time = Hit Time + Miss Penalty x Miss Rate

Bits in Cache - Home Read

Hits vs. Misses (1)
° Read hits
• this is what we want!
° Read misses
• stall the CPU, fetch the block from memory, deliver it to the cache, restart

Hits vs. Misses (2)
° Write hits:
• can replace data in both cache and memory (write-through: data is always consistent in both)
• can write the data only into the cache (write-back: the block is written back to memory later)
° Write misses:
• read the entire block into the cache, then write the word

Performance (1)
° Increasing the block size in the cache tends to decrease the miss rate:
[Figure: miss rate versus block size]

Performance (2)
° Simplified model:
CPU execution time = (CPU execution clk cycles + Mem-stall clk cycles) × cycle time
Mem-stall cycles = # of instructions × miss ratio × miss penalty
(applies to reads and writes; writes also add write-buffer stalls)
° Two ways of improving performance:
• decreasing the miss ratio
• decreasing the miss penalty
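The simplified model can be tried out numerically. The sketch below is illustrative only: the function names are hypothetical, and every number in the usage example (instruction count, base CPI of 2, 2% miss ratio, 100-cycle miss penalty, 1 ns cycle time) is an assumed value, not data from the slides.

```python
def memory_stall_cycles(instructions, miss_ratio, miss_penalty):
    """Mem-stall cycles = # of instructions x miss ratio x miss penalty."""
    return instructions * miss_ratio * miss_penalty


def cpu_time(execution_cycles, stall_cycles, cycle_time):
    """CPU execution time = (execution cycles + stall cycles) x cycle time."""
    return (execution_cycles + stall_cycles) * cycle_time


# Assumed workload, for illustration only.
instructions = 1_000_000
execution_cycles = 2 * instructions       # assumed base CPI of 2
stalls = memory_stall_cycles(instructions, miss_ratio=0.02, miss_penalty=100)
total = cpu_time(execution_cycles, stalls, cycle_time=1e-9)

# Halving the miss ratio or the miss penalty removes half of the stall cycles.
halved = memory_stall_cycles(instructions, miss_ratio=0.01, miss_penalty=100)
print(stalls, halved, total)
```

With these assumed numbers, memory stalls account for as many cycles as execution itself, which is why both the miss ratio and the miss penalty are worth attacking.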


Example
° Assume
• Hit Time = 1 cycle
• Miss rate = 5%
• Miss penalty = 20 cycles
• Calculate AMAT…
° Average memory access time = 1 + 0.05 x 20 = 1 + 1 cycles = 2 cycles
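The same calculation as a one-line function, as a sketch. The name `amat` is hypothetical; the formula is the second form from the review slide (Hit Time + Miss Penalty x Miss Rate), with times in cycles.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate x miss penalty."""
    return hit_time + miss_rate * miss_penalty


example = amat(hit_time=1, miss_rate=0.05, miss_penalty=20)
print(example)  # 2 cycles, matching the slide
```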

Types of Cache Misses (1/2)
° “Three Cs” Model of Misses
° 1st C: Compulsory Misses
• occur when a program is first started
• the cache does not contain any of that program’s data yet, so misses are bound to occur
• can’t be avoided easily, so we won’t focus on these in this course

Types of Cache Misses (2/2)
° 2nd C: Conflict Misses
• a miss that occurs because two distinct memory addresses map to the same cache location
• two blocks (which happen to map to the same location) can keep overwriting each other
• big problem in direct-mapped caches
• how do we lessen the effect of these?
° Dealing with Conflict Misses
• Solution 1: Make the cache size bigger
- Fails at some point
• Solution 2: Multiple distinct blocks can fit in the same cache index?

Associative Caches
§ Fully associative
• Allow a given block to go in any cache entry
• Requires all entries to be searched at once
• Comparator per entry (expensive)
§ n-way set associative
• Each set contains n entries
• Block number determines which set
- (Block number) modulo (#Sets in cache)
• Search all entries in a given set at once
• n comparators (less expensive)

Associative Cache Example
[Figure: associative cache example]

Spectrum of Associativity
° For a cache with 8 entries
[Figure: spectrum of associativity]

Replacement Policy
° Direct mapped: no choice
° Set associative
• choose among entries in the set
° Least-recently used (LRU)
• Choose the one unused for the longest time
° Random
• Gives approximately the same performance as LRU for high associativity

Associativity Example
° Compare 4-block caches
• Direct mapped, 2-way set associative, fully associative
• Block access sequence: 0, 8, 0, 6, 8
° Direct mapped: to which cache block does each block address map?
Block address  Cache block
0              (0 mod 4) = 0
6              (6 mod 4) = 2
8              (8 mod 4) = 0
Cache content after each access:
Block address  Cache index  Hit/miss  Block 0  Block 1  Block 2  Block 3
0              0            miss      Mem[0]
8              0            miss      Mem[8]
0              0            miss      Mem[0]
6              2            miss      Mem[0]            Mem[6]
8              0            miss      Mem[8]            Mem[6]

Associativity Example
° 2-way set associative (block access sequence: 0, 8, 0, 6, 8)
Block address  Cache set
0              (0 mod 2) = 0
6              (6 mod 2) = 0
8              (8 mod 2) = 0
Cache content after each access:
Block address  Cache index  Hit/miss  Set 0            Set 1
0              0            miss      Mem[0]
8              0            miss      Mem[0]  Mem[8]
0              0            hit       Mem[0]  Mem[8]
6              0            miss      Mem[0]  Mem[6]
8              0            miss      Mem[8]  Mem[6]
° Fully associative (block access sequence: 0, 8, 0, 6, 8)
Block address  Hit/miss  Cache content after access
0              miss      Mem[0]
8              miss      Mem[0]  Mem[8]
0              hit       Mem[0]  Mem[8]
6              miss      Mem[0]  Mem[8]  Mem[6]
8              hit       Mem[0]  Mem[8]  Mem[6]
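The three organizations can be compared by replaying the access sequence in a small simulator. This is a sketch under stated assumptions: the function `simulate` is a hypothetical name, LRU replacement is assumed within each set, and only block addresses are tracked (no tags, offsets, or data).

```python
from collections import OrderedDict


def simulate(block_addresses, num_blocks, ways):
    """Replay block accesses on a cache with num_blocks entries and the
    given associativity, using LRU replacement inside each set."""
    num_sets = num_blocks // ways
    # One OrderedDict per set, ordered from least to most recently used.
    sets = [OrderedDict() for _ in range(num_sets)]
    outcomes = []
    for block in block_addresses:
        entries = sets[block % num_sets]     # (block number) modulo (#sets)
        if block in entries:
            entries.move_to_end(block)       # mark as most recently used
            outcomes.append("hit")
        else:
            if len(entries) == ways:
                entries.popitem(last=False)  # evict the least recently used
            entries[block] = True
            outcomes.append("miss")
    return outcomes


sequence = [0, 8, 0, 6, 8]
direct = simulate(sequence, num_blocks=4, ways=1)
two_way = simulate(sequence, num_blocks=4, ways=2)
fully = simulate(sequence, num_blocks=4, ways=4)
print(direct.count("miss"), two_way.count("miss"), fully.count("miss"))  # 5 4 3
```

Setting `ways=1` gives the direct-mapped case and `ways=num_blocks` the fully associative case, so one routine covers the whole spectrum.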

How Much Associativity
° Increased associativity decreases miss rate
[Figure: miss rate versus associativity]

Set Associative Cache Organization
[Figure: set associative cache organization]

Fully Associative Cache (1/3)
° Memory address fields:
• Tag: same as before
• Offset: same as before
• Index: nonexistent
° What does this mean?
• no “rows”: any block can go anywhere in the cache
• must compare with all tags in the entire cache to see if the data is there

Fully Associative Cache (2/3)
° Fully Associative Cache (e.g., 32 B block)
• compare tags in parallel
[Figure: address bits 31 to 5 form the cache tag (27 bits long) and bits 4 to 0 the byte offset; each cache entry holds a cache tag, a valid bit and cache data (bytes B31 ... B0), with one comparator (=) per entry]

Fully Associative Cache (3/3)
§ Benefit of Fully Associative Cache
§ No conflict misses (since data can go anywhere)
§ Drawbacks of Fully Associative Cache
§ Need a hardware comparator for every single entry

N-Way Set Associative Cache (1/3)
° Memory address fields:
• Tag: same as before
• Offset: same as before
• Index: points us to the correct “row” (called a set in this case)
° So what’s the difference?
• each set contains multiple blocks
• once we’ve found the correct set, we must compare with all tags in that set to find our data

N-Way Set Associative Cache (2/3)
° Summary:
• cache is direct-mapped with respect to sets
• each set is fully associative
• basically N direct-mapped caches working in parallel: each has its own valid bit and data
° Given a memory address:
• Find the correct set using the Index value.
• Compare the Tag with all Tag values in the determined set.
• If a match occurs, it is a hit; otherwise, a miss.
• Finally, use the offset field as usual to find the desired data within the block.

N-Way Set Associative Cache (3/3)
° What’s so great about this?
• even a 2-way set associative cache avoids a lot of conflict misses
• hardware cost isn’t that bad: only need N comparators
° In fact, for a cache with M blocks,
• it’s direct-mapped if it’s 1-way set associative
• it’s fully associative if it’s M-way set associative
• so these two are just special cases of the more general set associative design
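One way to see that direct-mapped and fully associative caches are special cases is to compute the address field widths as a function of associativity. The sketch below is illustrative: `set_assoc_fields` is a hypothetical helper, sizes are assumed to be powers of two, and the 16 KB cache with 16 B blocks is borrowed from the earlier direct-mapped example.

```python
def set_assoc_fields(cache_bytes, block_bytes, ways, address_bits=32):
    """Return (tag, index, offset) widths in bits for an N-way cache.

    Assumes cache_bytes, block_bytes and ways are powers of two.
    """
    num_blocks = cache_bytes // block_bytes
    num_sets = num_blocks // ways                 # index selects a set
    offset_bits = (block_bytes - 1).bit_length()  # log2(bytes per block)
    index_bits = (num_sets - 1).bit_length()      # log2(number of sets)
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


M = (16 * 1024) // 16  # 1024 blocks in the cache
print(set_assoc_fields(16 * 1024, 16, ways=1))  # (18, 10, 4): direct-mapped
print(set_assoc_fields(16 * 1024, 16, ways=2))  # (19, 9, 4): 2-way
print(set_assoc_fields(16 * 1024, 16, ways=M))  # (28, 0, 4): fully associative
```

Each doubling of the associativity moves one bit from the index to the tag; at M ways the index disappears entirely, matching the fully associative slides.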

Associative Cache Example (1)
[Figure: memory addresses 0 to F mapped onto a 4-byte direct-mapped cache with cache indices 0 to 3]
° Recall this is how a simple direct-mapped cache looked.
° This is also a 1-way set-associative cache!

Associative Cache Example (2)
[Figure: memory addresses 0 to F mapped onto a cache whose four entries carry cache indices 0, 0, 1, 1]
° Here’s a simple 2-way set-associative cache.