Cache Coherence for Shared Memory Multiprocessors

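Cache Coherence Problem
- Example
  - Processors see different values for u after event 3
[Figure: processors P1, P2, P3, each with a private cache ($), share a bus with memory and I/O devices. Memory holds u: 5; P1 and P3 read u (events 1 and 2), so both caches hold u: 5; P3 writes u = 7 (event 3); P1 and P2 then read u = ? (events 4 and 5)]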
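A minimal Python sketch of the problem, assuming caches with no coherence mechanism at all, write-through writes, and the event order described in the figure (P1 reads u, P3 reads u, P3 writes 7, then P1 and P2 read u). The helpers `read` and `write_through` are hypothetical names used only for illustration.

```python
# Three private caches with NO coherence mechanism (illustrative assumption).
memory = {"u": 5}
caches = {"P1": {}, "P2": {}, "P3": {}}

def read(p, addr):
    cache = caches[p]
    if addr not in cache:           # miss: fetch the value from memory
        cache[addr] = memory[addr]
    return cache[addr]              # hit: return the cached, possibly stale, copy

def write_through(p, addr, value):
    caches[p][addr] = value         # update the writer's own cache...
    memory[addr] = value            # ...and memory, but no other cache is told

read("P1", "u")                     # event 1
read("P3", "u")                     # event 2
write_through("P3", "u", 7)         # event 3
seen_by_p1 = read("P1", "u")        # event 4: hit on the stale copy
seen_by_p2 = read("P2", "u")        # event 5: miss, memory already holds 7
print(seen_by_p1, seen_by_p2)       # 5 7
```

With write-back caches instead, memory would still hold 5 after event 3, so P2 would read the stale value too; either way, P1 and P3 disagree about u.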

Bus Snooping
- A coherence technique for bus-based shared memory multiprocessors
- A snoopy cache controller (SCC) is inserted to do bus snooping
- Bus transactions are visible to all SCCs
[Figure: processors P1 through Pn, each with a cache ($) and an SCC, connected by a bus to memory and I/O devices]

Snooping for Write-Through Caches
- When an SCC detects a relevant write transaction (another processor writing a block that its own cache holds), it can either
  - Invalidate the block containing the written variable (write-invalidate approach), or
  - Update the value in its cache (write-update approach)
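The two reactions can be sketched as a snoop handler with a policy switch. This is a hedged illustration, not a hardware design: `SnoopyCache`, `snoop_write`, and the `policy` field are hypothetical, and blocks are assumed to hold a single word so the snooped value replaces the whole block.

```python
from dataclasses import dataclass, field

@dataclass
class SnoopyCache:
    policy: str                                   # "invalidate" or "update" (hypothetical knob)
    lines: dict = field(default_factory=dict)     # block address -> cached value

    def snoop_write(self, block, value):
        """Called when the SCC sees another processor's write on the bus."""
        if block not in self.lines:
            return                                # not relevant: no copy here
        if self.policy == "invalidate":
            del self.lines[block]                 # drop the copy; the next read misses
        else:
            self.lines[block] = value             # overwrite the copy with the new value

inv = SnoopyCache("invalidate", {0x40: 5, 0x80: 1})
upd = SnoopyCache("update", {0x40: 5, 0x80: 1})
for cache in (inv, upd):
    cache.snoop_write(0x40, 7)                    # another processor writes block 0x40
print(inv.lines, upd.lines)                       # {128: 1} {64: 7, 128: 1}
```

Invalidation costs a later miss if the block is read again; update keeps the copy usable but puts every write's data on the bus.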

Write-Invalidate Protocol
- Two states per block in each cache: V (valid) and I (invalid)
  - As in a uniprocessor, hardware state bits are associated with the blocks that are in the cache
  - The Invalid state is also used in place of a "not present" state
- State transitions (A/B: if A is observed, transaction B is generated; "--" means none)
  - V: PrRd/-- and PrWr/BusWr (stay in V)
  - V -> I: BusWr/-- (another processor's write is snooped)
  - I -> V: PrRd/BusRd
  - I: PrWr/BusWr (stay in I)
- This is just one particular design, where on a write miss the processor writes to main memory. Other designs may read the block first to validate it.
[Figure: processors P1 through Pn, each cache entry holding State, Tag, and Data, on a bus with memory and I/O devices]
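The VI transitions fit in two small Python functions. This is a sketch under stated assumptions: the names `on_processor` and `on_bus` and the string encodings are hypothetical, and it follows the particular design above where a write miss goes straight to memory without allocating the block.

```python
V, I = "V", "I"

def on_processor(state, op):
    """Return (next_state, bus_transaction) for a local PrRd or PrWr."""
    if op == "PrRd":
        # Read hit needs nothing; read miss fetches the block and validates it.
        return (V, "--") if state == V else (V, "BusRd")
    if op == "PrWr":
        # Write-through: every write goes on the bus. A write miss does not
        # allocate, so an invalid block stays invalid.
        return state, "BusWr"
    raise ValueError(op)

def on_bus(state, transaction):
    """Return the next state after snooping another cache's transaction."""
    if transaction == "BusWr":
        return I                   # V -> I; I stays I
    return state                   # another cache's BusRd does not affect us

print(on_processor(I, "PrRd"), on_processor(I, "PrWr"), on_bus(V, "BusWr"))
```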

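Example
- Three processors; consider the states of the blocks containing X (entries are value / state)

| Operation    | P1 $   | P2 $   | P3 $   | Main memory |
|--------------|--------|--------|--------|-------------|
| Initially    | ? / I  | ? / I  | ? / I  | 10          |
| P2 Rd X      | ? / I  | 10 / V | ? / I  | 10          |
| P3 Rd X      | ? / I  | 10 / V | 10 / V | 10          |
| P2 Wr X = 15 | ? / I  | 15 / V | 10 / I | 15          |
| P1 Rd X      | 15 / V | 15 / V | 10 / I | 15          |
| P1 Wr X = 3  | 3 / V  | 15 / I | 10 / I | 3           |
| P3 Wr X = 6  | 3 / I  | 15 / I | 10 / I | 6           |

- On P3 Wr X = 6, P3's block remains invalid: updating the value of X isn't enough to validate the whole block.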
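The example table can be replayed with a tiny VI simulator. It is a sketch that assumes the block holds only X and uses "?" for an unknown value; `read`, `write`, and `row` are hypothetical helpers.

```python
# Each cache entry is [value, state]; one block that holds only X.
memory = {"X": 10}
cache = {p: ["?", "I"] for p in ("P1", "P2", "P3")}

def read(p):
    if cache[p][1] == "I":                    # read miss: BusRd, block becomes valid
        cache[p] = [memory["X"], "V"]
    return cache[p][0]

def write(p, value):
    memory["X"] = value                       # write-through: BusWr updates memory
    for q, line in cache.items():
        if q != p:
            line[1] = "I"                     # every other SCC snoops the BusWr
    if cache[p][1] == "V":
        cache[p][0] = value                   # write hit: update the cached copy
    # write miss: no allocation, so the writer's block stays invalid

def row():
    return [f"{v}/{s}" for v, s in cache.values()] + [memory["X"]]

table = [row()]
for op in (lambda: read("P2"), lambda: read("P3"), lambda: write("P2", 15),
           lambda: read("P1"), lambda: write("P1", 3), lambda: write("P3", 6)):
    op()
    table.append(row())
for r in table:
    print(r)
```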

Bus Snooping
- Advantages
  - No need to change the processor design
  - No explicit coherence statements added to the program
- The snoopy cache controller observes events from
  - The local processor
  - The bus
- Write operations
  - Write-invalidate vs. write-update
  - Write-through caches: see last lecture
  - Write-back caches: writes now take place locally, so SCCs don't observe them. How can we handle this? Extra work has to be done.

Write-Back Caches
- Usually have a "dirty bit"
  - One bit per block
  - State
    - True: block has been modified
    - False: block unchanged
- Use for uniprocessors
  - A dirty block has to be written back to memory upon replacement
- Use for multiprocessors
  - Same as uniprocessor, plus
  - It means the processor "owns" the block
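A minimal uniprocessor sketch of the dirty bit, assuming a direct-mapped cache with four one-word sets (hypothetical sizes); `WriteBackCache` is an illustrative name. Writes stay in the cache, and memory is updated only when a dirty block is replaced.

```python
class WriteBackCache:
    def __init__(self, memory, num_sets=4):
        self.memory = memory
        self.num_sets = num_sets
        self.sets = [None] * num_sets            # each entry: [block, data, dirty]
        self.writebacks = 0

    def _lookup(self, block):
        idx = block % self.num_sets              # direct-mapped placement
        entry = self.sets[idx]
        if entry is None or entry[0] != block:   # miss
            if entry is not None and entry[2]:   # victim is dirty: write it back
                self.memory[entry[0]] = entry[1]
                self.writebacks += 1
            entry = [block, self.memory[block], False]
            self.sets[idx] = entry
        return entry

    def read(self, block):
        return self._lookup(block)[1]

    def write(self, block, value):
        entry = self._lookup(block)
        entry[1] = value
        entry[2] = True                          # block now differs from memory

mem = {0: 1, 4: 2}
cache = WriteBackCache(mem)
cache.write(0, 99)
before = mem[0]        # still 1: the write stayed in the cache
cache.read(4)          # block 4 maps to the same set and evicts dirty block 0
after = mem[0]         # 99: written back on replacement
print(before, after)   # 1 99
```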

The Extra Work
- ...before a processor writes into its cache, it performs an "ownership" transaction
  - Case 1: no other modified copies of the block exist in the system
    - The processor can write
  - Case 2: a modified copy exists somewhere in the system
    - Old owner: writes the block back to memory and invalidates its local copy
    - New owner: reads the block as it is being written back to memory, then performs the write
- What the new owner did is called a "read to own" (read to modify) transaction
- There is only one owner at a time
- Still don't get it? Wait until you see the MSI protocol!
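The two cases can be sketched as a single Python `write` function. Assumptions, all hypothetical: each cache maps a block to `(data, dirty)`, a dirty copy means "owner", blocks are one word, and any other clean copies are also invalidated (as MSI will do). Reading memory right after the old owner's write-back stands in for the hardware reading the block off the bus.

```python
def write(caches, memory, me, block, value):
    mine = caches[me].get(block)
    if mine is None or not mine[1]:               # not the owner yet: read to own
        for name, lines in caches.items():
            if name != me and block in lines:
                data, dirty = lines.pop(block)    # every other copy is invalidated
                if dirty:                         # Case 2: old owner writes back
                    memory[block] = data
        caches[me][block] = (memory[block], True) # new owner reads the block
    caches[me][block] = (value, True)             # perform the write locally

caches = {"P1": {}, "P2": {}}
memory = {"X": 10}
write(caches, memory, "P1", "X", 15)   # Case 1: no modified copy elsewhere
write(caches, memory, "P2", "X", 20)   # Case 2: P1 owns a modified copy
owners = [p for p, lines in caches.items() if lines.get("X", (None, False))[1]]
print(memory, caches, owners)
```

After the second write, memory holds P1's 15, P1's copy is gone, and P2 is the only owner.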

Ownership Overhead
- Ownership transactions are overhead
- If one happens every time a write is needed
  - The block will be written back to memory every time
  - Then write-back caches would be as good (or as bad) as write-through
- Let's cross our fingers and count on locality
  - Spatial and temporal locality can do it for us
  - A processor owns the block and performs several writes consecutively
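A back-of-the-envelope count makes the locality argument concrete. This is a hypothetical cost model: every bus operation counts as one transaction, the old owner's write-back is overlapped with the new owner's read (as in the read-to-own description), and the final eviction write-back is ignored.

```python
def write_through_cost(writers):
    return len(writers)                 # every write goes on the bus

def write_back_cost(writers):
    """One read-to-own transaction each time the writing processor changes."""
    cost, owner = 0, None
    for p in writers:
        if p != owner:
            cost += 1                   # ownership transfer (write-back overlapped)
            owner = p
    return cost

bursty = ["P1"] * 5 + ["P2"] * 5        # good locality: consecutive writes
pingpong = ["P1", "P2"] * 5             # no locality: ownership moves every write
for name, w in (("bursty", bursty), ("pingpong", pingpong)):
    print(name, write_through_cost(w), write_back_cost(w))
# bursty 10 2
# pingpong 10 10
```

Under this model, ping-ponging ownership makes write-back exactly as costly as write-through, while bursts of writes amortize each ownership transaction.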

MSI Protocol: States
- We need to differentiate between reads and writes
- Split the Valid state into two states (S and M), giving three states in total
  - I: Invalid
  - S: Shared (one or more caches can read only)
  - M: Modified or Dirty (only one cache can write)
- This means it's another write-invalidate protocol

MSI Protocol: Events/Actions
- Local processor events
  - PrRd: read
  - PrWr: write
- Bus transactions
  - BusRd: read with no intent to modify
  - BusRdX: read with intent to modify (read to own)
  - BusWB: update memory
- Possible actions
  - _: nothing
  - BusRd: send a read request over the bus
  - BusRdX: ownership (read to own) transaction
  - Flush: copy the modified block to memory

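MSI Protocol: State Transitions
(A/B: if event A is observed, action B is performed)
- M
  - PrRd/_ and PrWr/_: stay in M
  - BusRd/Flush: demote to S
  - BusRdX/Flush: go to I
- S
  - PrRd/_ and BusRd/_: stay in S
  - PrWr/BusRdX: promote to M
  - BusRdX/_: go to I
- I
  - PrRd/BusRd: go to S
  - PrWr/BusRdX: promote to M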
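As a sketch, the MSI transitions can be tabulated and driven by a tiny bus model. `MSI`, `step`, and `access` are hypothetical names; the model assumes a single block and an atomic bus, and treats bus events seen in state I as no-ops (the diagram does not draw those self-loops).

```python
# (state, event) -> (next_state, action); "_" means no action.
MSI = {
    ("M", "PrRd"):   ("M", "_"),
    ("M", "PrWr"):   ("M", "_"),
    ("M", "BusRd"):  ("S", "Flush"),     # demote
    ("M", "BusRdX"): ("I", "Flush"),
    ("S", "PrRd"):   ("S", "_"),
    ("S", "PrWr"):   ("M", "BusRdX"),    # promote
    ("S", "BusRd"):  ("S", "_"),
    ("S", "BusRdX"): ("I", "_"),
    ("I", "PrRd"):   ("S", "BusRd"),
    ("I", "PrWr"):   ("M", "BusRdX"),
    ("I", "BusRd"):  ("I", "_"),         # no copy here: nothing to do
    ("I", "BusRdX"): ("I", "_"),
}

def step(state, event):
    return MSI[(state, event)]

def access(states, p, op):
    """Apply a processor op at cache p and let every other cache snoop it."""
    states[p], action = step(states[p], op)
    if action in ("BusRd", "BusRdX"):
        for q in states:
            if q != p:
                states[q], _ = step(states[q], action)
    return action

states = {"P1": "I", "P2": "I"}
access(states, "P1", "PrWr")   # P1: I -> M
access(states, "P2", "PrRd")   # P2: I -> S, P1 flushes and demotes M -> S
print(states)                  # {'P1': 'S', 'P2': 'S'}
```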

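MSI Protocol: Example
- Three processors; consider the states of the blocks containing X (entries are value / state)

| Operation    | P1 $   | P2 $   | P3 $   | Main memory |
|--------------|--------|--------|--------|-------------|
| Initially    | ? / I  | ? / I  | ? / I  | 10          |
| P2 Rd X      | ? / I  | 10 / S | ? / I  | 10          |
| P3 Rd X      | ? / I  | 10 / S | 10 / S | 10          |
| P2 Wr X = 15 | ? / I  | 15 / M | 10 / I | 10          |
| P1 Rd X      | 15 / S | 15 / S | 10 / I | 15          |
| P1 Wr X = 3  | 3 / M  | 15 / I | 10 / I | 15          |
| P1 Wr X = 6  | 6 / M  | 15 / I | 10 / I | 15          |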
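The MSI example can be replayed with values and a bus log. This sketch assumes the block holds only X, uses "?" for an unknown value, and lets the M holder flush to memory on any snooped request; `snoop`, `read`, `write`, and `bus_log` are hypothetical. Note that the last write (P1 already in M) puts nothing on the bus.

```python
memory = {"X": 10}
cache = {p: ["?", "I"] for p in ("P1", "P2", "P3")}   # [value, state]
bus_log = []

def snoop(requester, transaction):
    bus_log.append(transaction)
    for q, line in cache.items():
        if q == requester:
            continue
        if line[1] == "M":
            memory["X"] = line[0]             # Flush: owner copies its block to memory
        if transaction == "BusRdX":
            line[1] = "I"                     # S or M -> I
        elif line[1] == "M":
            line[1] = "S"                     # BusRd demotes M -> S

def read(p):
    if cache[p][1] == "I":
        snoop(p, "BusRd")
        cache[p] = [memory["X"], "S"]
    return cache[p][0]

def write(p, value):
    if cache[p][1] != "M":                    # I or S: read to own first
        snoop(p, "BusRdX")
        cache[p] = [memory["X"], "M"]
    cache[p][0] = value                       # the write itself stays in the cache

def row():
    return [f"{v}/{s}" for v, s in cache.values()] + [memory["X"]]

table = [row()]
for op in (lambda: read("P2"), lambda: read("P3"), lambda: write("P2", 15),
           lambda: read("P1"), lambda: write("P1", 3), lambda: write("P1", 6)):
    op()
    table.append(row())
for r in table:
    print(r)
print(bus_log)
```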

MESI Protocol: What's Wrong with MSI?
- MESI is another write-invalidate protocol
- Consider this MSI scenario
  - The block containing X isn't in any cache
  - P1 reads X: BusRd, state: S
  - P1 modifies X: BusRdX, state: M
  - The BusRdX is there to let everybody else know X is being modified
- The scenario above takes 2 bus transactions
- There is no need for 2 transactions, since P1 is the only processor holding X!

MESI Protocol: States
- Same as MSI except S is split in 2
  - E: Exclusive clean (only one processor has a copy)
  - S: Shared clean (more than one processor has a copy)
- Let's consider the same scenario
  - The block containing X isn't in any cache
  - P1 reads X: BusRd, state: E
  - P1 modifies X: no bus transaction, state: M
  - In other words, P1 doesn't need to let anybody know about the modification
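Counting bus transactions for the read-then-write scenario shows what E buys. A hedged sketch: `bus_transactions` is a hypothetical helper, the requester starts without the block, and `other_copies` stands for whether any other cache holds X.

```python
def bus_transactions(protocol, other_copies):
    """Bus transactions when one processor reads X and then writes it,
    starting with X absent from its cache."""
    count = 1                                  # read miss always needs a BusRd
    if protocol == "MSI":
        fill_state = "S"                       # MSI cannot tell that it is alone
    elif protocol == "MESI":
        fill_state = "S" if other_copies else "E"
    else:
        raise ValueError(protocol)
    if fill_state == "S":
        count += 1                             # S -> M needs an ownership transaction (BusRdX)
    # E -> M is silent: no other cache can hold a copy
    return count

for proto in ("MSI", "MESI"):
    print(proto, bus_transactions(proto, False), bus_transactions(proto, True))
# MSI 2 2
# MESI 1 2
```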

MESI Protocol: Hardware Support
- An additional bus signal is needed
  - Use the S signal (S for shared)
  - It lets a processor know whether to load the block in the E or S state
- A cache controller asserts the S signal if it holds the relevant block
- The S bus signal is a wired-OR line
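The wired-OR line behaves like `any()` over the other controllers' outputs. In this sketch each other cache is represented as a set of block addresses it holds, and `s_line` and `fill_state` are hypothetical names.

```python
def s_line(other_caches, block):
    """Wired-OR: asserted if ANY other controller drives it (i.e. holds the block)."""
    return any(block in lines for lines in other_caches)

def fill_state(other_caches, block):
    """State in which a read miss loads the block under MESI."""
    return "S" if s_line(other_caches, block) else "E"

print(fill_state([set(), {0x40}], 0x40))   # S: one other cache asserts the line
print(fill_state([set(), {0x80}], 0x40))   # E: nobody asserts it
```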

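MESI Protocol: State Transitions
- Only the transitions that differ from MSI are listed (S / not S: whether the S signal was asserted during the BusRd)
  - I -> E: PrRd/BusRd (not S)
  - I -> S: PrRd/BusRd (S)
  - E: PrRd/_ (stay in E)
  - E -> M: PrWr/_ (promote without any bus transaction)
  - E -> S: BusRd/Flush (demote)
  - E -> I: BusRdX/Flush
  - S: BusRd/Flush' (stay in S); S -> I: BusRdX/Flush'
- Flushing a "clean" block
  - A fast way for the new reader to read the block
  - While flushing a shared block, Flush' means only 1 processor is responsible
  - Other protocol variations may not flush a clean block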
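The full MESI controller can be sketched the same way as MSI, with the S line sampled on a read miss. `MESI`, `step`, and `access` are hypothetical; the model assumes a single block and an atomic bus, and records Flush' only as a label (it does not model which sharer actually supplies the data).

```python
MESI = {
    ("M", "PrRd"): ("M", "_"),       ("M", "PrWr"): ("M", "_"),
    ("M", "BusRd"): ("S", "Flush"),  ("M", "BusRdX"): ("I", "Flush"),
    ("E", "PrRd"): ("E", "_"),       ("E", "PrWr"): ("M", "_"),      # silent promote
    ("E", "BusRd"): ("S", "Flush"),  ("E", "BusRdX"): ("I", "Flush"),
    ("S", "PrRd"): ("S", "_"),       ("S", "PrWr"): ("M", "BusRdX"),
    ("S", "BusRd"): ("S", "Flush'"), ("S", "BusRdX"): ("I", "Flush'"),
    ("I", "PrWr"): ("M", "BusRdX"),
    ("I", "BusRd"): ("I", "_"),      ("I", "BusRdX"): ("I", "_"),
}

def step(state, event, shared=False):
    if (state, event) == ("I", "PrRd"):          # fill state depends on the S line
        return ("S" if shared else "E"), "BusRd"
    return MESI[(state, event)]

def access(states, p, op):
    others = [q for q in states if q != p]
    shared = any(states[q] != "I" for q in others)   # wired-OR S line
    states[p], action = step(states[p], op, shared)
    if action in ("BusRd", "BusRdX"):
        for q in others:
            states[q], _ = step(states[q], action)
    return action

states = {"P1": "I", "P2": "I"}
print(access(states, "P1", "PrRd"), states)   # BusRd: P1 loads in E
print(access(states, "P1", "PrWr"), states)   # _: E -> M with no bus traffic
print(access(states, "P2", "PrRd"), states)   # BusRd: both end in S
```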

Dragon Protocol
- Write-back update protocol
- States
  - Exclusive (E): 1 cache has a clean copy
  - Shared-clean (Sc): 2 or more caches have a copy; memory may be outdated if another cache holds the block in Sm
  - Shared-modified (Sm): 1 cache just modified the block and some other caches also hold copies; memory is outdated
  - Modified (M): 1 cache has a modified copy
- Added processor events: PrRdMiss, PrWrMiss (remember we don't have an I state)
- Added bus transaction: BusUpd
  - Broadcasts the word or byte written by the processor so other processors can update their copies

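Dragon Protocol: State Transitions
(S / not S: whether the S signal was asserted during the bus transaction; "--" means no action)
- Misses
  - PrRdMiss/BusRd(not S): go to E
  - PrRdMiss/BusRd(S): go to Sc
  - PrWrMiss/BusRd(not S): go to M
  - PrWrMiss/(BusRd(S); BusUpd): go to Sm
- E
  - PrRd/--: stay in E
  - PrWr/--: go to M
  - BusRd/--: go to Sc
- Sc
  - PrRd/--, BusRd/--, BusUpd/Update: stay in Sc
  - PrWr/BusUpd(S): go to Sm
  - PrWr/BusUpd(not S): go to M
- Sm
  - PrRd/--, PrWr/BusUpd(S), BusRd/Flush: stay in Sm
  - BusUpd/Update: go to Sc
  - PrWr/BusUpd(not S): go to M
- M
  - PrRd/--, PrWr/--: stay in M
  - BusRd/Flush: go to Sm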
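A Python sketch of the Dragon transitions, tracking states only (no data). Assumptions, all hypothetical: `None` stands for "not cached" (Dragon has no I state, so a local access to `None` is a miss), the S line is sampled once before the access, and `on_processor`, `on_snoop`, and `access` are illustrative names.

```python
def on_processor(state, event, shared):
    """Local PrRd/PrWr. Returns (next_state, bus_transactions)."""
    if state is None:                                   # miss
        if event == "PrRd":                             # PrRdMiss
            return ("Sc" if shared else "E"), ["BusRd"]
        if shared:                                      # PrWrMiss, S asserted
            return "Sm", ["BusRd", "BusUpd"]
        return "M", ["BusRd"]                           # PrWrMiss, S not asserted
    if event == "PrRd":
        return state, []                                # read hits never use the bus
    if state in ("E", "M"):
        return "M", []                                  # no other copies to update
    return ("Sm" if shared else "M"), ["BusUpd"]        # Sc/Sm write: broadcast the word

def on_snoop(state, transaction):
    """Reaction to another cache's transaction. Returns (next_state, action)."""
    if state is None:
        return None, None                               # block not cached here
    if transaction == "BusRd":
        if state in ("M", "Sm"):
            return "Sm", "Flush"                        # supply the modified block
        return "Sc", None                               # E -> Sc; Sc stays Sc
    if transaction == "BusUpd":
        return "Sc", "Update"                           # only Sc/Sm can see a BusUpd
    raise ValueError(transaction)

def access(states, p, event):
    others = [q for q in states if q != p]
    shared = any(states[q] is not None for q in others)  # wired-OR S line
    next_state, transactions = on_processor(states[p], event, shared)
    for t in transactions:
        for q in others:
            states[q], _ = on_snoop(states[q], t)
    states[p] = next_state
    return transactions

states = {"P1": None, "P2": None}
access(states, "P1", "PrRd")   # P1 -> E
access(states, "P2", "PrRd")   # P2 -> Sc, P1 E -> Sc
access(states, "P1", "PrWr")   # P1 -> Sm, P2 updated and stays Sc
print(states)                  # {'P1': 'Sm', 'P2': 'Sc'}
```

Unlike MSI and MESI, the other copy is never invalidated: each write by a sharer just moves the Sm role to the latest writer.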

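Snoopy Protocol Taxonomy

| Cache         | Write-invalidate protocol | Write-update protocol |
|---------------|---------------------------|-----------------------|
| Write-through | IV                        | Homework              |
| Write-back    | MSI, MESI                 | Dragon                |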