Consistency Models of computation

Coherence vs. consistency
• coherence deals with accesses to the same memory location
• consistency addresses the possible outcomes from legal orderings of accesses to all memory locations
• the common model (sequential consistency) is easy to understand but difficult to implement, and performs poorly

What do you expect?
• Sequential consistency: “Commit results in processor order”
• simple enough in a uniprocessor
• similarly with context switching: just save and restore state
• what about multi-threading, or multiprocessor machines?

MIPS R10000
• issues instructions out of order
• in-order commit
• speculative loads may execute and pass a value on for modification long before the load commits in program order
• in the meantime, some other processor may commit a store to that location

Producer-consumer
    P1               P2
    write(A);        while (flag != 1) ;
    flag := 1;       read(A);
• assumes P1’s writes become visible to P2 in program order

One or both proceed
    P1                       P2
    X := 0;                  Y := 0;
    ...                      ...
    if (Y == 0) kill P2;     if (X == 0) kill P1;
• it’s a race through the critical section

Sequential consistency
• results can be mapped to some sequential execution in which the instructions of each process appear in program order
equivalently:
• memory operations proceed in program order
• all writes are atomic and become visible to all processors at the same time
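
The definition above can be checked mechanically on a small case. The following is a minimal sketch, assuming an illustrative two-processor store-then-load test and a tuple encoding of operations that are not part of the slides: it enumerates every merge of the two programs that keeps each program's own order and runs each merge against a single shared memory.

```python
from itertools import combinations

def interleavings(p1, p2):
    """Yield every merge of p1 and p2 that keeps each program's own order."""
    total = len(p1) + len(p2)
    for slots in combinations(range(total), len(p1)):
        it1, it2 = iter(p1), iter(p2)
        yield [next(it1) if k in slots else next(it2) for k in range(total)]

def run(schedule):
    """Execute one total order against a single shared memory; every store is
    atomic, so all processors see it at the same moment."""
    mem, regs = {"x": 0, "y": 0}, {}
    for op, a, b in schedule:
        if op == "store":      # ("store", location, value)
            mem[a] = b
        else:                  # ("load", register, location)
            regs[a] = mem[b]
    return regs["R1"], regs["R2"]

# Illustrative test programs (an assumption for this sketch).
P1 = [("store", "x", 1), ("load", "R1", "y")]
P2 = [("store", "y", 1), ("load", "R2", "x")]

sc_outcomes = {run(s) for s in interleavings(P1, P2)}
print(sorted(sc_outcomes))   # [(0, 1), (1, 0), (1, 1)]
```

Under sequential consistency the result R1 = R2 = 0 never appears, because in any total order one of the two stores precedes both loads of the other processor or its own later load.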

The need to relax
• strict sequential consistency has severe performance drawbacks, so:
  a) keep sequential consistency, and use prefetch and speculation, or
  b) relax the consistency model, and be prepared to think carefully about programs

Attributes of consistency models
• system specification: which orders are preserved, and which are not? is there system support to enforce a particular order?
• programmer interface: the set of rules that will lead to the expected execution
• translation mechanism: how to translate program annotations to hardware actions

Alternative 1
• total store ordering: allows a read to bypass an earlier incomplete write
• helps hide write latency
• sequential consistency can be provided where needed by fence instructions
• SPARC V9 provides various memory barrier instructions
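
To see what "a read bypassing an earlier incomplete write" permits, here is a toy model, not a description of any real processor: each processor gets a FIFO store buffer, loads consult the processor's own buffer before shared memory, and a hypothetical `("fence",)` operation waits until the buffer is empty. The function name `tso_outcomes` and the operation encoding are assumptions made for this sketch.

```python
def tso_outcomes(programs, init):
    """Explore every execution of `programs` on a toy store-buffer machine."""
    def put(tup, i, item):                      # tuple with slot i replaced
        return tup[:i] + (item,) + tup[i + 1:]

    start = (tuple(0 for _ in programs),        # program counters
             tuple(() for _ in programs),       # per-processor FIFO store buffers
             tuple(sorted(init.items())),       # shared memory
             ())                                # registers written so far
    stack, seen, results = [start], set(), set()
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pcs, bufs, mem, regs = state
        finished = True
        for p, prog in enumerate(programs):
            if bufs[p]:                         # drain the oldest buffered store
                finished = False
                loc, val = bufs[p][0]
                new_mem = tuple(sorted({**dict(mem), loc: val}.items()))
                stack.append((pcs, put(bufs, p, bufs[p][1:]), new_mem, regs))
            if pcs[p] == len(prog):
                continue
            finished = False
            op, new_pcs = prog[pcs[p]], put(pcs, p, pcs[p] + 1)
            if op[0] == "store":                # ("store", location, value)
                new_buf = bufs[p] + ((op[1], op[2]),)
                stack.append((new_pcs, put(bufs, p, new_buf), mem, regs))
            elif op[0] == "load":               # ("load", register, location)
                val = dict(mem)[op[2]]
                for loc, buffered in bufs[p]:   # newest own store to that location wins
                    if loc == op[2]:
                        val = buffered
                new_regs = tuple(sorted({**dict(regs), op[1]: val}.items()))
                stack.append((new_pcs, bufs, mem, new_regs))
            elif op[0] == "fence" and not bufs[p]:   # ("fence",) waits for an empty buffer
                stack.append((new_pcs, bufs, mem, regs))
        if finished:
            results.add(tuple(v for _, v in regs))   # values in register-name order
    return results

init = {"x": 0, "y": 0}
plain = [[("store", "x", 1), ("load", "R1", "y")],
         [("store", "y", 1), ("load", "R2", "x")]]
fenced = [[("store", "x", 1), ("fence",), ("load", "R1", "y")],
          [("store", "y", 1), ("fence",), ("load", "R2", "x")]]
print((0, 0) in tso_outcomes(plain, init))    # True
print((0, 0) in tso_outcomes(fenced, init))   # False
```

Without fences both loads can run while the stores still sit in the buffers, so both registers may read 0; with a fence between each store and load, only the sequentially consistent results remain.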

Alternative 2
• partial store ordering: allows writes as well as reads to bypass writes
• writes cannot bypass reads
• writes are still atomic
• very different from sequential consistency, e.g. spinning on a flag doesn’t work
• needs a store barrier instruction to emulate sequential consistency
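
Why spinning on a flag breaks can be sketched with a small model. It assumes, for illustration only, that the producer's stores to different locations may reach memory in any order unless a store barrier separates them; the names `visible_orders`, `values_seen_by_consumer` and `barrier_at` are hypothetical.

```python
from itertools import permutations

DATA, FLAG = "A", "flag"

def visible_orders(stores, barrier_at=None):
    """Orders in which the producer's stores may reach memory in this toy model.
    Stores before index `barrier_at` must all land before the stores after it."""
    n = len(stores)
    for perm in permutations(range(n)):
        if barrier_at is not None and any(
            perm.index(i) > perm.index(j)
            for i in range(barrier_at) for j in range(barrier_at, n)
        ):
            continue                      # this order would cross the store barrier
        yield [stores[i] for i in perm]

def values_seen_by_consumer(stores, barrier_at=None):
    """Values of A the consumer may read once its spin loop has seen flag == 1."""
    seen = set()
    for order in visible_orders(stores, barrier_at):
        mem = {DATA: 0, FLAG: 0}
        for loc, val in order:
            mem[loc] = val
            if mem[FLAG] == 1:            # the spin loop can exit here ...
                seen.add(mem[DATA])       # ... and read(A) may run immediately
    return sorted(seen)

producer = [(DATA, 42), (FLAG, 1)]        # write(A); flag := 1
print(values_seen_by_consumer(producer))                 # [0, 42]
print(values_seen_by_consumer(producer, barrier_at=1))   # [42]
```

The stale value 0 shows up only when the flag store overtakes the data store; placing the barrier between the two stores removes that order.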

Alternative 3
• processor consistency: same as total store ordering, but does not guarantee atomic writes
• implemented in recent Intel processors

Weak ordering
• just try to preserve data and control dependencies within a process
• don’t worry about the order of memory operations between synchronization points
• e.g. don’t worry about the exact order of independent reads and writes within a critical section

Weak ordering
• code from outside (before or after) a critical section cannot be reordered with code inside it
• code before a barrier must commit before entering, code after a barrier must not issue until the barrier is left
• code before a flag wait must commit before waiting, and code after must not issue before the flag is set by the producer
• code before setting of a flag must commit first, and code after must not issue before the flag is set

Weak ordering
• a good match to modern CPUs and aggressive compiler optimizations
• hardware must recognize synchronization, or the compiler must insert proper barriers
• MIPS R10000 provides a sync instruction and a fence count register
• sync disables issue until the fence register is zero and all outstanding memory operations have committed
• fence count is incremented on an L2 miss and decremented on a reply
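
The sync and fence count behaviour described above can be pictured with a toy bookkeeping model. This is only a sketch of the rule as stated on the slide, not the actual R10000 logic; the class and method names are hypothetical.

```python
class FenceCounter:
    """Toy bookkeeping for a sync instruction (all names are hypothetical)."""

    def __init__(self):
        self.fence_count = 0    # L2 misses still waiting for a reply
        self.uncommitted = 0    # memory operations issued but not yet committed

    def issue_memory_op(self, l2_miss=False):
        self.uncommitted += 1
        if l2_miss:
            self.fence_count += 1       # incremented on an L2 miss

    def reply(self):
        if self.fence_count == 0:
            raise RuntimeError("reply without an outstanding L2 miss")
        self.fence_count -= 1           # decremented on a reply

    def commit(self):
        if self.uncommitted == 0:
            raise RuntimeError("commit without an outstanding operation")
        self.uncommitted -= 1

    def sync_may_proceed(self):
        """Issue resumes only when the fence count is zero and nothing is outstanding."""
        return self.fence_count == 0 and self.uncommitted == 0

cpu = FenceCounter()
cpu.issue_memory_op(l2_miss=True)
print(cpu.sync_may_proceed())   # False
cpu.reply()
cpu.commit()
print(cpu.sync_may_proceed())   # True
```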

Release consistency
• relax weak ordering further
• categorize all synchronization operations as either acquire or release
• acquire is a read (load) on a protected variable, like a lock or waiting on a flag
• release is a write (store) granting access to others, like unlock or setting a flag
• barrier is release (arrival) and acquire (departure)
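
One way to picture the extra freedom is as a predicate over pairs of operations in program order. The sketch below assumes the usual reading of the two models: weak ordering lets nothing cross a synchronization operation, while release consistency only keeps later operations behind an acquire and earlier operations ahead of a release. It also assumes synchronization operations stay ordered among themselves, and it ignores data and control dependencies. The function names are hypothetical.

```python
SYNC = {"acquire", "release"}

def may_reorder_weak(older, younger):
    """Weak ordering: no operation moves across a synchronization operation."""
    return older not in SYNC and younger not in SYNC

def may_reorder_release(older, younger):
    """Release consistency: keep only the orders that protect the critical section."""
    if older == "acquire":       # later operations must wait for the acquire
        return False
    if younger == "release":     # earlier operations must finish before the release
        return False
    if older in SYNC and younger in SYNC:   # assumption: sync ops stay in order
        return False
    return True

pairs = [("plain", "acquire"), ("release", "plain"),
         ("acquire", "plain"), ("plain", "release")]
for older, younger in pairs:
    print(older, "then", younger,
          "| weak:", may_reorder_weak(older, younger),
          "| release:", may_reorder_release(older, younger))
```

The first two pairs are the relaxation: an ordinary access before an acquire may sink below it, and an ordinary access after a release may rise above it, neither of which weak ordering allows.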

In practice
• MIPS processors are sequentially consistent
• Sun supports total or partial store ordering
• Intel supports processor consistency
• Alpha and PowerPC support weak ordering; POWER4 and POWER5 do not guarantee atomic writes

Processor consistency
• a simple model with good performance
• writes must become visible to all processors in program order
• loads can bypass writes

Back to our examples
Under these rules,
• does producer-consumer work?
• does one-or-both work?

Results under processor consistency
• producer-consumer is okay because P1’s actions are both writes and they must become visible sequentially
• one-or-both can break because loads can bypass writes
  – if (X == 0) is a load
  – Y := 0 is a write

Intel Itanium
• loads are not reordered with other loads
• stores are not reordered with other stores
• stores are not reordered with older loads
• stores to the same location have a total order
• a load may be reordered with an older store to a different location

Itanium example 1
• initially, x = y = 0
    P1            P2
    R1 <- x       R2 <- y       (loads)
    y <- 1        x <- 1        (stores)
• we will never see R1 = R2 = 1 because stores are not reordered with older loads

Itanium example 2
• initially, x = y = 0
    P1            P2
    x <- 1        y <- 1        (stores)
    R1 <- y       R2 <- x       (loads)
• we may see R1 = R2 = 0 because loads may be reordered with older stores
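
Both examples can be replayed with a small enumerator. The sketch below is a simplification under stated assumptions: a reordered load is modelled as simply executing earlier in the processor's own order, the only reordering permitted is a load moving ahead of an older store to a different location, and store forwarding is ignored. The helper names and the tuple encoding are hypothetical.

```python
from itertools import permutations

def is_load(op):
    return op[0] == "load"                 # ("load", register, location)

def location(op):
    return op[2] if is_load(op) else op[1]  # ("store", location, value)

def local_orders(prog):
    """Per-processor orders allowed when only a load may overtake an older
    store, and only if the two touch different locations."""
    allowed = []
    for perm in permutations(range(len(prog))):
        ok = True
        for a in range(len(prog)):
            for b in range(a + 1, len(prog)):
                if perm.index(a) > perm.index(b):          # b overtook the older a
                    older, younger = prog[a], prog[b]
                    if not (is_load(younger) and not is_load(older)
                            and location(older) != location(younger)):
                        ok = False
        if ok:
            allowed.append([prog[i] for i in perm])
    return allowed

def interleavings(p1, p2):
    """Yield every merge of p1 and p2 that keeps each list's own order."""
    if not p1 or not p2:
        yield list(p1) + list(p2)
        return
    for rest in interleavings(p1[1:], p2):
        yield [p1[0]] + rest
    for rest in interleavings(p1, p2[1:]):
        yield [p2[0]] + rest

def outcomes(p1, p2):
    """Set of (R1, R2) results over all allowed local orders and merges."""
    found = set()
    for o1 in local_orders(p1):
        for o2 in local_orders(p2):
            for schedule in interleavings(o1, o2):
                mem, regs = {"x": 0, "y": 0}, {}
                for op in schedule:
                    if is_load(op):
                        regs[op[1]] = mem[op[2]]
                    else:
                        mem[op[1]] = op[2]
                found.add((regs["R1"], regs["R2"]))
    return found

example1 = outcomes([("load", "R1", "x"), ("store", "y", 1)],
                    [("load", "R2", "y"), ("store", "x", 1)])
example2 = outcomes([("store", "x", 1), ("load", "R1", "y")],
                    [("store", "y", 1), ("load", "R2", "x")])
print((1, 1) in example1)   # False
print((0, 0) in example2)   # True
```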