Multiprocessor Highlights: MESI Cache Coherence Protocol, Memory Consistency, ILP and MC
Zhao Zhang, 2003
MESI Protocol
From the local processor's viewpoint, each cache block is in one of four states:
- Modified: only I have a copy, and the copy has been modified; I must respond to any bus read/write request for the block
- Exclusive-clean: only I have a copy, and the copy is clean; no need to inform others about my changes
- Shared: someone else may have a copy; I have to inform others about my changes
- Invalid: the block has been invalidated (possibly at the request of someone else)
Action highlights:
- On a read miss to a block: send a read request onto the bus
- On a write miss to a block: send a write request onto the bus
- On receiving a bus read request: transition the block to the Shared state
- On receiving a bus write request: transition the block to the Invalid state
- Data must be written back when transitioning out of the Modified state
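The per-block transitions above can be sketched as a small state machine. This is a minimal Python sketch, not the slide author's implementation. The bus request names BusRd, BusRdX and BusUpgr are conventional textbook labels assumed here, and the `others_have_copy` flag is a hypothetical input standing in for whatever bus signal tells the cache whether another cache holds the block. One step the slide leaves implicit is modeled explicitly: a read miss enters Exclusive when no other cache holds the block and Shared otherwise, and a write hit in Shared sends an invalidation so that the other copies are dropped.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def on_processor_access(state, is_write, others_have_copy):
    """Local read or write to one block. Returns (next_state, bus_request or None).

    Bus request names are assumed textbook labels:
      "BusRd"   read request on a read miss
      "BusRdX"  write request on a write miss
      "BusUpgr" invalidation on a write hit to a Shared block
    others_have_copy is a hypothetical input standing in for a bus "shared" signal.
    """
    if state is State.INVALID:
        if is_write:
            return State.MODIFIED, "BusRdX"
        # Read miss: Exclusive if no other cache holds the block, otherwise Shared.
        return (State.SHARED if others_have_copy else State.EXCLUSIVE), "BusRd"
    if not is_write:
        return state, None  # read hit: no bus traffic in M, E or S
    if state is State.SHARED:
        return State.MODIFIED, "BusUpgr"  # others may have copies: tell them
    return State.MODIFIED, None  # E -> M silently (only copy); M stays M


def on_snoop(state, is_write_request):
    """Another cache's request for this block seen on the bus.

    Returns (next_state, must_write_back). An observed BusUpgr counts as a write request.
    """
    if state is State.INVALID:
        return State.INVALID, False
    must_write_back = state is State.MODIFIED  # dirty data leaves the Modified state
    if is_write_request:
        return State.INVALID, must_write_back
    return State.SHARED, must_write_back


# Example: a read miss with no other sharer, then a local write, then a remote read.
s, req = on_processor_access(State.INVALID, is_write=False, others_have_copy=False)
print(s, req)  # State.EXCLUSIVE BusRd
s, req = on_processor_access(s, is_write=True, others_have_copy=False)
print(s, req)  # State.MODIFIED None
print(on_snoop(s, is_write_request=False))  # (<State.SHARED: 'S'>, True)
```

The Exclusive state is what lets the second step above go from E to M without any bus traffic: the cache already knows it holds the only copy.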
Memory Consistency Model
Defines memory correctness for parallel execution:
- Execution appears to be that of some correct execution of a theoretical parallel computer with n sequential processors
- In particular, remote writes must appear to a local processor in some correct sequence
Typical memory consistency models:
Sequential consistency (SC)
- Memory reads/writes are globally serialized: assume that in every cycle only one processor can proceed by one step, and a write's result appears on other processors immediately
- Processors do not reorder local reads and writes
- Note: the number of possible interleavings grows exponentially with the number of instructions
Total store order (TSO)
- Only writes are globally serialized: assume that in every cycle at most one write can proceed, and the write's result appears immediately
- Processors may let a read bypass an earlier write when there is no RAW dependence between them
Processor consistency (PC)
- Writes from one processor appear in the same order on all other processors
- Processors may let a read bypass an earlier write when there is no RAW dependence between them
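To see the SC/TSO difference on a concrete case, the sketch below exhaustively runs the classic store-buffer litmus test: each processor writes its own flag, then reads the other's. It is a minimal sketch under a simplified TSO model assumed here: each processor has a FIFO store buffer that drains to memory at arbitrary times, and a read returns the processor's own newest buffered store to the same address if there is one (so RAW order is kept), otherwise the memory value. The program encoding and function names are illustrative. `sc_outcomes` also counts complete interleavings; for two programs of m and n instructions this is C(m+n, m), which is the exponential growth noted for SC.

```python
import math

# Each program is a list of ("st", addr, value) or ("ld", addr, register) operations.
# Store-buffer litmus test: each processor writes its own flag, then reads the other's.
PROGRAMS = [
    [("st", "x", 1), ("ld", "y", "r0")],  # processor 0
    [("st", "y", 1), ("ld", "x", "r1")],  # processor 1
]


def finished(pcs, programs):
    return all(pc == len(p) for pc, p in zip(pcs, programs))


def result(regs):
    # Register values ordered by register name, e.g. (r0, r1).
    return tuple(regs[k] for k in sorted(regs))


def sc_outcomes(programs):
    """SC: one memory operation per step, in program order, visible to all at once."""
    outcomes, count = set(), 0

    def run(pcs, mem, regs):
        nonlocal count
        if finished(pcs, programs):
            count += 1  # one complete interleaving
            outcomes.add(result(regs))
            return
        for i, prog in enumerate(programs):
            if pcs[i] < len(prog):
                op, addr, arg = prog[pcs[i]]
                mem2, regs2 = dict(mem), dict(regs)
                if op == "st":
                    mem2[addr] = arg  # visible to every processor immediately
                else:
                    regs2[arg] = mem.get(addr, 0)  # memory starts at 0
                pcs2 = list(pcs)
                pcs2[i] += 1
                run(tuple(pcs2), mem2, regs2)

    run(tuple(0 for _ in programs), {}, {})
    return outcomes, count


def tso_outcomes(programs):
    """Simplified TSO: stores enter a per-processor FIFO buffer and drain later."""
    outcomes = set()

    def run(pcs, bufs, mem, regs):
        if finished(pcs, programs) and not any(bufs):
            outcomes.add(result(regs))
            return
        for i, prog in enumerate(programs):
            if bufs[i]:  # drain processor i's oldest store to memory
                (addr, val), rest = bufs[i][0], bufs[i][1:]
                mem2 = dict(mem)
                mem2[addr] = val
                bufs2 = list(bufs)
                bufs2[i] = rest
                run(pcs, tuple(bufs2), mem2, regs)
            if pcs[i] < len(prog):  # execute processor i's next instruction
                op, addr, arg = prog[pcs[i]]
                bufs2, regs2 = list(bufs), dict(regs)
                if op == "st":
                    bufs2[i] = bufs[i] + ((addr, arg),)  # not yet visible to others
                else:
                    # Forward the newest own buffered store to the same address (keeps RAW).
                    own = [v for a, v in bufs[i] if a == addr]
                    regs2[arg] = own[-1] if own else mem.get(addr, 0)
                pcs2 = list(pcs)
                pcs2[i] += 1
                run(tuple(pcs2), tuple(bufs2), mem, regs2)

    run(tuple(0 for _ in programs), tuple(() for _ in programs), {}, {})
    return outcomes


if __name__ == "__main__":
    sc, n = sc_outcomes(PROGRAMS)
    print(n, math.comb(4, 2))   # 6 6: interleavings of two 2-instruction programs
    print(sorted(sc))           # [(0, 1), (1, 0), (1, 1)]: (0, 0) is impossible under SC
    print(sorted(tso_outcomes(PROGRAMS)))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The extra TSO outcome (0, 0) arises when both reads bypass their own processor's still-buffered write, which is exactly the read-after-write reordering (without a RAW dependence) that SC forbids.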
Memory Consistency and ILP
- Sequential consistency, TSO and PC are strong consistency models (although TSO and PC are relaxed relative to SC)
- Why use weak consistency models (e.g. release consistency)?
  - Otherwise, without speculative execution and recovery, every write to shared data may take a full memory access latency (can we afford 100 ns for every such write on a 2 GHz, 4-way issue processor?)
  - Under SC, reads cannot bypass any previous write (even without a RAW dependence)
- Strong consistency may work efficiently with speculative execution in ILP processors (PC and TSO in practice; SC can be supported with a speculative cache)
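The write-latency question on this slide can be made concrete with the slide's own figures (100 ns per write, 2 GHz clock, 4-way issue), assuming the processor stalls completely for the full latency of each such write:

```latex
\[
100\,\text{ns} \times 2\,\text{GHz}
  = \left(100 \times 10^{-9}\,\text{s}\right) \times \left(2 \times 10^{9}\,\text{s}^{-1}\right)
  = 200\ \text{cycles},
\qquad
200\ \text{cycles} \times 4\ \tfrac{\text{instructions}}{\text{cycle}} = 800\ \text{issue slots}
\]
```

Under that assumption, each write to shared data could cost up to 800 instruction issue slots.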