Theory of Memory
W. Paul, Saarland University and DFKI
bmb+f project Verisoft-XT
Joint work with Ulan Degebaev and Norbert Schirmer, Saarland University

Why might this be important?
• Unites theories of
  – store buffers
  – interlocking caches
  – cache coherence
  – out of order execution
  – X64 instruction set
  – address translation
  – optimized compilation
  – structured parallel C semantics
• Explains why a hypervisor might run structured parallel C
• VCC is supposed to mirror structured parallel C semantics
• Thus VCC might be(come) sound

Specifying Memory
[Figure labels: x, M(x)]

Store Buffer
[Figure labels: memory M, sbuf(y), r(j), w(i)]
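One plausible reading of the store buffer slide is that writes w(i) wait in sbuf before they reach memory M, while a read r(j) is answered by the newest pending write to the same address if one exists. The Python sketch below illustrates only that reading. The class name, the method names and the default value 0 for untouched addresses are assumptions made for illustration and do not come from the slides.

```python
from collections import deque


class StoreBufferMemory:
    """Toy model: one memory M plus one FIFO store buffer (assumed semantics)."""

    def __init__(self):
        self.M = {}            # memory: address -> value
        self.sbuf = deque()    # pending writes, oldest on the left

    def write(self, addr, value):
        # A write only enters the store buffer; M is not changed yet.
        self.sbuf.append((addr, value))

    def read(self, addr):
        # Forward from the newest pending write to this address, if any.
        for a, v in reversed(self.sbuf):
            if a == addr:
                return v
        # Otherwise fall back to memory (0 is an assumed default).
        return self.M.get(addr, 0)

    def drain_one(self):
        # Commit the oldest pending write to memory.
        if self.sbuf:
            addr, value = self.sbuf.popleft()
            self.M[addr] = value


mem = StoreBufferMemory()
mem.write(8, 1)
mem.write(8, 2)
value_seen_by_owner = mem.read(8)   # forwarded from sbuf
value_in_memory = mem.M.get(8, 0)   # memory itself is still unchanged
mem.drain_one()
mem.drain_one()
value_after_drain = mem.M[8]
```

In this toy model the processor that owns the buffer always sees its own latest write, even though M lags behind until the buffer is drained.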

Caches
[Figure labels: M, ca]

Many Caches: Snooping
[Figure labels: M, ca(1), ca(p)]

Many Caches
[Figure labels: x.la, x.off, M, ca(1), ca(p)]

Many Caches
[Figure labels: x.off, M, ca(1), ca(p)]

Overlapping Transactions
[Figure labels: c, public (a), b, a, c, c]

Sequentially Consistent Memory (Lemma 5)
[Figure labels: c, public (a), b, a, c, c]

Tomasulo Schedulers for OOO
[Figure labels: IF, issue, reservation stations, funct. units, CDB, ROB, WB]

Two Memory Units
[Figure labels: m, RS, MMU, RS, sbuf, funct. units, LS, CDB, ROB]

Single Processor OOO correctness (Lemma 6)
[Figure labels: m, RS, MMU, RS, sbuf, funct. units, LS, CDB, ROB]

Multi Processor OOO implementation
[Figure labels: m, RS, MMU, RS, sbuf, funct. units, LS, CDB, data(i, j), ROB]

Multi Processor OOO correctness (Lemma 7)
[Figure labels: m, RS, MMU, RS, sbuf, funct. units, LS, CDB, data(i, j), ROB]

X64 architecture
• CPU core
  – R: user registers
  – SR: system registers
    • CR3
  – acc: access
  – segmentation
• mmu: memory management unit
  – tlb: translation look aside buffer
• memory system
  – mm: main memory
  – ca: cache
  – sbuf: store buffer
[Figure labels: mm, ca, sbuf, mmu, acc, tlb, segmentation, core, CR3, R]

Segmentation off (Lemma 8)
• 1 segment
• large as entire address space
• segmentation invisible
[Figure labels: mm, ca, sbuf, mmu, acc, segmentation, core, acc, tlb, CR3, R]

Bad news: cache state is visible
• CPU core or devices
  – acc: access
• acc.adr: address
• acc.r: rights (user, write, exe)
• acc.data
• acc.mmode: memory mode
  – WB: write back
  – WT: write through
  – ...
  – NC: no cache
[Figure labels: mm, ca, sbuf, mmu, acc, core, acc, tlb, CR3, R]

Good news: no device, no NC mode
• acc.mmode: memory mode
  – WB: write back
  – WT: write through
  – ...
  – NC: no cache (not used)
[Figure labels: mm, ca, sbuf, mmu, acc, core, acc, tlb, CR3, R]

Sequentially Consistent Physical Memory (Lemma 9)
• acc.mmode: memory mode
  – WB: write back
  – WT: write through
  – ...
  (mix on same address)
• PM: sequentially consistent physical memory abstraction
  – Proof: MOESI invariants are maintained
[Figure labels: PM, sbuf, mmu, acc, core, acc, tlb, CR3, R]
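The slide names MOESI invariants without listing them. As a rough illustration, the Python sketch below checks the textbook MOESI constraints for a single cache line: a Modified or Exclusive copy must be the only valid copy, and at most one cache may be in the Owned state. The function name is hypothetical, and the invariants actually used in the proof of Lemma 9 may well be stronger or stated differently.

```python
from collections import Counter


def moesi_line_ok(states):
    """Check textbook MOESI constraints for one cache line.

    `states` holds one letter per cache: M, O, E, S or I.
    """
    count = Counter(states)
    valid = sum(count[s] for s in "MOES")   # caches holding the line
    # M and E are exclusive: the holder must be the only valid copy.
    if count["M"] + count["E"] > 0 and valid != 1:
        return False
    # At most one cache may own a dirty shared line.
    if count["O"] > 1:
        return False
    return True


ok_exclusive = moesi_line_ok(["M", "I", "I"])
ok_shared = moesi_line_ok(["O", "S", "S"])
bad_mixed = moesi_line_ok(["M", "S", "I"])
bad_two_owners = moesi_line_ok(["O", "O", "I"])
```

A proof along the lines of the slide would show that every protocol step preserves such a predicate for every line; the sketch only evaluates it on given state vectors.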

Initialize page tables
• 1 processor
  – sbuf invisible
• operating mode: paging disabled
  – mmu invisible
• set up page table tree in PM
[Figure labels: PM, sbuf, mmu, acc, core, acc, tlb, CR3, R]

Translated Linear Memory
• many processors
• operating mode: paging enabled
• keep tlb consistent
[Figure labels: page tables, PM, sbuf, mmu, acc, core, acc, tlb, CR3, R]

Translated Consistent Linear Memory + sbufs (Lemma 10)
• many processors
• operating mode: paging enabled
• keep tlb consistent
[Figure labels: LM, page tables, sbuf, acc, core, CR3, R]
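To make the roles of CR3, the page table tree and the tlb concrete, here is a deliberately simplified Python sketch of a two-level table walk with a translation cache. It is not the X64 page table format: the table size, the word-addressed dictionary used as physical memory, and the names `Mmu`, `walk`, `translate` and `write_pte` are all assumptions made for illustration. Flushing the whole tlb on every page table write is only one crude way to "keep tlb consistent"; the slides do not say which discipline is used.

```python
PAGE_BITS = 12                       # assumed 4 KiB pages
INDEX_BITS = 4                       # assumed tiny tables: 16 entries per level
INDEX_MASK = (1 << INDEX_BITS) - 1
OFFSET_MASK = (1 << PAGE_BITS) - 1


class Mmu:
    def __init__(self, pm, cr3):
        self.pm = pm      # physical memory: address -> word (toy model)
        self.cr3 = cr3    # base address of the top-level table
        self.tlb = {}     # cached translations: page number -> frame base

    def walk(self, la):
        # Two-level walk: the top-level entry points to a second-level table.
        page = la >> PAGE_BITS
        top_index = (page >> INDEX_BITS) & INDEX_MASK
        low_index = page & INDEX_MASK
        second_level = self.pm[self.cr3 + top_index]
        return self.pm[second_level + low_index]

    def translate(self, la):
        page = la >> PAGE_BITS
        if page not in self.tlb:
            self.tlb[page] = self.walk(la)
        return self.tlb[page] + (la & OFFSET_MASK)

    def write_pte(self, addr, value):
        # Assumed discipline: every page table update flushes the tlb.
        self.pm[addr] = value
        self.tlb.clear()


pm = {0x1000: 0x2000, 0x2001: 0x5000}   # top-level table at 0x1000
mmu = Mmu(pm, cr3=0x1000)
pa_before = mmu.translate(0x1034)        # page 1, offset 0x34
mmu.write_pte(0x2001, 0x7000)            # remap page 1
pa_after = mmu.translate(0x1034)
```

Without the flush in `write_pte`, the second call to `translate` would still return the stale frame, which is the kind of inconsistency the slide asks to rule out.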

C0: Pascal with C syntax
configurations
• c = (pr, rd, lms, hm, gm)
  – pr: program rest
  – rd: recursion depth
  – lms: [0 : recursion depth] → {local memories}
  – hm: heap memory
  – gm: global memory
• subvariables
  – (m, i)[17].gpr[3]
• value of pointers: subvariables
[Figure labels: memory m, va(c, (m, i)), ba(m, i), size(m, i)]
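The configuration tuple can be pictured as a small record. The Python sketch below is an illustration only: the field names follow the slide, but the concrete types, the `well_formed` check and the `call` and `ret` helpers are assumptions about how rd and lms might evolve together, not definitions taken from the C0 semantics.

```python
from dataclasses import dataclass, field


@dataclass
class C0Config:
    pr: list                                  # program rest
    rd: int                                   # recursion depth
    lms: list                                 # lms[0..rd]: local memories
    hm: dict = field(default_factory=dict)    # heap memory
    gm: dict = field(default_factory=dict)    # global memory

    def well_formed(self):
        # lms is defined exactly on the indices 0..rd.
        return self.rd >= 0 and len(self.lms) == self.rd + 1


def call(c, frame):
    """Assumed effect of a function call: one more local memory."""
    return C0Config(c.pr, c.rd + 1, c.lms + [frame], c.hm, c.gm)


def ret(c):
    """Assumed effect of a return: drop the top local memory."""
    return C0Config(c.pr, c.rd - 1, c.lms[:-1], c.hm, c.gm)


c0 = C0Config(pr=["main"], rd=0, lms=[{}])
c1 = call(c0, {"x": 1})
c2 = ret(c1)
```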

Parallel C
• c = (pr, rd, lms, hm, gm)
  – pr: program rest
  – rd: recursion depth
  – lms: [0 : recursion depth] → {local memories}
  – hm: heap memory
  – gm: global memory
• Share
  – gm
  – hm
• Interleave at the steps of the small steps semantics
[Figure labels: memory m, va(c, (m, i)), ba(m, i), size(m, i)]
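The effect of interleaving threads at small steps over a shared gm can be seen in a tiny experiment. The Python sketch below enumerates every interleaving of two threads that each increment a shared variable in two steps (load, then store). Splitting the increment into exactly these two steps, and the names used, are assumptions made for the example.

```python
def interleavings(a, b):
    """All merges of step lists a and b that keep each list's own order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    gm = {"x": 0}            # shared global memory
    local = {0: {}, 1: {}}   # one private local memory per thread
    for tid, step in schedule:
        if step == "load":
            local[tid]["t"] = gm["x"]
        else:                # "store"
            gm["x"] = local[tid]["t"] + 1
    return gm["x"]


thread0 = [(0, "load"), (0, "store")]
thread1 = [(1, "load"), (1, "store")]
schedules = list(interleavings(thread0, thread1))
outcomes = {run(s) for s in schedules}
```

Only the two schedules that run one thread completely before the other end with x = 2; the remaining four lose an update. This is the kind of behavior that locks are meant to exclude.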

Simulation relation consis(c, alloc, d)
[Figure labels: LM, alloc(c, y), y, alloc(c, p), p]

Non optimizing compiler: step by step simulation

Optimizing compiler: simulation between IO-steps

IO-steps (1): volatile accesses

Volatiles Sequentially Consistent (Lemma 11)

Structured Parallel C
• Implement locks using volatiles
• IO-steps (2): lock release
• Run processors alone on locked portions of linear memory
• Lemma 1: sbufs invisible
• Lemma 10: ordinary C code in linear memory

Summary
• Implement locks using volatiles
• IO-steps (2): lock release
• Run processors alone on locked portions of linear memory
• Lemma 1: sbufs invisible
• Lemma 10: ordinary C code in linear memory
• Outlined correctness proof for implementation of structured parallel C
  – Initialisation
  – compilation