Theory of Memory
W. Paul, Saarland University and DFKI
BMBF project Verisoft-XT
Joint work with Ulan Degenbaev and Norbert Schirmer, Saarland University
Why might this be important?
• Unites the theories of
  – store buffers
  – interlocking
  – caches
  – cache coherence
  – out-of-order execution
  – the x64 instruction set
  – address translation
  – optimized compilation
  – structured parallel C semantics
• Explains why a hypervisor might run structured parallel C
• VCC is supposed to mirror structured parallel C semantics
• Thus VCC might be(come) sound
Specifying Memory
(figure: memory M maps each address x to its content M(x))
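Here is a minimal sketch of this specification view. It rests on two assumptions that are not from the slides: memory is word-addressed, and unwritten addresses read as 0. Under them, memory M can be modelled as a function from addresses x to contents M(x), where a write changes exactly one point of that function. The class name `Memory` is hypothetical.

```python
from typing import Dict


class Memory:
    """Memory M as a map from addresses x to contents M(x).

    Hypothetical sketch: addresses that were never written read as 0.
    """

    def __init__(self) -> None:
        self.cells: Dict[int, int] = {}

    def read(self, x: int) -> int:
        # returns M(x), the current content of address x
        return self.cells.get(x, 0)

    def write(self, x: int, v: int) -> None:
        # new memory M' with M'(x) = v and M'(y) = M(y) for every y != x
        self.cells[x] = v


M = Memory()
M.write(5, 42)
print(M.read(5), M.read(6))  # 42 0
```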
Store Buffer
(figure: memory M; store buffer sbuf(y); read r(j); write w(i))
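One common way to model a store buffer is sketched below. It assumes a single FIFO per processor in the style of x86-TSO, and the class and method names are hypothetical. Writes w(i) are queued in sbuf. Reads r(j) return the youngest buffered write to the same address, or else the memory content. The buffer drains to M oldest-first. The usage reproduces the store-buffering litmus test, in which each processor reads the other's variable while both writes are still buffered.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class StoreBufferedProcessor:
    """Hypothetical TSO-style model: one FIFO store buffer in front of a shared memory."""

    def __init__(self, memory: Dict[int, int]) -> None:
        self.memory = memory                          # shared memory M
        self.sbuf: Deque[Tuple[int, int]] = deque()   # pending (address, value), oldest first

    def write(self, x: int, v: int) -> None:
        # a write w(i) only enters the store buffer
        self.sbuf.append((x, v))

    def read(self, x: int) -> int:
        # a read r(j) forwards from the youngest buffered write to x, if there is one
        for addr, val in reversed(self.sbuf):
            if addr == x:
                return val
        return self.memory.get(x, 0)

    def drain_one(self) -> None:
        # commit the oldest buffered write to memory
        addr, val = self.sbuf.popleft()
        self.memory[addr] = val

    def flush(self) -> None:
        while self.sbuf:
            self.drain_one()


# store-buffering litmus test with x at address 0 and y at address 1
M: Dict[int, int] = {}
p0, p1 = StoreBufferedProcessor(M), StoreBufferedProcessor(M)
p0.write(0, 1)   # p0: x := 1 (buffered)
p1.write(1, 1)   # p1: y := 1 (buffered)
r0 = p0.read(1)  # p0 reads y from memory: 0
r1 = p1.read(0)  # p1 reads x from memory: 0
p0.flush()
p1.flush()
```

The outcome r0 = r1 = 0 arises because neither write has reached M when the reads execute.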
Caches
(figure: memory M; cache ca)
Many Caches: Snooping
(figure: memory M; caches ca(1), …, ca(p))
Many Caches
(figure: address x split into line address x.la and offset x.off; memory M; caches ca(1), …, ca(p))
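The split of an address x into line address x.la and offset x.off can be sketched as bit slicing. The 64-byte line size is an assumption for illustration, because the slides do not state one.

```python
LINE_BITS = 6  # assumption: 64-byte cache lines


def la(x: int) -> int:
    """Line address x.la: the cache line that contains byte address x."""
    return x >> LINE_BITS


def off(x: int) -> int:
    """Offset x.off: the position of x within its cache line."""
    return x & ((1 << LINE_BITS) - 1)


x = 0x1234
print(hex(la(x)), hex(off(x)))  # 0x48 0x34
```

Caches ca(1), …, ca(p) hold and snoop whole lines, so coherence is decided per line address x.la. Two addresses with the same x.la share a line even if their offsets differ.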
Many Caches
(figure: offset x.off; memory M; caches ca(1), …, ca(p))
Overlapping Transactions
(figure: transactions a, b, c; label public(a))
Sequentially Consistent Memory (Lemma 5)
(figure: transactions a, b, c; label public(a))
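Sequential consistency means that every result the memory system produces is also produced by some interleaving of the processors' accesses on a single memory. That interleaving must respect each processor's program order. The brute-force sketch below enumerates those interleavings for the store-buffering litmus test; its function names are hypothetical. Under sequential consistency the outcome r0 = r1 = 0 is impossible, whereas a processor with a FIFO store buffer can produce it. This illustrates the notion only; it is not the proof of Lemma 5.

```python
from itertools import combinations
from typing import Iterator, List, Set, Tuple

# an operation is ("w", address, value) or ("r", address, register name)
Op = Tuple[str, int, object]


def interleavings(a: List[Op], b: List[Op]) -> Iterator[List[Op]]:
    """All merges of a and b that preserve each processor's program order."""
    n = len(a) + len(b)
    for pos in combinations(range(n), len(a)):  # positions taken by a's operations
        ia, ib, merged = iter(a), iter(b), []
        for k in range(n):
            merged.append(next(ia) if k in pos else next(ib))
        yield merged


def sc_outcomes(a: List[Op], b: List[Op]) -> Set[Tuple[int, ...]]:
    """Register results reachable when all accesses hit one shared memory in some interleaving."""
    results = set()
    for run in interleavings(a, b):
        mem, regs = {}, {}
        for kind, addr, arg in run:
            if kind == "w":
                mem[addr] = arg
            else:
                regs[arg] = mem.get(addr, 0)
        results.add(tuple(regs[r] for r in sorted(regs)))  # ordered as (r0, r1)
    return results


X, Y = 0, 1
p0 = [("w", X, 1), ("r", Y, "r0")]
p1 = [("w", Y, 1), ("r", X, "r1")]
outcomes = sc_outcomes(p0, p1)
print(sorted(outcomes))  # [(0, 1), (1, 0), (1, 1)]
```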
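Tomasulo Schedulers for OOO Execution
(figure: instruction fetch (IF) → issue → reservation stations → functional units → common data bus (CDB) → reorder buffer (ROB) → writeback (WB))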
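The key idea of the reorder buffer can be sketched as follows. The sketch simplifies heavily: no reservation stations or operand forwarding are modelled, and the names are hypothetical. Each instruction gets a ROB entry at issue, and results arrive on the CDB in any order. Writeback (WB) commits only from the head of the ROB, so the register file is updated in program order.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RobEntry:
    dest: str                    # destination register
    value: Optional[int] = None  # None until the result appears on the CDB


class ReorderBuffer:
    """Hypothetical sketch: results complete out of order, registers are written in order."""

    def __init__(self) -> None:
        self.entries: List[RobEntry] = []  # allocated at issue, i.e. in program order
        self.head = 0                      # oldest entry not yet written back
        self.regs: Dict[str, int] = {}     # architectural register file
        self.commit_log: List[int] = []    # tags in the order they were written back

    def issue(self, dest: str) -> int:
        # the returned tag identifies this instruction's result on the CDB
        self.entries.append(RobEntry(dest))
        return len(self.entries) - 1

    def cdb_broadcast(self, tag: int, value: int) -> None:
        # a functional unit finished; this may happen in any order
        self.entries[tag].value = value

    def writeback(self) -> None:
        # WB: only the head may commit, so register updates follow program order
        while self.head < len(self.entries) and self.entries[self.head].value is not None:
            entry = self.entries[self.head]
            self.regs[entry.dest] = entry.value
            self.commit_log.append(self.head)
            self.head += 1


rob = ReorderBuffer()
t0 = rob.issue("r1")      # e.g. a slow load
t1 = rob.issue("r2")      # e.g. a fast add
rob.cdb_broadcast(t1, 7)  # the younger instruction finishes first
rob.writeback()           # nothing commits: the head t0 is still pending
blocked = dict(rob.regs)
rob.cdb_broadcast(t0, 3)
rob.writeback()           # now t0 and then t1 commit
```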
Two Memory Units
(figure: memory m; RS, MMU; RS, sbuf, load/store unit (LS); functional units; CDB; ROB)
Single Processor OOO Correctness (Lemma 6)
(figure: memory m; RS, MMU; RS, sbuf, load/store unit (LS); functional units; CDB; ROB)
Multiprocessor OOO Implementation
(figure: memory m; RS, MMU; RS, sbuf, load/store unit (LS); functional units; CDB; data(i, j); ROB)
Multiprocessor OOO Correctness (Lemma 7)
(figure: memory m; RS, MMU; RS, sbuf, load/store unit (LS); functional units; CDB; data(i, j); ROB)
x64 Architecture
• CPU core
  – R: user registers
  – SR: system registers, e.g. CR3
  – acc: access
  – segmentation
• mmu: memory management unit
  – tlb: translation lookaside buffer
• memory system
  – mm: main memory
  – ca: cache
  – sbuf: store buffer
Segmentation Off (Lemma 8)
• 1 segment
• as large as the entire address space
• segmentation invisible
(figure: mm, ca, sbuf, mmu with tlb, core with CR3 and R)
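Why segmentation becomes invisible can be sketched with a simplified segment model. The model has only a base and a limit, which is not the x64 descriptor format, and the 48-bit address space size is an assumption. With base 0 and a limit covering the entire address space, segment translation is the identity and never faults.

```python
from dataclasses import dataclass

ADDRESS_SPACE = 1 << 48  # assumption: 48-bit linear address space, for illustration only


@dataclass
class Segment:
    base: int
    limit: int  # segment size in bytes

    def linear(self, offset: int) -> int:
        # segment translation: bounds check, then add the base
        if not 0 <= offset < self.limit:
            raise ValueError("segment limit violation")
        return self.base + offset


# the single segment of Lemma 8: base 0, as large as the whole address space
flat = Segment(base=0, limit=ADDRESS_SPACE)
```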
Bad News: Cache State Is Visible
• CPU core accesses mm or devices
  – acc: access
• acc.adr: address
• acc.r: rights (user, write, exe)
• acc.data: data
• acc.mmode: memory mode
  – WB: write back
  – WT: write through
  – …
  – NC: no cache
(figure: mm, ca, sbuf, mmu with tlb, core with CR3 and R)
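Here is a minimal sketch of why cache state is visible. It assumes a single cache and word granularity, and the names are hypothetical. After a write-back (WB) write, the new value lives only in the cache. An NC access to the same address bypasses the cache, so it still sees the old content of main memory.

```python
from typing import Dict, Set


class CachedSystem:
    """Hypothetical one-cache model with WB, WT and NC accesses (line granularity ignored)."""

    def __init__(self) -> None:
        self.mm: Dict[int, int] = {}  # main memory
        self.ca: Dict[int, int] = {}  # cache contents
        self.dirty: Set[int] = set()  # cached addresses that are newer than mm

    def read(self, adr: int, mmode: str) -> int:
        if mmode == "NC":
            return self.mm.get(adr, 0)          # no cache: go straight to main memory
        if adr not in self.ca:
            self.ca[adr] = self.mm.get(adr, 0)  # miss: fill the cache from mm
        return self.ca[adr]

    def write(self, adr: int, data: int, mmode: str) -> None:
        if mmode == "NC":
            self.mm[adr] = data  # simplification: a cached copy, if any, is left as is
            return
        self.ca[adr] = data
        if mmode == "WT":
            self.mm[adr] = data  # write through: mm is updated at once
            self.dirty.discard(adr)
        else:
            self.dirty.add(adr)  # write back: mm is updated only on eviction

    def evict(self, adr: int) -> None:
        if adr in self.dirty:
            self.mm[adr] = self.ca[adr]  # write back the dirty value
            self.dirty.discard(adr)
        self.ca.pop(adr, None)


s = CachedSystem()
s.write(8, 1, "WB")
cached = s.read(8, "WB")       # 1
uncached = s.read(8, "NC")     # 0: the NC read exposes that the value is only in the cache
s.evict(8)
after_evict = s.read(8, "NC")  # 1
```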
Good News: No Devices, No NC Mode
• acc.mmode: memory mode
  – WB: write back
  – WT: write through
  – …
  – NC: no cache (not used)
(figure: mm, ca, sbuf, mmu with tlb, core with CR3 and R)
Sequentially Consistent Physical Memory (Lemma 9)
• acc.mmode: memory mode
  – WB: write back
  – WT: write through
  – …
  – modes may be mixed on the same address
• PM: sequentially consistent physical memory abstraction
  – Proof: MOESI invariants are maintained
(figure: PM, sbuf, mmu with tlb, core with CR3 and R)
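The MOESI invariants can be sketched for one cache line as a predicate over the states of ca(1), …, ca(p). The predicate encodes invariants commonly stated for MOESI: an M or E copy is the only valid copy, and there is at most one O copy. The deck does not list its own invariants. Data invariants, such as S copies agreeing with the owner, are omitted here, and the function name is hypothetical.

```python
from typing import Sequence


def moesi_invariants_hold(states: Sequence[str]) -> bool:
    """Check control invariants of MOESI for one line.

    states[i] is the state ("M", "O", "E", "S" or "I") of the line in cache i.
    """
    valid = [s for s in states if s != "I"]
    if any(s in ("M", "E") for s in states):
        # a Modified or Exclusive copy must be the only valid copy
        return len(valid) == 1
    # otherwise only O and S copies remain, and at most one cache may own the line
    return states.count("O") <= 1
```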
Initialize Page Tables
• 1 processor
  – sbuf invisible
• operating mode: paging disabled
  – mmu invisible
• set up page table tree in PM
(figure: PM, sbuf, mmu with tlb, core with CR3 and R)
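Setting up a page table tree can be sketched with the x64 four-level layout for 4 KiB pages. In that layout, a 48-bit virtual address splits into four 9-bit table indices (bits 47..39, 38..30, 29..21, 20..12) and a 12-bit offset. As a simplifying assumption, tables are nested dictionaries rather than 64-bit entries in PM. Present bits, rights and other entry bits are ignored, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple, Union

# hypothetical representation: a table maps a 9-bit index to the next table or,
# at the last level, to a physical page frame number
Table = Dict[int, Union["Table", int]]


def split(va: int) -> Tuple[List[int], int]:
    """Split a 48-bit virtual address into four 9-bit indices and a 12-bit offset."""
    return [(va >> shift) & 0x1FF for shift in (39, 30, 21, 12)], va & 0xFFF


def map_page(root: Table, va: int, frame: int) -> None:
    # run by the single initializing processor while paging is still disabled
    idx, _ = split(va)
    table = root
    for i in idx[:-1]:
        table = table.setdefault(i, {})
    table[idx[-1]] = frame


def walk(root: Table, va: int) -> Optional[int]:
    """Translate va from the root table (the one CR3 points to); None means page fault."""
    idx, offset = split(va)
    node: Union[Table, int] = root
    for i in idx:
        if not isinstance(node, dict) or i not in node:
            return None
        node = node[i]
    if isinstance(node, dict):
        return None
    return (node << 12) | offset


root: Table = {}
map_page(root, 0x7F0000401000, 0x1234)
print(hex(walk(root, 0x7F0000401ABC)))  # 0x1234abc
```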
Translated Linear Memory
• many processors
• operating mode: paging enabled
• keep tlb consistent
(figure: page tables in PM, sbuf, mmu with tlb, core with CR3 and R)
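Keeping the tlb consistent can be sketched with a TLB that caches translations from a flattened page table, mapping virtual page numbers to frames. This is a simplification, and the names are hypothetical. Changing the page table does not update cached entries, so software must invalidate them, as the x64 instruction INVLPG does for one page. With many processors, every processor whose tlb may hold the stale entry must invalidate it.

```python
from typing import Dict, Optional


class Mmu:
    """Hypothetical sketch: a TLB caching translations from a page table kept in PM."""

    def __init__(self, page_table: Dict[int, int]) -> None:
        self.page_table = page_table  # virtual page number -> physical frame (flattened)
        self.tlb: Dict[int, int] = {}

    def translate(self, vpn: int) -> Optional[int]:
        if vpn in self.tlb:
            return self.tlb[vpn]      # TLB hit: the page table is not consulted
        frame = self.page_table.get(vpn)
        if frame is not None:
            self.tlb[vpn] = frame     # cache the translation
        return frame

    def invalidate(self, vpn: int) -> None:
        # analogous to INVLPG: drop one cached translation
        self.tlb.pop(vpn, None)


pt = {5: 100}
mmu = Mmu(pt)
first = mmu.translate(5)  # 100, now cached in the TLB
pt[5] = 200               # the page table in PM is changed
stale = mmu.translate(5)  # still 100: the TLB is inconsistent
mmu.invalidate(5)
fresh = mmu.translate(5)  # 200 after invalidation
```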
Translated Consistent Linear Memory + sbufs (Lemma 10)
• many processors
• operating mode: paging enabled
• keep tlb consistent
(figure: LM with page tables, sbuf, core with CR3 and R)
C0: Pascal with C Syntax
• configurations c = (pr, rd, lms, hm, gm)
  – pr: program rest
  – rd: recursion depth
  – lms: [0 : recursion depth] → {local memories}
  – hm: heap memory
  – gm: global memory
• subvariables, e.g. (m, i)[17].gpr[3]
• value of pointers: subvariables
(figure: memory m; va(c, (m, i)), ba(m, i), size(m, i))
Parallel C
• c = (pr, rd, lms, hm, gm)
  – pr: program rest
  – rd: recursion depth
  – lms: [0 : recursion depth] → {local memories}
  – hm: heap memory
  – gm: global memory
• Share
  – gm
  – hm
• Interleave at the steps of the small-step semantics
• Problem:
  – processors interleave the instructions of the compiled programs code(p)
(figure: memory m; va(c, (m, i)), ba(m, i), size(m, i))
Simulation Relation consis(c, alloc, d)
(figure: variable y and pointer p of C configuration c, placed in linear memory LM at alloc(c, y) and alloc(c, p))
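A simplified version of the simulation relation can be sketched as a check. It assumes that every variable occupies one word of LM, that va(c, y) is the value of y in C configuration c, and that alloc(c, y) is the address of y in linear memory. Because pointer values are subvariables, a pointer p is consistent when LM at alloc(c, p) holds alloc(c, va(c, p)), the address of the subvariable it points to. The data layout and function signature are hypothetical.

```python
from typing import Dict, Tuple

Var = Tuple[str, int]  # hypothetical subvariable name, e.g. ("gm", 0)


def consis(values: Dict[Var, object], alloc: Dict[Var, int], lm: Dict[int, int]) -> bool:
    """Simplified consis(c, alloc, d): every C variable's value is found in LM.

    values: va(c, y) for each variable y; a pointer's value is the subvariable it points to
    alloc:  alloc(c, y), the linear-memory address of y
    lm:     linear memory LM of the machine configuration d
    """
    for y, v in values.items():
        expected = alloc[v] if isinstance(v, tuple) else v  # pointers are stored as addresses
        if lm.get(alloc[y]) != expected:
            return False
    return True


y = ("gm", 0)  # an integer variable
p = ("gm", 1)  # a pointer to y
values = {y: 7, p: y}
alloc = {y: 1000, p: 1008}
ok = consis(values, alloc, {1000: 7, 1008: 1000})   # True
bad = consis(values, alloc, {1000: 7, 1008: 1004})  # False: p points to the wrong address
```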
Non-optimizing compiler: step-by-step simulation
Optimizing compiler: simulation between IO-steps
IO-steps (1): volatile accesses
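The effect of IO-steps can be sketched with hypothetical traces. An optimizing compiler may keep an ordinary variable in a register, so the target no longer matches the source step by step. The projections of both traces onto the volatile accesses still agree. The trace format is an assumption for illustration.

```python
from typing import List, Tuple

Step = Tuple[str, str, int, bool]  # (kind, variable, value, is_volatile), hypothetical format


def io_steps(trace: List[Step]) -> List[Step]:
    """Project a trace onto its IO-steps, here the volatile accesses."""
    return [s for s in trace if s[3]]


# source: tmp := 1; tmp := tmp + 1; flag := tmp   (only flag is volatile)
source = [("w", "tmp", 1, False), ("w", "tmp", 2, False), ("w", "flag", 2, True)]
# optimized target: tmp lives in a register, only the volatile store reaches memory
optimized = [("w", "flag", 2, True)]
```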
Volatiles Are Sequentially Consistent (Lemma 11)
Structured Parallel C
• Implement locks using volatiles
• IO-steps (2): lock release
• Run processors alone on locked portions of linear memory
• Lemma 1: sbufs invisible
• Lemma 10: ordinary C code in linear memory
Summary
• Implement locks using volatiles
• IO-steps (2): lock release
• Run processors alone on locked portions of linear memory
• Lemma 1: sbufs invisible
• Lemma 10: ordinary C code in linear memory
• Outlined a correctness proof for the implementation of structured parallel C
  – initialisation
  – compilation