How to Shadow Every Byte of Memory Used
How to Shadow Every Byte of Memory Used by a Program Nicholas Nethercote — National ICT Australia Julian Seward — Open. Works LLP 1
Shadow memory tools • Shadow every byte of memory with another value that describes it shadow memory tools shadow value tools • This talk: – Why shadow memory is useful – How to implement it well 2
Examples bugs security properties Tool(s) Shadow memory helps find. . . Memcheck, Purify Memory errors Eraser, DRD, Helgrind, etc. Data races Hobbes Run-time type errors Annelid Array bounds violations Taint. Check, LIFT, Taint. Trace Uses of untrusted values “Secret tracker” Leaked secrets Redux Dynamic dataflow graphs Dyn. Comp. B Invariants pin. SEL System call side-effects 3
Shadow memory is difficult • Performance – Lots of extra state, many operations instrumented • Robustness original values shadow values squeez e! address space • Trade-offs must be made 4
An example tool: Memcheck 5
Memcheck • Three kinds of information: – A (“addressability”) bits: 1 bit / memory byte – V (“validity”) bits: 1 bit / register bit, 1 bit / memory bit – Heap blocks: location, size, allocation function • Memory information: original memory byte 0110 0101 shadow memory VVVV A V bits only used if A bit is “addressable” 6
A simplementation 7
Basics (I) No. Access DSM SM 1 0 1 SM 2 VVVVVVV V VVVVVVV A -------- 0 VVVVVVVV A V. . . . A ---- 0 VVVV A 65535 VVVVVVV V 0 x 0001 FFFF PM . . . 0 KB 64 KB 128 KB 3904 KB 3968 KB 4032 KB 8
Basics (II) • Multi-byte shadow accesses: – Combine multiple single-byte accesses – Complain if any unaddressable bytes accessed – Values loaded from unaddressable bytes marked as defined • Range-setting (set_range) – Loop over many bytes, one at a time • Range-checking – E. g. : write(fd, buf, n) -- check n bytes in buf • Slow-down: 209. 6 x 9
Complications • Corruption of shadow memory – Possible with a buggy program – Originally used x 86 segmentation, but not portable – Keep original and shadow memory far apart, and pray • 64 -bit machines – – Three- or four-level structure would be slow Two level structure extended to handle 32 GB Slow auxiliary table for memory beyond 32 GB Better solution is an open research question 10
Four optimisations 11
#1: Faster loads and stores • Multi-byte loads/stores are very common – N separate lookups accesses is silly (where N = 2, 4, or 8) • If access is aligned, fully addressable – Extract/write V bits for N shadow bytes at once – Else fall back to slow case: 1 in a 1000 or less • Slow-down: 56. 2 x – 3. 73 x faster 12
#2: Faster range-setting • Range-setting large areas is common – Vectorise set_range – 8 -byte stride works well • Replacing whole SMs – If marking a 64 KB chunk as No. Access, replace the SM with the No. Access DSM – Add Defined and Undefined DSMs – Large read-only code sections covered by Defined DSM • Slow-down: 34. 7 x – 1. 62 x faster, 1. 97 x smaller 13
#3: Faster SP updates • Stack pointer (SP) updates are very common • Inc/dec size often small, � statically known – E. g. 4, 8, 12, 16, 32 bytes • More specialised range-setting functions – Unrolled versions of set_range() • Slow-down: 27. 2 x – 1. 28 x faster 14
#4: Compressed V bits • Partially-defined bytes (PDBs) are rare – Memory: 1 A bit + 8 V bits 2 VA bits – Four states: No. Access, Undefined, Defined, Part. Defined – Full V bits for PDBs in secondary V bits table – Registers unchanged -- still 8 V bits per byte • Slow-down: 23. 4 x – 4. 29 x smaller, 1. 16 x faster • Obvious in hindsight, but took 3 years to identify 15
Discussion • Optimising principles: – Start with a simplementation – Make the common cases fast – Exploit redundancy to reduce data sizes • Novelty? – First detailed description of Memcheck’s shadow memory – First detailed description of a two-level table version – First detailed evaluation of shadow memory – Compressed V bits 16
Evaluation 17
Robustness • Two-level table is very flexible – Small shadow memory chunks, each can go anywhere • Earlier versions required large contiguous regions – Some programs require access to upper address space – Some Linux kernels have trouble mmap’ing large regions – Big problems with Mac OS X, AIX, other OSes • Memcheck is robust 18
SPEC 2000 Performance Tool No instrumentation Simple Memcheck + faster loads/stores Slowdown Relative improvement 4. 3 x 209. 6 x 56. 2 x 3. 73 x faster + faster range 34. 7 x 1. 62 x faster, 1. 97 x setting smaller + faster SP 27. 2 x 1. 28 x faster updates • Shadow memory causes about half of + compressed V 23. 4 x 1. 16 x faster, 4. 29 x Memcheck’s overhead bits smaller Overall 8. 9 x faster, 8. 5 x 19
Performance observations • Performance is a traditional research obsession “The subjective issues are important — ease of use and robustness, but performance is the item which would be most interesting for the audience. ” (my emphasis) • Users: slowness is #1 survey complaint – But most user emails are about bugs or interpreting results – Zero preparation is a big win • Cost/benefit – People will use slow tools if they are sufficiently 20
Alternative implementation • “Half-and-half” – Used by Hobbes, Taint. Trace, (with variation) LIFT a original memory b c. . . z a' b' shadow memory c'. . . z' constant offset • Compared to two-level table – Faster – Not robust enough for our purposes 21
If you remember nothing else. . . 22
Take-home messages • Shadow memory is powerful • Shadow memory can be implemented well • Implementations require trade-offs www. valgrind. or g 23
- Slides: 23