Reducing Sandbox Overhead in WebAssembly
Dimitar Bounov

What is WebAssembly (WASM)?
- A new low-level language for the web (coming Q1 2017)
- Collaborative effort between Mozilla, Google, Microsoft, and Apple
- Efficient target for C and C++ (upcoming: Java, Rust…)
In Firefox Aurora now

Why another language?
- Need for dynamic content on the web (long live Flash…)
- NaCl (2009-now): LLVM IR + SFI
- asm.js (2012-now): restricted JS subset
- And the winner is...
[Figure omitted. Facebook source (10/2016)]

Why not asm.js?
- Complex JS semantics

  int fib(int n) {
    if (n < 3) return 1;
    return fib(n-1) + fib(n-2);
  }

Why not asm.js?
- Complex JS semantics

  function fib(n) {
    n = n|0;               // cast to int
    if (n >>> 0 < 3)       // cast to positive int
      return 1|0;
    return (fib((n-1)|0) + fib((n-2)|0))|0;
  }

Why not asm.js?
- Complex JS semantics
- Large code size
- Slower compilation

WASM Design
- RISC-like virtual ISA (<255 opcodes)
- Structured control flow (blocks, if/else, loops, break)
- Separate stack and heap (enforced)
  - stack non-aliasable
  - stack frame size statically known
- Deterministic* semantics across x86, x64, ARM

WASM Example: Fibonacci

  (func $fib (param $0 i32) (result i32)    ;; function signature
    (local $1 i32)                          ;; locals
    (if i32 (i32.lt_s $0 3)                 ;; if is an expression
      1
      (block                                ;; block is an expression
        (set_local $1 (call $fib (i32.add $0 -1)))
        (i32.add $1 (call $fib (i32.add $0 -2))))))

WASM Sandbox
Control Flow
- No computed jump instruction
- Indirect function calls go through lookup table
- Returns safe due to separate stack
Memory Accesses
- Heap loads/stores bounds checked (BC-ed)

WASM Sandbox
Control Flow
- No computed jump instruction
- Indirect function calls go through lookup table
- Returns safe due to separate stack
Memory Accesses
- Heap loads/stores bounds checked (BC-ed)
- Safe stack makes many BCs redundant

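The lookup-table rule for indirect calls can be sketched in C. This is only an illustration under stated assumptions: `table_entry`, `sig_t`, `call_indirect`, and the two sample functions are hypothetical names, and a real engine compares full function types and traps instead of returning an error code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature tag; a real engine compares full function types. */
typedef enum { SIG_I32_TO_I32, SIG_OTHER } sig_t;

typedef struct {
    sig_t sig;
    int32_t (*fn)(int32_t);
} table_entry;

int32_t twice(int32_t x)  { return 2 * x; }
int32_t negate(int32_t x) { return -x; }

/* The only indirect call targets the sandboxed code can ever reach. */
static const table_entry TABLE[] = {
    { SIG_I32_TO_I32, twice },
    { SIG_I32_TO_I32, negate },
};
#define TABLE_LEN (sizeof TABLE / sizeof TABLE[0])

/* Returns 0 and stores the result on success.
 * Returns -1 where a real sandbox would trap. */
int call_indirect(uint32_t index, sig_t expected, int32_t arg, int32_t *out) {
    if (index >= TABLE_LEN) return -1;             /* index outside the table */
    if (TABLE[index].sig != expected) return -1;   /* signature mismatch */
    *out = TABLE[index].fn(arg);
    return 0;
}
```

Because the callee is selected by a checked index rather than by a raw code address, the sandboxed program cannot jump to an arbitrary location.
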
Memory Accesses

  uint32_t addr = …
  int x = *(base + addr);

[Diagram: HEAP spans base to base+L; the access lands at base + addr]

Bounds Checks

  uint32_t addr = …
  assert(0 <= addr <= L);    // BC
  int x = *(base + addr);
  ...
  assert(0 <= addr <= L);    // BC: redundant!
  int y = *(base + addr);    // same address

- base, addr safe due to separate stack
- 10% of dynamic BCs on AngryBots
* Global Value Numbering could handle this too.
[Diagram: HEAP spans base to base+L; the access lands at base + addr]

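A minimal C sketch of the same-address case, counting how many checks run before and after the redundant one is dropped. The names `heap`, `HEAP_LIMIT`, `in_bounds`, and `checks_executed` are hypothetical, and the heap is padded by one word so that the inclusive check `addr <= L` from the slide stays safe for a 4-byte load.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { HEAP_LIMIT = 1024 };                    /* hypothetical L */
uint8_t heap[HEAP_LIMIT + sizeof(int32_t)];    /* padded for a load at addr == L */
unsigned checks_executed;                      /* counts dynamic bounds checks */

int in_bounds(uint32_t addr) {
    checks_executed++;
    return addr <= HEAP_LIMIT;
}

int32_t load_i32(uint32_t addr) {
    int32_t v;
    memcpy(&v, heap + addr, sizeof v);
    return v;
}

/* Naive code generation: one bounds check per access. */
int sum_twice_naive(uint32_t addr, int32_t *out) {
    if (!in_bounds(addr)) return -1;
    int32_t x = load_i32(addr);
    if (!in_bounds(addr)) return -1;   /* redundant: addr cannot have changed */
    int32_t y = load_i32(addr);
    *out = x + y;
    return 0;
}

/* After elimination: addr lives on the separate stack and is not aliasable
 * through heap stores, so the first check covers both loads. */
int sum_twice_optimized(uint32_t addr, int32_t *out) {
    if (!in_bounds(addr)) return -1;
    int32_t x = load_i32(addr);
    int32_t y = load_i32(addr);
    *out = x + y;
    return 0;
}
```

Both functions return the same result for the same input; only the number of executed checks differs.
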
Bounds Checks - Constants

  assert(0 <= 0xABCD <= L);    // BC: redundant!
  int x = *(0xABCD);           // constant address

- ~22% of dynamic BCs on AngryBots
* Global Value Numbering should have handled this.

Bounds Checks - Constant Offsets

  uint32_t addr = …
  assert(0 <= addr+8 <= L);      // BC
  int x = *(base + addr + 8);    // constant offset
  ...
  assert(0 <= addr+4 <= L);      // BC: redundant*!
  int y = *(base + addr + 4);    // smaller offset

- ~35% of dynamic BCs on AngryBots
* ABCD (Bodik et al.) can handle this.

Bounds Checks - Constant Offsets 2

  uint32_t addr = …
  assert(0 <= addr+4 <= L);      // BC
  int x = *(base + addr + 4);    // constant offset
  ...
  assert(0 <= addr+8 <= L);      // BC: not quite redundant :(
  int y = *(base + addr + 8);    // bigger offset

Bounds Checks + Slop

  uint32_t addr = …
  assert(0 <= addr+4 <= L);      // BC
  int x = *(base + addr + 4);    // constant offset
  ...
  assert(0 <= addr+8 <= L);      // BC
  int y = *(base + addr + 8);    // bigger offset

[Diagram: HEAP starting at base, followed by SLOP ending at base+L+S; accesses at base + addr+4 and base + addr+8]

Bounds Checks + Slop

  uint32_t addr = …
  assert(0 <= addr <= L);        // BC
  int x = *(base + addr + 4);    // constant offset
  ...
  assert(0 <= addr <= L);        // BC: redundant!
  int y = *(base + addr + 8);    // bigger offset

- ~50% of dynamic BCs on AngryBots
[Diagram: HEAP starting at base, followed by SLOP ending at base+L+S; accesses at base + addr+4 and base + addr+8]

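The slop idea can be sketched in C as follows. The assumptions are loud ones: `region`, `HEAP_LIMIT`, `SLOP`, and the helper names are hypothetical, and the slop is modeled as ordinary array space. In an engine the slop would presumably be inaccessible, so that touching it faults; this sketch only demonstrates that a check on the base address keeps every small constant offset inside the L + S reservation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { HEAP_LIMIT = 4096, SLOP = 64 };   /* hypothetical L and S */
uint8_t region[HEAP_LIMIT + SLOP];       /* heap followed by slop */
unsigned checks_executed;                /* counts dynamic bounds checks */

/* Bounds check on the base address only; the constant offset is ignored. */
int base_in_bounds(uint32_t addr) {
    checks_executed++;
    return addr <= HEAP_LIMIT;
}

/* Precondition: addr passed base_in_bounds and offset + 4 <= SLOP,
 * hence addr + offset + 4 <= L + S. */
int32_t load_at(uint32_t addr, uint32_t offset) {
    /* Debug-only sanity check of the invariant, not part of emitted code. */
    assert((uint64_t)addr + offset + sizeof(int32_t) <= sizeof region);
    int32_t v;
    memcpy(&v, region + addr + offset, sizeof v);
    return v;
}

/* Two accesses at different constant offsets share one check on addr. */
int load_two_fields(uint32_t addr, int32_t *x, int32_t *y) {
    if (!base_in_bounds(addr)) return -1;
    *x = load_at(addr, 4);
    *y = load_at(addr, 8);   /* bigger offset, no new check needed */
    return 0;
}
```

Checking `addr` instead of `addr + offset` is what makes the second check identical to the first, and therefore removable, regardless of which offset is larger.
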
Bounds Checks + Loops*

  uint32_t addr = …
  for (; addr < end; addr++) {    // constant stride
    assert(0 <= addr <= L);
    *(base + addr) = 0;
  }

- Constant stride + slop allow BC hoisting
* ongoing work

Bounds Checks + Loops*

  uint32_t addr = …
  assert(0 <= addr <= L);
  for (; addr < end; addr++)
    *(base + addr) = 0;

- Constant stride + slop allow BC hoisting
* ongoing work
[Diagram: HEAP starting at base, followed by SLOP ending at base+L+S; the access lands at base + addr]

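One plausible reading of the hoisting slide, sketched in C for a POSIX system. The slide marks this as ongoing work, so the following is an assumption rather than the described implementation: the slop is mapped with no access rights, the start address is checked once, and a loop whose stride is no larger than the slop must touch the slop (and fault) before it can leave the reservation. `map_heap_with_slop`, `zero_range`, and the sizes are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

enum { HEAP_LIMIT = 65536, SLOP = 4096 };   /* hypothetical L and S */

/* Map L accessible bytes followed by S inaccessible bytes of slop. */
uint8_t *map_heap_with_slop(void) {
    void *p = mmap(NULL, HEAP_LIMIT + SLOP, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return NULL;
    if (mprotect(p, HEAP_LIMIT, PROT_READ | PROT_WRITE) != 0) {
        munmap(p, HEAP_LIMIT + SLOP);
        return NULL;
    }
    return p;
}

/* Zero [addr, end) with a single hoisted check on the start address.
 * Returns -1 where the hoisted check would trap. */
int zero_range(uint8_t *base, uint32_t addr, uint32_t end) {
    if (addr > HEAP_LIMIT) return -1;   /* hoisted BC, run once */
    for (; addr < end; addr++)
        base[addr] = 0;                 /* stride 1 <= SLOP: an overrun hits
                                           the slop and faults first */
    return 0;
}
```

An overrunning loop here dies with a fault partway through instead of trapping at a precise check, which is the exception-semantics question raised under future work.
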
Bounds Checks + Growing Memory

  ….
  assert(0 <= addr <= L);
  grow_memory()
  assert(0 <= addr <= L);

[Diagram: HEAP starting at base, followed by SLOP ending at base+L+S]

Bounds Checks + Growing Memory

  ….
  assert(0 <= addr <= L);
  grow_memory()
  assert(0 <= addr <= L);

[Diagram: HEAP starting at base, followed by SLOP ending at base+L+S; new limit marked at base+L1]

Bounds Checks + Growing Memory

  ….
  assert(0 <= addr <= L1);
  grow_memory()
  assert(0 <= addr <= L1);

- Bounds checks need dynamic updates
[Diagram: HEAP starting at base, followed by SLOP ending at base+L+S; new limit marked at base+L1]

Bounds Checks + Growing Memory
Option 1: Patching

  ….
  assert(0 <= addr <= L1);
  grow_memory()
  assert(0 <= addr <= L1);

- Difficult with threads
- Bad for code caching
[Diagram: HEAP starting at base, followed by SLOP ending at base+L+S; new limit marked at base+L1]

Bounds Checks + Growing Memory
Option 2: Indirection

  ….
  assert(0 <= addr <= *LIMIT);
  grow_memory()    // LIMIT = L1
  assert(0 <= addr <= *LIMIT);

- Extra memory load
[Diagram: HEAP starting at base, followed by SLOP ending at base+L+S; new limit marked at base+L1]

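A minimal C sketch of the indirection option, assuming the limit lives in a single mutable variable. `heap_limit`, `in_bounds`, and `grow_memory_to` are hypothetical names; no actual memory is allocated here, only the check is modeled.

```c
#include <assert.h>
#include <stdint.h>

/* LIMIT from the slide: the current heap limit, kept in memory. */
uint32_t heap_limit = 1024;   /* hypothetical initial L */

/* Every check reads heap_limit, which is the extra memory load. */
int in_bounds(uint32_t addr) {
    return addr <= heap_limit;
}

/* Growing only updates the variable; compiled checks stay untouched. */
void grow_memory_to(uint32_t new_limit) {
    if (new_limit > heap_limit)
        heap_limit = new_limit;
}
```

Compared with patching, no generated code changes on growth, so the same code can be cached and shared; the price is one additional load per check.
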
Bounds Checks + Growing Memory
Option 3: Pre-declared MAX

  ….
  assert(0 <= addr <= MAX);
  grow_memory()
  assert(0 <= addr <= MAX);

[Diagram: HEAP starting at base, followed by SLOP ending at base+L+S; new limit marked at base+L1]

Bounds Checks + Growing Memory
Option 3: Pre-declared MAX

  ….
  assert(0 <= addr <= MAX);
  grow_memory()
  assert(0 <= addr <= MAX);

- mmap up to MAX
- allocate pages on grow
[Diagram: HEAP starting at base with marks at base+L1, base+L, and MAX; SLOP]

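The "mmap up to MAX, allocate pages on grow" idea can be sketched with POSIX `mmap` and `mprotect`. This is a sketch under assumptions, not the engine's code: `wasm_heap`, `heap_init`, `heap_grow`, and `heap_destroy` are hypothetical names, and all sizes are assumed to be multiples of the system page size.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

typedef struct {
    uint8_t *base;
    size_t committed;   /* bytes currently accessible */
    size_t max;         /* pre-declared MAX, reserved up front */
} wasm_heap;

/* Make the first new_size bytes accessible; the base never moves. */
int heap_grow(wasm_heap *h, size_t new_size) {
    if (new_size < h->committed || new_size > h->max) return -1;
    if (new_size > 0 &&
        mprotect(h->base, new_size, PROT_READ | PROT_WRITE) != 0)
        return -1;
    h->committed = new_size;
    return 0;
}

/* Reserve max bytes of address space with no access rights,
 * then commit the initial size. */
int heap_init(wasm_heap *h, size_t initial, size_t max) {
    if (initial > max || max == 0) return -1;
    void *p = mmap(NULL, max, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    h->base = p;
    h->committed = 0;
    h->max = max;
    return heap_grow(h, initial);
}

void heap_destroy(wasm_heap *h) {
    munmap(h->base, h->max);
}
```

Since the reservation is fixed, a check against MAX never needs updating, and an access between the committed size and MAX lands on a page without access rights and faults.
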
Future Work
- Bounds check elimination with loops
- Preserving precise exception semantics with BC hoisting
- Eliminating BCs for function calls

Lesson
Safe stacks provide opportunities for reducing the overhead of SFI

Lesson
Safe stacks provide opportunities for reducing the overhead of safe stacks
[1] Control-Flow Bending: On the Effectiveness of Control-Flow Integrity, Carlini et al., USENIX Security '15
[2] Losing Control: On the Effectiveness of Control-Flow Integrity under Stack Attacks, Conti et al., CCS '15
[3] The Performance Cost of Shadow Stacks and Stack Canaries, Dang et al., CCS '15

Thank you! Q&A

WASM Example

  (func $fib (param $0 i32) (result i32)
    (local $1 i32)
    (return
      (if i32 (i32.lt_s $0 3)
        1
        (block
          (set_local $1 (call $fib (i32.add $0 -1)))
          (i32.add $1 (call $fib (i32.add $0 -2)))))))

WebAssembly - In Stores Now!
- In Firefox Nightly!

WebAssembly - In Stores Now!
- In Firefox Nightly!
- Enable javascript.options.wasm in about:config

WebAssembly - In Stores Now!
- In Firefox Nightly!
- Enable javascript.options.wasm in about:config
- Test it out at webassembly.github.io/demo

Why another language?
- Portable (thanks to precise semantics)
- Fast & Safe (close to actual hardware ISAs, lightweight sandbox)
- Small (carefully designed binary format)

WASM Tool: Emscripten

The Sandbox
- Memory accesses
- Jumps/function calls

The Sandbox - Memory Accesses
- Memory accesses require runtime checks
- AngryBots: ~60M accesses/sec

The Sandbox - Memory Accesses
- Memory accesses require runtime checks
- AngryBots: ~60M accesses/sec
- Do we really need all runtime checks?

Shoutouts!
- Dan Gohman, Luke Wagner, Michael Bebenita
- Wider JS/MozResearch Team (Benjamin, Alon, Jakob…)
- Interns (London Rocked!)
- Academic Recruiting team (You guys Rocked!)
- Mozillians Community

Summary
- Sandboxing efficiently across 3 architectures x 3 OSs is hard
  - Many corner cases
  - Difficult to test
  - Security critical
- BCE landing in next Aurora (after grow/resize memory 1287967)
- WebAssembly is coming! It will be AWESOME! Get involved!

Bounds Checks + Loops

  uint32_t addr = …
  for (; addr < end; addr++) {
    assert(0 <= addr <= L);
    *(base + addr) = 0;
  }

Bounds Checks + Loops*

  uint32_t addr = …
  for (; addr < end; addr++) {    // constant stride
    assert(0 <= addr <= L);
    *(base + addr) = 0;
  }

- Constant stride + slop allow BC hoisting
* ongoing work
[Diagram: HEAP starting at base, followed by SLOP ending at base+L+S]