Resource-Freeing Attacks: Improve Your Cloud Performance (at Your Neighbor's Expense)
Venkatanathan (Venkat) Varadarajan, Thawan Kooburat, Benjamin Farley, Thomas Ristenpart, and Michael Swift
Department of Computer Sciences

Public Clouds (EC2, Azure, Rackspace, …)
• VM multi-tenancy: different customers' virtual machines (VMs) share the same server
• Why multi-tenancy? Improved resource utilization

Implications of Multi-tenancy
• VMs share many resources
 – CPU, cache, memory, disk, network, etc.
• Virtual Machine Managers (VMM)
 – Goal: provide isolation
• Deployed VMMs don't perfectly isolate VMs
 – Side-channels [Ristenpart et al. '09, Zhang et al. '12]
Today: performance degraded by other customers

Contention in Xen
• 3x-6x performance loss, which means higher cost
[Figure: performance degradation (%) by contended resource (CPU, Net, Disk, Cache) on the local Xen testbed; annotations: work-conserving scheduling, non-work-conserving CPU scheduling.]
Machine: Intel Xeon E5430, 2.66 GHz
CPU: 2 packages, each with 2 cores
Cache size: 6 MB per package

What can a tenant do?
• Ask provider for better isolation … requires overhaul of the cloud
• Pack up VM and move (see our SOCC 2012 paper) … but not all workloads are cheap to move
This work: a greedy customer can recover performance by interfering with other tenants (a Resource-Freeing Attack)

Resource-freeing attacks (RFAs)
• What is an RFA?
• RFA case studies
 1. Two highly loaded web server VMs
 2. Last Level Cache (LLC) bound VM and highly loaded webserver VM
• Demonstration on Amazon EC2

The Setting
Victim:
 – One or more VMs
 – Public interface (e.g., HTTP)
Beneficiary:
 – VM whose performance we want to improve
Helper:
 – Mounts the attack
Beneficiary and victim are fighting over a target resource.

Example: Network Contention
[Diagram: local Xen testbed with beneficiary, victim, clients, and the shared network (Net).]
• What can you do?

Ways to Reduce Contention?
Break into victim VM and disable it
The good: frees up resources used by victim
But:
• Requires knowledge of vulnerability
• Drastic
• Easy to detect
[Diagram: local Xen testbed with helper, beneficiary, victim, clients, and Net.]

Ways to Reduce Contention?
Do a simple DoS attack?
• Backfires: may increase the contention
• This may NOT free up target resources
[Diagram: on the local Xen testbed, the helper sends a SYN flood to the victim; beneficiary, clients, and Net are also shown.]

Recipe for a Successful RFA
• Shift resource usage via the public interface
• Push the victim towards a CPU bottleneck
• Reduce target resource usage
Shift resource usage away from the target resource and towards the bottleneck resource.
[Figure: proportion of CPU usage versus proportion of network usage, with static pages at the network end and CPU-intensive dynamic pages at the CPU end; annotation: Limits.]
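The recipe can be illustrated with a toy model. This is only a sketch under stated assumptions: the function name, the per-request CPU costs, and the request rates are hypothetical values chosen for illustration, not measurements from the talk. It models a victim whose CPU is consumed first by CGI requests, so that less CPU remains for serving the static pages that use the network.

```python
def victim_network_use(static_rate, cgi_rate, cpu_capacity=1.0,
                       static_cpu_cost=0.0002, cgi_cpu_cost=0.01):
    """Toy model: static pages per second the victim can still serve.

    All costs are hypothetical: each CGI request burns cgi_cpu_cost CPU-seconds,
    each static page burns static_cpu_cost CPU-seconds and one unit of network.
    """
    # CPU consumed by CGI requests, capped at the victim's CPU capacity.
    cgi_cpu = min(cpu_capacity, cgi_rate * cgi_cpu_cost)
    cpu_left = cpu_capacity - cgi_cpu
    # Static pages are limited by demand or by the CPU that is left over.
    return min(static_rate, cpu_left / static_cpu_cost)


for cgi_rate in (0, 80, 200):
    served = victim_network_use(static_rate=3000, cgi_rate=cgi_rate)
    print(f"CGI requests/sec: {cgi_rate:3d} -> static pages/sec: {served:.0f}")
```

In this model the victim's use of the target resource (network) falls as the CGI load pushes it into a CPU bottleneck, which is the shift the recipe describes.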

An RFA in Our Example
[Diagram: helper sends CGI requests to the victim; labels: CPU utilization, Clients, Net.]
Result in our testbed: increases beneficiary's share of bandwidth
• Share of bandwidth: 50% → 85%
• No RFA: 1800 page requests/sec
• W/ RFA: 3026 page requests/sec
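As a quick check of what these throughput figures imply, the relative gain can be computed directly. The helper name relative_gain is an illustrative choice; the two request rates are the ones reported on the slide.

```python
def relative_gain(before, after):
    """Relative improvement of `after` over `before` (0.5 means +50%)."""
    return after / before - 1


no_rfa = 1800    # page requests/sec without the RFA
with_rfa = 3026  # page requests/sec with the RFA
print(f"{relative_gain(no_rfa, with_rfa):.1%}")  # prints 68.1%
```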

Resource-freeing attacks
1) Send targeted requests to victim
2) Shift resource use from the target to a bottleneck
Can we mount RFAs when the target resource is the CPU cache?
Shared CPU cache:
 – Ubiquitous: almost all workloads need cache
 – Hardware controlled: not easily isolated via software
 – Performance sensitive: high performance cost!

Cache Contention
[Figure: cache performance degradation (%) versus webserver request rate (0 to 3000); annotation: RFA goal.]

Case Study: Cache vs. Network
• Victim: Apache webserver hosting static and dynamic (CGI) web pages
• Beneficiary: synthetic cache-bound workload (LLCProbe)
• Target resource: cache
• No cache isolation:
 – ~3x slower when sharing cache with webserver
[Diagram: local Xen testbed; beneficiary and victim each on a core, sharing the cache; clients reach the victim over Net.]

Cache vs. Network
Victim webserver frequently interrupts, pollutes the cache
 – Reason: Xen gives higher priority to VM consuming less CPU time
[Timeline: cache state with a heavily loaded web server. Beneficiary starts to run; webserver receives a request; decreased cache efficiency.]

Cache vs. Network w/ RFA
RFA helps in two ways:
 1. The webserver loses its priority.
 2. The webserver's capacity is reduced.
[Timeline: cache state with a heavily loaded web server under RFA. Beneficiary starts to run; webserver receives a request; helper sends CGI requests.]

RFA: Performance Improvement
• Slowdown reduced from 196% to 86%: 60% performance improvement
[Figure: beneficiary performance at different RFA intensities (time in ms per second).]
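The relation between the two slowdown figures and the roughly 60% improvement can be reproduced with a short calculation. This sketch assumes that "slowdown" means extra runtime relative to running alone (so a 196% slowdown is 2.96x the baseline runtime); that definition is an assumption, not something the slide states.

```python
def speedup_from_slowdowns(before_pct, after_pct):
    """Speedup obtained when slowdown drops from before_pct to after_pct.

    Assumes slowdown is extra runtime relative to an uncontended baseline,
    so runtime = baseline * (1 + slowdown / 100).
    """
    return (100 + before_pct) / (100 + after_pct)


speedup = speedup_from_slowdowns(196, 86)
print(f"{speedup - 1:.0%}")  # prints 59%
```

Under that assumption the result is about 59%, in line with the 60% shown on the slide.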

RFA Effect on Interruptions
Beneficiary: LLCProbe
[Figure: plot labels 40% and 85%.]

RFA Effect on Victim's Capacity
• Decreases with increasing RFA intensity

Experiments on Amazon EC2
• Co-resident VMs from our accounts (multiple accounts): stand-ins for victim and beneficiary
• Separate instances for helper and web clients
Instance type: m1.small
# of co-resident pairs: 9 (23 total instances)
Machine type: Intel Xeon E5507 with 4 MB LLC
• No direct interaction with any other customers
• Indirect interaction akin to normal usage cases

LLCProbe Synthetic Benchmark
• Highest performance improvement of 13%, recovering 33% of performance lost
• Average performance improvement: 6%
RFA improved performance of LLCProbe on all experimental EC2 instances!

mcf from SPEC-CPU
• 10% slowdown → 6% slowdown
• 3% performance improvement = 35% reduction in performance loss
On average, RFA improved performance across all SPEC workloads!

Discussion: Practical Aspects
RFA case studies used CPU-intensive CGI requests
 – Alternative: DoS vulnerabilities (e.g., hash-collision attacks)
Identifying co-resident victims
 – Easy on most clouds (co-resident VMs have predictable internal IP addresses)
No public interface?
 – Paper discusses possibilities for RFAs

Conclusion
Resource-Freeing Attacks
 – Interfere with victim to shift resource use
 – Proof-of-concept of efficacy in public clouds
Open questions:
 – Other RFAs?
 – Countermeasures: detection, stricter isolation, smarter scheduling?

References
[MMSys 10] Sean K. Barker and Prashant Shenoy. "Empirical evaluation of latency-sensitive application performance in the cloud." In MMSys, 2010.
[Security 07] Thomas Moscibroda and Onur Mutlu. "Memory performance attacks: Denial of memory service in multi-core systems." In USENIX Security Symposium, 2007.
[CCS 09] T. Ristenpart, E. Tromer, H. Shacham, and S. Savage. "Hey, you, get off of my cloud: exploring information leakage in third-party compute clouds." In CCS, 2009.

Backup Slides

Discussion: Countermeasures
Detection?
 – May be hard to differentiate RFA requests from legitimate ones
Stricter isolation?
 – Works but expensive
Contention-aware scheduling
 – Not yet used in public IaaS

Discussion: Economies
• Cost of RFA
 – Helper instance, and
 – RFA traffic.
• Co-resident helper
 – An efficient implementation of the helper can run inside the attacker's VM.
 – Current helper implementation consumes 15 Kbps of network bandwidth and 0.7% CPU utilization.
• Multiplex a single helper instance for many beneficiaries.
• Note: currently, internal EC2 network traffic is free of cost.

Identifying Co-resident VMs
• Identifying the public interface:
 – Predictable numerical distance between internal IP addresses in public clouds.
 – Identifying the port used by the victim application (standard ports like http(s), etc.).
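The "numerical distance" between two internal IP addresses can be computed with Python's standard ipaddress module. This is a minimal sketch: the function names, the example addresses, and the max_distance threshold are hypothetical, and a small distance is at most a hint that still needs confirmation by other checks.

```python
import ipaddress


def ip_distance(a, b):
    """Absolute numerical distance between two IP addresses."""
    return abs(int(ipaddress.ip_address(a)) - int(ipaddress.ip_address(b)))


def maybe_coresident(a, b, max_distance=8):
    """Heuristic hint only; max_distance is a hypothetical threshold."""
    return ip_distance(a, b) <= max_distance


print(ip_distance("10.0.0.5", "10.0.0.12"))      # prints 7
print(maybe_coresident("10.0.0.5", "10.0.3.5"))  # prints False
```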

Experiment: Measuring Resource Contention
• Synthetic workloads

Other RFAs
• RFAs are not limited to the presented case studies.
• LLC vs. Disk
 – Sending spurious, random disk requests asynchronously to create a bottleneck for the shared disk resource.
• Memory vs. Disk
 – Similar to the above RFA

Discussion: More on Practical Aspects
• Work-conserving vs. non-work-conserving schedulers
 – Public cloud environments are expected to manage resources in a non-work-conserving fashion.
 – E.g., the Net vs. Net RFA won't work on Amazon EC2.
• Simulated client workload
 – What is the effect of an RFA in the presence of multiple independent requests originating from numerous clients?
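Why the scheduler type matters can be seen in a toy allocation model. This is a sketch under stated assumptions, not how Xen or EC2 allocate bandwidth: the allocate function, the equal fair shares, and the demand numbers are all hypothetical. A work-conserving allocator hands unused capacity to tenants that still want more, while a non-work-conserving one caps every tenant at its fair share.

```python
def allocate(demands, capacity, work_conserving):
    """Toy allocator: equal fair shares, optionally redistributing spare capacity."""
    fair = capacity / len(demands)
    alloc = [min(d, fair) for d in demands]
    if not work_conserving:
        return alloc  # unused capacity is simply left idle
    spare = capacity - sum(alloc)
    hungry = [i for i, d in enumerate(demands) if d > alloc[i]]
    # Hand spare capacity to tenants with unmet demand until none is left.
    while spare > 1e-9 and hungry:
        share = spare / len(hungry)
        for i in hungry:
            extra = min(share, demands[i] - alloc[i])
            alloc[i] += extra
            spare -= extra
        hungry = [i for i in hungry if demands[i] - alloc[i] > 1e-9]
    return alloc


# Beneficiary demands 100 units; the victim's demand drops from 100 to 15.
print(allocate([100, 15], 100, work_conserving=True))   # prints [85.0, 15]
print(allocate([100, 15], 100, work_conserving=False))  # prints [50.0, 15]
```

In this model the beneficiary gains from the victim's reduced demand only under the work-conserving allocator, which matches the slide's point that a Net vs. Net RFA would not pay off where resources are managed in a non-work-conserving fashion.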

Xen Internals
• Domain-0
 – Privileged domain, direct access to I/O devices.
 – All I/O requests go through Dom-0.
• Xen scheduler internals
 – Boost priority for interactive workloads
[Diagram: VMs and Dom0 on top of the hypervisor, with an incoming request; shared hardware: core, N/W, cache, memory, disk.]

Experiment: Measuring Resource Contention
• On a local Xen test bed
• Some have huge performance degradation
• Not all resources conflict
Machine: Intel Xeon E5430, 2.66 GHz
Packages: 2, 2 cores per package
LLC size: 6 MB per package
[Figure: performance degradation (%) for observed workloads versus conflicting workloads (CPU, Net, Disk, Memory, Cache) on the local Xen test bed.]

Boost Priority and Interruptions
Victim: Webserver
Beneficiary: LLCProbe
• Fewer interruptions
• Higher cache efficiency
[Figure: plot labels 40%, 95%, 85%, < 30%.]

Demonstration on EC2
• Problem #1: Achieving co-residence
 – Launching multiple instances simultaneously from two or more accounts.
• Problem #2: Verifying co-residency
 – Numerical distance between internal IP addresses [CCS 09].
 – Faster packet round-trip times.
 – Using resource contention experiments.

Normalized Performance on EC2
• Aggregate performance degradation is within 5 performance points
• On average, all SPEC workloads benefited from RFA
[Figure: normalized performance relative to baseline (higher is better); plot label: 6%.]