Dominant Resource Fairness: Fair Allocation of Multiple Resource Types Ali Ghodsi, Matei Zaharia, Benjamin Hindman, Andy Konwinski, Scott Shenker, Ion Stoica University of California, Berkeley

Resource Sharing • Multiple users share the resources of a system • Resources: CPU, memory, storage, etc. • A user may need multiple kinds of resources • We need to allocate the system's resources fairly among the users

What is fair sharing? • n users want to share the system’s CPU • Solution: • Allocate each user 1/n of the shared resource • e.g., three users want to use the CPU: each gets 1/3 • Generalized by max‐min fairness • Handles the case where a user wants less than her fair share • e.g., user 1 wants no more than 20% of the CPU
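Single-resource max‐min fairness can be computed by "progressive filling." Satisfy the smallest demands first, then split whatever is left evenly among the users who still want more.

The sketch below assumes CPU is measured in percent and that users 2 and 3 each want the whole CPU. Neither assumption appears on the slide. The function name `max_min_fair` is hypothetical.

```python
def max_min_fair(capacity, demands):
    """Single-resource max-min fairness by progressive filling.

    Users with the smallest demands are satisfied first; whatever they leave
    over is split evenly among the users who still want more.
    """
    alloc = [0.0] * len(demands)
    remaining = float(capacity)
    # Visit users from the smallest demand to the largest.
    pending = sorted(range(len(demands)), key=lambda i: demands[i])
    while pending:
        fair_share = remaining / len(pending)
        smallest = pending[0]
        if demands[smallest] <= fair_share:
            # This user wants no more than an equal split: give exactly its demand.
            alloc[smallest] = float(demands[smallest])
            remaining -= demands[smallest]
            pending.pop(0)
        else:
            # Everyone left wants more than an equal split: divide the rest evenly.
            for i in pending:
                alloc[i] = fair_share
            break
    return alloc


# CPU in percent; users 2 and 3 are assumed to want the whole CPU.
print(max_min_fair(100, [100, 100, 100]))  # each user gets 1/3 of the CPU
print(max_min_fair(100, [20, 100, 100]))   # → [20.0, 40.0, 40.0]
```

User 1 is capped at its 20% demand. The leftover 80% is shared equally by the two users who want more.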

Properties of max‐min fairness • Share guarantee • Each user will get at least 1/n of the resource • But will get less if her demand is less • Strategy‐proof • Users are not better off by asking for more than they need • Users have no incentive to lie

Why max-min fairness is not enough • Job scheduling in datacenters is not only about CPUs • Jobs consume CPU, memory, disk, etc. • Challenge: heterogeneity in resource demands

Problem • How to fairly share multiple resources when users have heterogeneous demands on them? • Example • 2 resources: CPUs & mem • User 1 wants <1 CPU, 4 GB> per task • User 2 wants <3 CPU, 1 GB> per task

Model •

A Natural Policy • User 1 has < 50% of both CPUs and RAM

Share Guarantee • Every user should get 1/n of at least one resource • Intuition: • You shouldn’t be worse off than if you ran your own cluster with 1/n of the resources

Strategy-proof • A user should not be able to increase her allocation by lying about her demand vector • Intuition • Users are incentivized to provide truthful resource requirements

What we need to do • Find a fair sharing policy that provides • Share guarantee • Strategy‐proofness • Max‐min fairness for a single resource has these properties • Can we generalize it to multiple resources?

Dominant Resource Fairness • A user’s dominant resource is the resource of which she has the biggest share • Example: • Total resources: <10 CPU, 4 GB> • User 1’s allocation: <2 CPU, 1 GB> • Dominant resource is memory, as 1/4 (25%) > 2/10 (20%) • A user’s dominant share is the fraction of her dominant resource that she is allocated • User 1’s dominant share is 25% (1/4)
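The definition above amounts to computing one share per resource and taking the largest. The sketch below does that with exact fractions. The function name `dominant_resource` and the resource labels are hypothetical. On a tie, it arbitrarily reports the first resource listed.

```python
from fractions import Fraction


def dominant_resource(capacity, allocation, names):
    """Return (resource name, share) for the resource a user holds the largest fraction of."""
    shares = [Fraction(a, c) for a, c in zip(allocation, capacity)]
    # Index of the largest share; on a tie the first resource listed wins.
    r = max(range(len(shares)), key=lambda k: shares[k])
    return names[r], shares[r]


# Slide example: total <10 CPU, 4 GB>, user 1 holds <2 CPU, 1 GB>.
name, share = dominant_resource((10, 4), (2, 1), ("CPU", "memory"))
print(name, share)  # memory 1/4
```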

Dominant Resource Fairness • Apply max‐min fairness to dominant shares • Equalize the dominant shares of the users • Example: • Total resources: <9 CPU, 18 GB> • User 1 demand: <1 CPU, 4 GB> per task; dominant resource: memory (4/18 > 1/9) • User 2 demand: <3 CPU, 1 GB> per task; dominant resource: CPU (3/9 > 1/18)
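The example can be worked by hand. The arithmetic below is ours, not copied from the slide. It assumes both users have enough tasks to use everything they are given, and that tasks are whole units. Let x and y be the numbers of tasks launched for user 1 and user 2. DRF then seeks the largest allocation that fits both resources and equalizes the dominant shares (memory for user 1, CPU for user 2):

```latex
\begin{aligned}
&\max\ (x, y) \quad \text{subject to} \\
&x + 3y \le 9 && \text{(CPU)} \\
&4x + y \le 18 && \text{(memory)} \\
&\tfrac{4x}{18} = \tfrac{3y}{9} && \text{(equal dominant shares)} \\[4pt]
&\tfrac{2x}{9} = \tfrac{y}{3} \;\Rightarrow\; y = \tfrac{2x}{3},
 \qquad x + 3 \cdot \tfrac{2x}{3} = 3x = 9 \;\Rightarrow\; x = 3,\ y = 2 \\
&\text{memory check: } 4 \cdot 3 + 2 = 14 \le 18
\end{aligned}
```

User 1 receives <3 CPU, 12 GB> and user 2 receives <6 CPU, 2 GB>. Both dominant shares are 2/3 (12/18 and 6/9). CPU is the bottleneck (3 + 6 = 9), while 4 GB of memory stays unallocated.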

Online Dominant Resource Scheduler • Whenever there are available resources and tasks to run • Schedule a task to the user with the smallest dominant share
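The online rule can be sketched as a greedy loop over a single pool of resources. The sketch makes several assumptions not stated on the slide:

- every user always has pending tasks;
- each user's per‐task demand is fixed;
- users whose next task no longer fits are skipped, and the loop stops when no one's task fits;
- ties go to the lower user index.

The function name `drf_schedule` is hypothetical.

```python
from fractions import Fraction


def drf_schedule(capacity, demands):
    """Online DRF sketch: keep launching one task for the user with the smallest
    dominant share, considering only users whose next task still fits.

    capacity: total amount of each resource, e.g. (9, 18) for <9 CPU, 18 GB>
    demands:  per-task demand vector of each user; every user is assumed to
              have unlimited pending tasks, and every demand vector must have
              at least one positive entry (otherwise the loop never ends).
    Returns (tasks launched per user, dominant share per user).
    """
    n = len(demands)
    consumed = [0] * len(capacity)                  # resources handed out so far
    held = [[0] * len(capacity) for _ in range(n)]  # each user's allocation
    tasks = [0] * n

    def dominant_share(i):
        return max(Fraction(h, c) for h, c in zip(held[i], capacity))

    def fits(i):
        return all(u + d <= c for u, d, c in zip(consumed, demands[i], capacity))

    while True:
        candidates = [i for i in range(n) if fits(i)]
        if not candidates:
            break  # no user's next task fits any more
        # min() returns the first minimum, so ties go to the lower user index.
        i = min(candidates, key=dominant_share)
        for r, d in enumerate(demands[i]):
            consumed[r] += d
            held[i][r] += d
        tasks[i] += 1
    return tasks, [dominant_share(i) for i in range(n)]


tasks, shares = drf_schedule((9, 18), [(1, 4), (3, 1)])
print(tasks)  # → [3, 2]; both dominant shares end at 2/3
```

For the <9 CPU, 18 GB> example, the loop alternates between the two users until CPU runs out. It ends with 3 tasks for user 1 and 2 tasks for user 2. Stopping as soon as the lowest‐share user is blocked, instead of skipping that user, gives the same result here.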

An Approach from the Economics Community •

Comparison in a toy example • Example • Total resources: <9 CPU, 18 GB> • User 1 demand: <1 CPU, 4 GB> per task; dominant resource: memory • User 2 demand: <3 CPU, 1 GB> per task; dominant resource: CPU

Evaluation • Micro‐experiments on EC2 • Evaluate DRF’s dynamic behavior when demands change • Compare DRF with the current Hadoop scheduler • Macro‐benchmark through simulations • Simulate a Facebook trace with DRF and the current Hadoop scheduler

DRF inside Mesos on EC2 • In the first 2 minutes, job 1 uses <1 CPU, 10 GB RAM> per task and job 2 uses <1 CPU, 1 GB RAM> per task. After 2 minutes, the task sizes change to <2 CPUs, 4 GB> for job 1 and <1 CPU, 3 GB> for job 2.

DRF vs Hadoop Scheduler • Hadoop Fair Scheduler/Capacity Scheduler/Quincy • Each machine consists of k slots (e.g., k = 14) • Run at most one task per slot • Give jobs an “equal” number of slots • i.e., apply max‐min fairness to the slot count

Experiment: DRF vs Slots 80 jobs for each task In 10 mins

Simulation: DRF vs Slots on Facebook Traces

Selected Questions • Why is the sharing‐incentive property important? If a user doesn’t know it is obtaining less than it could get from sharing the resources evenly, does this matter? • How does DRF deal with unutilized resources? Does it over‐allocate them?