Power Cost Reduction in Distributed Data Centers
Yuan Yao, University of Southern California
Joint work with Longbo Huang, Abhishek Sharma, Leana Golubchik, and Michael Neely
IBM Student Workshop for Frontiers of Cloud Computing 2011
Paper to appear in INFOCOM 2012
Background and Motivation
• Data centers are growing in number and size…
  – Number of servers: Google (~1M)
  – Data centers built in multiple locations
    • IBM owns and operates hundreds of data centers worldwide
• …and in power cost!
  – Google spends ~$100M/year on power
  – Goal: reduce power cost while maintaining QoS
Existing Approaches
• Power-efficient hardware design
• System design / resource management
  – Use existing infrastructure
  – Exploit options in routing and resource management of data centers
Existing Approaches
• Power cost reduction through algorithm design
  – Server level: power-speed scaling [Wierman 09]
  – Data center level: right-sizing [Gandhi 10, Lin 11]
  – Inter-data-center level: geographical load balancing [Qureshi 09, Liu 11]
    (e.g., route a job to the $2/kWh site rather than the $5/kWh site)
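The geographical load balancing idea above can be sketched as a one-line routing rule (a hypothetical illustration; the cited schemes are considerably more sophisticated):

```python
# Hypothetical illustration of price-based routing: send the job to the
# data center with the lowest current power price ($/kWh, as on the slide).
def route_to_cheapest(prices):
    # prices: dict mapping data center name -> current power price
    return min(prices, key=prices.get)

print(route_to_cheapest({"dc_a": 5.0, "dc_b": 2.0}))  # -> dc_b
```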
Our Approach: SAVE
• We provide a framework that exploits options at all of these levels:
  – server level + data center level + inter-data-center level,
  – plus the temporal volatility of power prices
• The result: the Stochastic power reduction scheme (SAVE)
Our Model: Data Centers and Workload
• M geographically distributed data centers
• Each data center contains a front-end server and a back-end cluster
• Workloads Ai(t) (i.i.d.) arrive at front-end servers and are routed to one of the back-end clusters at rate μij(t)
Our Model: Server Operation and Cost
• Back-end cluster of data center i contains Ni servers
  – Ni(t) of them are active
• Service rate of active servers: bi(t) ∈ [0, bmax]
• Power price at data center i: pi(t) (i.i.d.)
• Power usage at data center i: a function of Ni(t) and bi(t)
• Power cost at data center i: fi(t) = pi(t) × power usage
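The slide's usage and cost expressions appeared as figures; a minimal sketch of one plausible form is below. The linear model (full draw for active servers scaled by service rate, small sleep draw for inactive ones) is an assumption for illustration, not the paper's exact formula:

```python
# Assumed linear power model (illustrative, not the paper's exact form):
# active servers draw p_active scaled by service rate b; the remaining
# servers draw a small sleep power p_sleep.
def power_usage(n_active, b, n_total, p_active=1.0, p_sleep=0.1):
    return n_active * p_active * b + (n_total - n_active) * p_sleep

def power_cost(price, n_active, b, n_total):
    # f_i(t) = p_i(t) * (power usage at data center i)
    return price * power_usage(n_active, b, n_total)

# Price 2.0, 4 of 10 servers active at half rate, 6 asleep.
print(power_cost(2.0, 4, 0.5, 10))
```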
Our Model: Two Time Scales
• The system we model operates on two time scales
  – At t = kT, change the number of active servers Nj(t)
  – In every time slot, change the service rate bj(t)
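The two-time-scale structure can be sketched as a simple control loop; the decision rules here are placeholders, only the timing structure is the point:

```python
# Sketch of the two time scales: n_active changes only when t is a
# multiple of T, while the rate b may change every slot.
# Both decision rules below are arbitrary placeholders.
T = 3
n_active, history = 0, []
for t in range(7):
    if t % T == 0:
        n_active = 2 + (t // T)      # slow decision: every T slots
    b = 0.5 if t % 2 == 0 else 1.0   # fast decision: every slot
    history.append((t, n_active, b))

print(history)
```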
Our Model: Summary
• Input: power prices pi(t), job arrivals Ai(t)
• Two-time-scale control actions: Nj(kT), bj(t), μij(t)
• Queue evolution: back-end queues grow with routed arrivals and drain with service
• Objective: minimize the time-average power cost subject to all constraints on Π and queue stability
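The queue evolution equation was elided on the slide; a sketch of the generic update used in this line of work (the paper's exact equation, with its routing terms, may differ) is:

```python
# Generic queueing update: backlog drains by what is served (never below
# zero) and grows by new arrivals routed to this back end.
def queue_update(q, served, arrived):
    # Q(t+1) = max(Q(t) - served, 0) + arrived
    return max(q - served, 0) + arrived

print(queue_update(5, 3, 2))  # -> 4
print(queue_update(1, 3, 2))  # -> 2 (service cannot drive the queue negative)
```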
SAVE: Intuitions
• SAVE operates at both the front end and the back end
• Front-end routing:
  – When the back-end queue Qj(t) is small, choose μij(t) > 0
• Back-end server management:
  – Choose small Nj(t) and bj(t) to reduce the power cost fj(t)
  – When the queue Qj(t) is large, choose large Nj(t) and bj(t) to stabilize the queue
SAVE: How It Works
• Front-end routing:
  – In every time slot t, choose μij(t) to maximize a queue-based weight
• Back-end server management (pick a parameter V > 0):
  – At time slots t = kT, choose Nj(t) to minimize a weighted combination of power cost and queue backlog
  – In every time slot τ, choose bj(τ) to minimize a similar expression
• Serve jobs and update queue sizes
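As a generic illustration of this drift-plus-penalty style of decision (not SAVE's exact expressions, which are in the paper), consider a per-slot service-rate choice: minimizing V·p(t)·b − Q(t)·b over b ∈ [0, bmax] is linear in b, so the optimum sits at an endpoint:

```python
# Generic drift-plus-penalty rate choice (illustrative only; the exact
# objective SAVE minimizes differs in detail). Larger V weights power
# cost more heavily relative to queue backlog.
def choose_rate(queue, price, V, b_max=1.0):
    # Minimize V*price*b - queue*b over b in [0, b_max]: linear in b,
    # so serve at full rate iff backlog outweighs the weighted price.
    return b_max if queue > V * price else 0.0

print(choose_rate(queue=10, price=1.0, V=5))  # -> 1.0 (backlog dominates)
print(choose_rate(queue=1, price=1.0, V=5))   # -> 0.0 (price dominates)
```

Note how V enters: raising V makes the scheme tolerate more backlog (delay) before paying for service, which is exactly the cost/delay knob the next slide quantifies.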
SAVE: Performance
• Theorem on the performance of our approach:
  – Average delay of SAVE ≤ O(V)
  – Average power cost of SAVE ≤ power cost of OPTIMAL + O(1/V)
  – OPTIMAL can be any scheme that stabilizes the queues
• V controls the trade-off between average queue size (delay) and average power cost
• SAVE is best suited for delay-tolerant workloads
Experimental Setup
• We simulate data centers at 7 locations
  – Real-world power prices
  – Poisson arrivals
• We use synthetic workloads that mimic MapReduce jobs
• The power cost model accounts for the power price, the power consumption of active servers, the power consumption of sleeping servers, and power usage effectiveness (PUE)
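The Poisson arrivals mentioned above can be generated with Knuth's classic sampler; this is a sketch of a workload generator, not the authors' simulator:

```python
import math
import random

def poisson_sample(lam, rng):
    # Knuth's algorithm: multiply uniforms until the product drops
    # below exp(-lam); the count of multiplications minus one is
    # a Poisson(lam) sample.
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

rng = random.Random(0)  # fixed seed for a reproducible run
arrivals = [poisson_sample(3.0, rng) for _ in range(10000)]
print(sum(arrivals) / len(arrivals))  # close to the arrival rate 3.0
```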
Experimental Setup: Heuristics for Comparison
• Local Computation
  – Send jobs to the local back end
• Load Balancing (all servers are activated)
  – Evenly split jobs across all back ends
• Low Price (similar to [Qureshi 09])
  – Send more jobs to locations with low power prices
• Instant On/Off (unrealistic)
  – Routing is the same as Load Balancing
  – Data center i tunes Ni(t) and bi(t) every time slot to minimize its power cost
  – No additional cost for activating servers or putting them to sleep
Experimental Results
• Relative power cost reduction compared to Local Computation:
  – As V increases, the power cost reduction grows from ~0.1% to ~18%
• SAVE is more effective for delay-tolerant workloads
Experimental Results: Power Usage
• We record the actual power usage (not cost) of all schemes in our experiments
• Our approach also reduces power usage
Summary
• We propose a two-time-scale, non-work-conserving control algorithm aimed at reducing power cost in distributed data centers
• Our work facilitates an explicit power cost vs. delay trade-off
• We derive analytical bounds on the time-average power cost and service delay achieved by our algorithm
• Through simulations we show that our approach can reduce power cost by as much as 18%, while also reducing power usage
Future Work
• Other problems in power reduction for data centers
  – Scheduling algorithms to save power
  – Delay-sensitive workloads
  – Virtualized environments, where migration is available
Questions?
• Please check out our paper:
  – "Data Centers Power Reduction: A Two Time Scale Approach for Delay Tolerant Workloads", to appear in INFOCOM 2012
• Contact: yuanyao@usc.edu, http://www-scf.usc.edu/~yuanyao/