
Green Computing: Energy Consumption Optimized Service Hosting

Walter Binder, University of Lugano, Switzerland
Niranjan Suri, IHMC, Florida, USA

Motivation
• Data centers are becoming ubiquitous
    • Large installations of computer systems
    • Providing critical services
• Data centers are big power consumers
    • Continuously operating computers, regardless of the load
    • Cooling

2009-01-26

Reducing Power Consumption
• The Green Grid consortium advocates data center design and management practices that improve energy efficiency
• Right-sizing data centers at design time
• Energy-efficient cooling
• Virtualization (multiple servers on the same physical machine)
• Processor power saving (e.g., clock rate depending on load)
• Powering down unused machines
    • Computers with dedicated roles (e.g., computers performing backups)

Our Approach
• Load on machines varies over time
• Turn off unneeded machines and restart them as the load requires
• Problems
    • Load is distributed over multiple machines
    • Load reduction is typically also distributed across multiple machines
    • Load must be consolidated on a subset of machines in order to free up machines that can be turned off
• Goal: Minimum number of machines running
• Constraint: QoS must be ensured
    • Service-Level Agreements (SLAs) must not be violated

Example
[Figure: content not recoverable from the extracted text]

Service Types
• Hosting environment may offer multiple service types
• A service type consists of
    • Service interface
    • SLA defining QoS parameters
• SLA parameters are specified according to a common ontology
    • WS-Agreement, WSLA, SLAng, etc.
    • Here: a single QoS parameter, response time

Stateless versus Stateful Services
• Stateless service: Requests are independent
    • After completing all pending requests, a stateless service may be stopped
• Stateful service: Requests in one session may depend on prior requests in the same session
    • Sessions may be explicitly terminated by clients, or expire after some period of inactivity
    • After termination of all sessions, a stateful service may be stopped

Hosting Environment (1)
• Dedicated machines for three different purposes:
    • File servers
        • Provide all data sources
    • Compute servers
        • Execute service requests
    • Dispatchers
        • Receive service requests and choose compute servers to handle them
        • Decide on shutdown and restart of compute servers
• Dispatchers and file servers are continuously running
• Only idle compute servers may be shut down

Hosting Environment (2)
[Architecture diagram. Components: Clients, Dispatcher, Compute servers, File servers. Arrow labels: requests, dispatch, data access]

Hosting Environment (3)
• Heterogeneous environment
    • Machines have different computing resources
• Dynamically changing environment
    • New machines may be added
    • Cores may fail
• Compute servers may host any number of service types, and a service type may be hosted by any number of compute servers
• Compute servers are ranked according to energy efficiency

Node Manager
• Each compute server runs a Node Manager component
• Monitors idle time and average response time for each service type
• Communicates measurements to the dispatcher
• Handles server shutdown upon request from the dispatcher
• Notifies the dispatcher upon startup

Shutdown of Compute Servers
• Dispatcher notifies the Node Manager on the compute server to prepare shutdown
• No further service requests are dispatched to the compute server
• Node Manager waits for
    • Completion of all previously accepted requests
    • Termination of all active sessions
• Alternative: Migration of sessions
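
The shutdown preparation above can be sketched as a small piece of bookkeeping on the compute server. This is a minimal illustration, not the authors' implementation: the class and method names are hypothetical, and it assumes that a draining server refuses all new requests and that sessions end by explicit termination or expiry.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Hypothetical per-server bookkeeping used while preparing a shutdown."""

    pending_requests: int = 0
    active_sessions: set = field(default_factory=set)
    draining: bool = False  # set once the dispatcher asks for shutdown

    def prepare_shutdown(self) -> None:
        # After this call no further requests are accepted (assumption:
        # the dispatcher has also stopped routing requests here).
        self.draining = True

    def accept(self, session_id=None) -> bool:
        """Register a new request; returns False if the server is draining."""
        if self.draining:
            return False
        self.pending_requests += 1
        if session_id is not None:
            self.active_sessions.add(session_id)
        return True

    def complete(self) -> None:
        """Mark one previously accepted request as finished."""
        self.pending_requests -= 1

    def end_session(self, session_id) -> None:
        """Record that a session was terminated by the client or expired."""
        self.active_sessions.discard(session_id)

    def ready_to_shut_down(self) -> bool:
        # Both waiting conditions from the slide must hold.
        return (self.draining
                and self.pending_requests == 0
                and not self.active_sessions)
```

Session migration, the alternative named on the slide, would replace the wait on `active_sessions` by handing the sessions to another server; that is not modeled here.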

Shutdown Options
• Complete shutdown
    • No power consumption
    • Ensures clean state upon restart (e.g., no memory leaks)
    • Slow restart
• Hibernation
    • No power consumption
    • Memory saved on persistent storage
    • Resume by reloading memory snapshot
• Standby
    • Reduced power consumption
    • Processor stopped, but memory remains active
    • Fast restart

Restart of Compute Servers
• Wake on LAN
• Magic packet is broadcast to the LAN
    • Special header: 0xFF repeated 6 times
    • MAC address of the machine to restart
• Dispatcher initiates compute server restart
• Node Manager notifies dispatcher of completed restart
• Dispatcher needs to know the MAC addresses of all compute servers
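
As an illustration of the restart step, the following sketch builds and broadcasts a Wake-on-LAN magic packet. In the standard magic packet format the 6-byte 0xFF header is followed by the target MAC address repeated 16 times. The function names are hypothetical, and sending over UDP to port 9 on the limited broadcast address is a common convention assumed here, not something the slides specify.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Return the 102-byte payload for a MAC such as '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 6-byte MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # Header of 6 x 0xFF, then the target MAC address repeated 16 times.
    return b"\xff" * 6 + mac_bytes * 16


def send_wake_on_lan(mac: str,
                     broadcast: str = "255.255.255.255",
                     port: int = 9) -> None:
    """Broadcast the magic packet over UDP (address and port are assumptions)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_magic_packet(mac), (broadcast, port))
```

A dispatcher would call `send_wake_on_lan` with the stored MAC address of the server to restart and then wait for the Node Manager's startup notification.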

Service Dispatch: Definitions
• n compute servers <s_1, …, s_n>
• Sorted according to energy efficiency
    • s_x is more energy efficient than s_y if x < y
• In each configuration
    • s_1, …, s_r are running (1 ≤ r ≤ n)
    • s_{r+1}, …, s_n are shut down (or in the process of shutting down)
• p_T(i): probability that a request for service type T is dispatched to s_i

Service Dispatch upon Request
• Take a random number z (0 ≤ z ≤ 1; uniform distribution)
• Choose s_c such that c = min { i : 1 ≤ i ≤ n and z ≤ p_T(1) + … + p_T(i) }
• Related to lottery scheduling
    • Tickets instead of probabilities
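
The selection rule can be sketched as a walk over the cumulative probabilities. This is an illustrative sketch only: the function name is hypothetical, the list `p_T` is assumed to hold p_T(1), …, p_T(n) and to sum to 1, and two small deviations from the formula are made explicit in the comments (servers with probability 0 are skipped, and floating-point rounding is handled by a fallback).

```python
import random


def choose_server(p_T, z=None):
    """Return the 1-based index c of the compute server that gets the request.

    p_T[i - 1] holds p_T(i); the values are assumed to sum to 1.
    """
    if z is None:
        z = random.random()  # uniform in [0, 1)
    cumulative = 0.0
    for i, p in enumerate(p_T, start=1):
        cumulative += p
        # Smallest i whose cumulative probability reaches z. Servers with
        # probability 0 are skipped so that z == 0 cannot select them.
        if p > 0 and z <= cumulative:
            return i
    # Rounding can leave the total slightly below 1; fall back to the last
    # server that has a non-zero probability.
    return max(i for i, p in enumerate(p_T, start=1) if p > 0)
```

For example, with `p_T = [0.5, 0.3, 0.2, 0.0]` a draw of `z = 0.6` selects the second server, and the fourth server (shut down, probability 0) is never selected.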

Update of Probabilities (1)
• At regular intervals, the dispatcher obtains monitoring data from the Node Managers of running compute servers
• If s_i had idle time and s_i had no problem meeting the SLAs:
    • Increase load on s_i, reduce load on s_r
    • p_T(r) := p_T(r) – Δp
    • p_T(i) := p_T(i) + Δp
• If r > 1 and p_T(r) = 0 for all service types T, initiate shutdown of s_r
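
A minimal sketch of this consolidation step follows. The function names are hypothetical, and clamping the transfer so that p_T(r) never drops below zero is an assumption; the slides do not say how the boundary case is handled.

```python
def consolidate(p_T, i, r, delta_p):
    """Shift up to delta_p of probability from s_r to s_i (1-based, i < r).

    p_T is the probability list of one service type and is updated in place.
    Returns the amount actually moved.
    """
    # Assumption: never move more than s_r still has, so p_T(r) stays >= 0.
    moved = min(delta_p, p_T[r - 1])
    p_T[r - 1] -= moved
    p_T[i - 1] += moved
    return moved


def can_shut_down(p, r):
    """True if r > 1 and s_r receives no requests of any service type.

    p maps each service type to its probability list.
    """
    return r > 1 and all(p_T[r - 1] == 0 for p_T in p.values())
```

Repeated calls drain the least energy-efficient running server s_r; once its probability is zero for every service type, the dispatcher can initiate its shutdown.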

Update of Probabilities (2)
• If compute server s_i violates the SLA for a service type T (overload situation):
    • First try to find a running compute server s_k (1 ≤ k ≤ r) that has idle time and met the SLAs of all service types
        • Balance load between s_i and s_k
        • p_T(i) := p_T(i) – Δp
        • p_T(k) := p_T(k) + Δp
    • If there is no such compute server s_k, initiate restart of s_{r+1}
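
The overload reaction can be sketched as follows. The function name, the return values, and the per-server flag lists are hypothetical. Three choices are assumptions not stated on the slide: candidates are tried in order of energy efficiency (lowest index first), the transfer is clamped at zero, and a separate result is returned when all n servers are already running.

```python
def handle_overload(p_T, i, r, has_idle_time, met_all_slas, delta_p):
    """React to s_i violating the SLA of service type T (1-based indices).

    p_T is updated in place. has_idle_time[k - 1] and met_all_slas[k - 1]
    describe s_k in the last monitoring interval. Returns ("balanced", k),
    ("restart", r + 1), or ("saturated", None) if no server is left to start.
    """
    for k in range(1, r + 1):
        if k != i and has_idle_time[k - 1] and met_all_slas[k - 1]:
            moved = min(delta_p, p_T[i - 1])  # keep p_T(i) >= 0 (assumption)
            p_T[i - 1] -= moved
            p_T[k - 1] += moved
            return ("balanced", k)
    if r < len(p_T):
        return ("restart", r + 1)
    return ("saturated", None)
```

After a `("restart", r + 1)` result the dispatcher would wake s_{r+1} and, once its Node Manager reports startup, begin assigning it a share of the requests.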

Future Work (1)
• Testbed and evaluation
    • Main evaluation metric: Energy savings for given workloads
    • Service performance must be modeled
    • Traces of service execution in data centers needed
• Migration of sessions
    • Reduces the time for preparing shutdown
• Complex optimization criteria
    • Minimize number of service types hosted on the same compute server
    • Consider estimated shutdown preparation time when choosing the compute server to shut down

Future Work (2)
• Distribution and replication
    • Service dispatcher must not become a bottleneck
• Fault tolerance
    • Dispatcher must detect compute server failures
    • Dispatcher must not become a single point of failure
• Sudden load fluctuations
    • Shutting down machines increases vulnerability to denial-of-service attacks

Conclusions
• Data centers are growing and consume huge amounts of electrical energy
• Energy can be saved by powering down unused machines according to the current load
• Requires consolidation of services on a subset of the available machines
• Probabilistic approach to energy-consumption-aware load balancing