Wide-Area Service Composition: Evaluation of Availability and Scalability
Bhaskaran Raman
SAHARA, EECS, U.C. Berkeley
[Title-slide figure: a composed service path spanning providers A, B, Q, and R, involving a video-on-demand server, a transcoder, a text-to-audio service, an email repository, a thin client, and a cellular phone]
Problem Statement and Goals
[Figure: example composed path from a video-on-demand server (Provider A) through a transcoder (Provider B) to a thin client]
• Problem statement
– A composed path could stretch across multiple service providers and multiple network domains
– Inter-domain Internet paths have poor availability [Labovitz ’99] and poor time-to-recovery [Labovitz ’00]
– Take advantage of service replicas
• Goals
– Performance: choose the set of service instances
– Availability: detect and handle failures quickly
– Scalability: Internet-scale operation
• Related work
– TACC: composition within a cluster
– Web-server choice: SPAND, Harvest
– Routing around failures: Tapestry, RON
• We address wide-area network performance and failure issues for long-lived composed sessions
Is “quick” failure detection possible?
• What is a “failure” on an Internet path?
– Outage periods happen for varying durations
• Study outage periods using traces
– 12 pairs of hosts: Berkeley, Stanford, UIUC, UNSW (Australia), TU-Berlin (Germany)
– Results could be skewed by the Internet2 backbone?
– Periodic UDP heart-beat, every 300 ms
– Study “gaps” between receive times
• Results
– A short outage (1.2-1.8 sec) often indicates a long outage (> 30 sec); in some traces this holds over 50% of the time
– False positives are rare: O(once an hour) at most
– Similar results with a ping-based study using ping servers
• Take-away: it is okay to react to short outage periods by switching the service-level path (a heart-beat sketch follows this slide)
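As a companion to the methodology above, the following is a minimal sketch (not the authors' measurement code) of a 300 ms UDP heart-beat sender and a gap-based outage detector. The port number and the 1.8-second threshold are illustrative assumptions drawn from the 1.2-1.8 sec result.

```python
# Minimal sketch of the UDP heart-beat methodology described above.
# Port and threshold values are assumptions, not the study's actual settings.
import socket
import time

PORT = 5005           # assumed UDP port
INTERVAL = 0.3        # 300 ms between heart-beats
OUTAGE_THRESH = 1.8   # gap (sec) treated as an outage, per the 1.2-1.8 sec result

def send_heartbeats(dest_host):
    """Send a small sequenced UDP packet every INTERVAL seconds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    seq = 0
    while True:
        sock.sendto(seq.to_bytes(8, "big"), (dest_host, PORT))
        seq += 1
        time.sleep(INTERVAL)

def detect_outages():
    """Log gaps between receive times that exceed OUTAGE_THRESH."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", PORT))
    last = None
    while True:
        sock.recv(64)
        now = time.time()
        if last is not None and now - last > OUTAGE_THRESH:
            print(f"outage of {now - last:.1f} s detected")
        last = now
```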
UDP-based keep-alive stream

HB destination | HB source  | Total time (hh:mm:ss) | False positives | Failures
Berkeley       | UNSW       | 130:48:45             | 135             | 55
UNSW           | Berkeley   | 130:51:45             | 9               | 8
Berkeley       | TU-Berlin  | 130:49:46             | 27              | 8
TU-Berlin      | Berkeley   | 130:50:11             | 174             | 8
TU-Berlin      | UNSW       | 130:48:11             | 218             | 7
UNSW           | TU-Berlin  | 130:46:38             | 24              | 5
Berkeley       | Stanford   | 124:21:55             | 258             | 7
Stanford       | Berkeley   | 124:21:19             | 2               | 6
Stanford       | UIUC       | 89:53:17              | 4               | 1
UIUC           | Stanford   | 76:39:10              | 74              | 1
Berkeley       | UIUC       | 89:54:11              | 6               | 5
UIUC           | Berkeley   | 76:39:40              | 3               | 5

Acknowledgements: Mary Baker, Mema Roussopoulos, Jayant Mysore, Roberto Barnes, Venkatesh Pranesh, Vijaykumar Krishnaswamy, Holger Karl, Yun-Shen Chang
Architecture
[Figure: source and destination connected across the Internet through an overlay network of service clusters; peering cluster managers exchange performance information]
• Three levels
– Application plane: composed services
– Logical platform: overlay network, peering relations (peering: exchange of performance information)
– Hardware platform: service clusters; location of service replicas (service cluster: compute cluster capable of running services)
• Functionalities at the cluster manager
– Finding overlay entry/exit
– Service-level path creation, maintenance, and recovery
– Link-state propagation
– Performance measurement
– At-least-once UDP messaging (see the sketch after this slide)
– Liveness detection
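The following is a minimal sketch, an assumption rather than the paper's implementation, of the "at-least-once UDP" messaging listed among the cluster-manager functions: a request is retransmitted until an acknowledgement arrives or the retry budget runs out. The ACK format and retry parameters are made up for illustration.

```python
# Sketch of at-least-once UDP messaging between cluster managers.
# Retry count, timeout, and ACK format are illustrative assumptions.
import socket

def send_at_least_once(payload: bytes, addr, retries=5, timeout_s=0.5):
    """Send payload over UDP and retransmit until an ACK is received."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout_s)
    for _ in range(retries):
        sock.sendto(payload, addr)
        try:
            ack, _ = sock.recvfrom(64)
            if ack == b"ACK":      # assumed acknowledgement format
                return True
        except socket.timeout:
            continue               # no ACK yet; retransmit
    return False
```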
Evaluation
• What is the effect of the recovery mechanism on the application?
– Text-to-Speech application: text source → text-to-audio service → end client
– Two possible places of failure: leg-1 (text source to text-to-audio) and leg-2 (text-to-audio to end client)
– Along the path: request-response protocol, data (text, or RTP audio), keep-alive soft-state refresh, application soft-state (for restart on failure)
– 20-node overlay network; one service instance for each service
– Deterministic failure for 10 sec during the session
– Metric: gap between arrival of successive audio packets at the client (a sketch of this metric follows this slide)
• What is the scaling bottleneck?
– Parameter: number of client sessions across peering clusters (a measure of instantaneous load when the failure occurs)
– 5,000 client sessions in the 20-node overlay network
– Deterministic failure of 12 different links (12 data points in the graph)
– Metric: average time-to-recovery
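To make the client-side metric concrete, here is a minimal sketch (not the paper's code) that computes inter-arrival gaps of audio packets and the CDF of gaps larger than 100 ms; the input format, a list of arrival times in seconds, is an assumption.

```python
# Sketch of the "gap between successive audio packets" metric and its CDF.
def gap_cdf(arrival_times, threshold=0.100):
    """Return sorted (gap, cumulative fraction) points for gaps > threshold."""
    gaps = [t2 - t1 for t1, t2 in zip(arrival_times, arrival_times[1:])]
    large = sorted(g for g in gaps if g > threshold)
    n = len(large)
    return [(g, (i + 1) / n) for i, g in enumerate(large)]

# Example: packets nominally every 30 ms; a 10 s failure shows up as one large gap.
times = [0.00, 0.03, 0.06, 0.09, 10.09, 10.12, 10.15]
for gap, frac in gap_cdf(times):
    print(f"gap {gap * 1000:.0f} ms -> CDF {frac:.2f}")
```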
Recovery of Application Session: CDF of gaps > 100 ms
[Figure: CDF of gaps > 100 ms at the client, with and without the recovery mechanism]
• Without recovery: recovery time is 10,000 ms (the full failure duration)
• Leg-2 failure: recovery time 2,963 ms
• Leg-1 failure: recovery time 822 ms (quicker than leg-2 due to the buffer at the text-to-audio service)
• Jump at 350-400 ms: due to synchronous text-to-audio processing (implementation artefact)
Average Time-to-Recovery vs. Instantaneous Load
[Figure: average time-to-recovery vs. instantaneous load, end-to-end recovery algorithm]
• Two services in each path; two replicas per service
• Each data point is a separate run
• High variance due to varying path length
• At a load of 1,480 paths on the failed link, average path recovery time is 614 ms
Results: Discussion
• Recovery after failure (leg-2): 2,963 ms = 1,800 + O(700) + O(450) (a recovery-sequence sketch follows this slide)
– 1,800 ms: timeout to conclude failure
– 700 ms: signaling to set up the alternate path
– 450 ms: recovery of application soft-state (re-process the current sentence)
• Without the recovery algorithm: recovery takes as long as the failure duration
• O(3 sec) recovery
– Can be completely masked with buffering
– Interactive apps: still much better than without recovery
• Quick recovery is possible since failure information does not have to propagate across the network
• 12th data point (instantaneous load of 1,480) stresses emulator limits
– 1,480 translates to about 700 simultaneous paths per cluster manager
– In comparison, our text-to-speech implementation can support O(15) clients per machine
• Other scaling limits? Link-state floods? Graph computation?
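The breakdown above can be read as three sequential steps: detect the failure via keep-alive timeout, signal an alternate service-level path, then restore application soft-state. The sketch below is an assumption about how such a sequence could be structured, not the paper's implementation; both callbacks are hypothetical hooks.

```python
# Sketch of the end-to-end recovery sequence behind the 2,963 ms breakdown.
import time

KEEPALIVE_TIMEOUT = 1.8   # seconds to conclude failure, per the outage study

class PathRecovery:
    def __init__(self, setup_alternate_path, restore_soft_state):
        # Hypothetical hooks: one signals cluster managers to build an
        # alternate service-level path, the other re-processes the current
        # sentence at the text-to-audio service.
        self.setup_alternate_path = setup_alternate_path
        self.restore_soft_state = restore_soft_state
        self.last_keepalive = time.time()

    def on_keepalive(self):
        self.last_keepalive = time.time()

    def check(self):
        """Called periodically; returns total recovery time if a failover ran."""
        if time.time() - self.last_keepalive < KEEPALIVE_TIMEOUT:
            return None
        start = time.time()
        self.setup_alternate_path()   # measured at O(700 ms) in the experiments
        self.restore_soft_state()     # measured at O(450 ms) for text-to-speech
        return KEEPALIVE_TIMEOUT + (time.time() - start)
```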
Summary
• Service composition: flexible service creation
• We address performance, availability, and scalability
• Initial analysis: failure detection -- meaningful to time out in O(1.2-1.8 sec)
• Design: overlay network of service clusters
• Evaluation: results so far
– Good recovery time for real-time applications: O(3 sec)
– Good scalability -- minimal additional provisioning for cluster managers
• Ongoing work
– Overlay topology issues: how many nodes, peering
– Stability issues
Feedback, questions?
(Presentation made using VMware)
Emulation Testbed
[Figure: application nodes running over an emulation library; each overlay link (e.g., 1→2, 1→3, 3→4, 4→3) is governed by an emulation rule]
• Operational limits of the emulator: 20,000 pkts/sec for up to 500-byte packets on a 1.5 GHz Pentium-4 (a link-rule sketch follows this slide)
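The following is a minimal sketch, an assumption about what a per-link emulation rule might look like rather than the emulator's actual code: each rule delays, rate-limits, and probabilistically drops packets between two overlay nodes, in the spirit of the "Rule for 1→2" boxes in the testbed figure.

```python
# Sketch of a per-link emulation rule (latency, bandwidth, loss); values are illustrative.
import random
import time

class LinkRule:
    def __init__(self, latency_s, bandwidth_bps, loss_prob):
        self.latency_s = latency_s
        self.bandwidth_bps = bandwidth_bps
        self.loss_prob = loss_prob

    def deliver(self, packet: bytes, send):
        """Apply the rule to one packet, then hand it to send() unless dropped."""
        if random.random() < self.loss_prob:
            return                                 # drop
        tx_delay = len(packet) * 8 / self.bandwidth_bps
        time.sleep(self.latency_s + tx_delay)      # propagation + transmission delay
        send(packet)

# Example: a 50 ms, 1 Mbit/s link with 1% loss between nodes 1 and 2.
rule_1_2 = LinkRule(latency_s=0.050, bandwidth_bps=1_000_000, loss_prob=0.01)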