
Scalable Reliable Multicast in Wide Area Networks
Sneha Kumar Kasera
Department of Computer Science
University of Massachusetts, Amherst

Why Multicast?
[Figure: one sender, three receivers; delivery by multiple unicast, broadcast, and multicast]

Why Reliable Multicast?
- applications
  - one-to-many file transfer
  - information updates (e.g., stock quote, web cache updates)
  - shared whiteboard
[Figure: multicast over a lossy network]

Goal
design, evaluate multicast loss recovery approaches that
- make efficient use of end-host, network resources
- scale to several thousand receivers spanning wide area networks

Feedback Implosion
- problem: ACK implosion
- solution: use NAKs
- NAK implosion?
  - NAK suppression (using timers)
  - NAK aggregation (by building hierarchy)
[Figure: sender multicasts pkt; receivers return ACKs, or a NAK on loss]
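To illustrate timer-based NAK suppression, here is a minimal sketch, not the protocol from the talk. It assumes every receiver lost the same packet, each picks a random NAK timer, NAKs are multicast, and all receivers hear a NAK after the same propagation delay. The function name and parameters are hypothetical.

```python
import random


def simulate_nak_suppression(num_receivers, max_delay, propagation_delay, seed=0):
    """Count the NAKs actually sent when all receivers lose the same packet.

    Assumptions (illustrative only): timers are uniform on [0, max_delay),
    and a multicast NAK reaches every receiver after propagation_delay.
    """
    rng = random.Random(seed)
    timers = [rng.uniform(0, max_delay) for _ in range(num_receivers)]
    first = min(timers)
    # A receiver cancels its NAK only if the earliest NAK arrives
    # before its own timer expires.
    return sum(1 for t in timers if t <= first + propagation_delay)
```

With zero propagation delay only the earliest timer fires; as the delay approaches the timer range, suppression stops helping and every receiver sends a NAK.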

[Figure: original transmission from sender, pkt lost at a receiver; retransmission by unicast vs. multicast]

Problem of Retransmission Scoping
- if same channel for retransmissions, retransmissions go everywhere
- how to shield receiver from loss recovery due to other receivers?
[Figure: sender, original transmission, loss, retransmission]

Loss Recovery Burden
- when #receivers large, each pkt lost at some rcvr with high probability
- sender retransmits almost all pkts several times
- how to share burden of loss recovery?
[Figure: pkts 1, 2, 3, 4 each lost at a different receiver; sender retransmits pkts 1, 2, 3, 4]
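The "lost at some rcvr with high probability" claim can be checked with a short calculation. This sketch assumes independent loss with probability p at each of R receivers (the assumption the talk's later analysis model also makes); the function name is hypothetical.

```python
def prob_lost_at_some_receiver(p, num_receivers):
    """Probability that at least one of num_receivers misses a packet,
    assuming each receiver loses it independently with probability p."""
    return 1.0 - (1.0 - p) ** num_receivers


for r in (1, 10, 100, 1000):
    print(r, prob_lost_at_some_receiver(0.05, r))
```

Under this assumption, with p = 0.05 the probability is about 0.40 at 10 receivers and is indistinguishable from 1 at 1000 receivers, so nearly every packet needs at least one retransmission.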

Thesis Contributions
- scoping retransmissions using multiple multicast channels
- server-based local recovery
  - performance benefits
  - resource requirements
- “active” repair services
  - signaling for locating, invoking, revoking services
  - router support

Overview
- scoping retransmissions using multiple multicast channels
- server-based local recovery
  - performance benefits
  - resource requirements
- “active” repair services
  - signaling for locating, invoking, revoking services
  - router support
- summary and future directions

Scalable Reliable Multicast Using Multiple Multicast Channels
- one channel for original transmissions, Aorig
- additional channels for retransmissions, pkt k sent on Ak
- on detecting loss of pkt k, receiver
  - joins Ak
  - recovers packet k
  - leaves Ak
[Figure: sender, channels Aorig and Ak, loss at receivers]
Kasera, Kurose, Towsley, ACM SIGMETRICS Conference ‘97
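The join / recover / leave steps can be sketched as a small in-memory model. This is an illustration under simplifying assumptions (no real IP multicast, no join latency, retransmissions never lost); the `Receiver` class and `retransmit` helper are hypothetical names.

```python
class Receiver:
    """Receiver that is a member of A_k only while it is missing packet k."""

    def __init__(self, name):
        self.name = name
        self.joined = set()     # retransmission channels currently joined
        self.recovered = set()  # packets recovered via retransmission

    def on_loss_detected(self, k):
        self.joined.add(k)      # join A_k

    def on_retransmission(self, k):
        if k not in self.joined:
            return False        # not a member of A_k: packet is never delivered
        self.recovered.add(k)   # recover packet k
        self.joined.discard(k)  # leave A_k
        return True


def retransmit(receivers, k):
    """Send packet k on A_k; return names of receivers that processed it."""
    return [r.name for r in receivers if r.on_retransmission(k)]


r1, r2 = Receiver("r1"), Receiver("r2")
r1.on_loss_detected(3)
print(retransmit([r1, r2], 3))  # only r1 handles the retransmission
```

Receivers that did not lose packet k never see its retransmission, which is the scoping effect the slide describes.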

Issues
- how much is performance improved?
  - receiver, sender processing
  - network bandwidth
- if a multicast channel is an IP multicast group, realistically only finite channels available!
- overhead of join, leave operations?
- router support for multiple multicast channels?

Analysis
- rcvrs unicast NAKs to sender
- infinite channels available
- system model
  - one sender, R receivers
  - independent loss, probability p
  - NAKs not lost
- $E[Y] = E[Y_p] + p\,E[Y_j] + \frac{p}{1-p}\,E[Y_n] + \frac{p^2}{1-p}\,E[Y_t]$
  - $Y$ = total per pkt rcv proc time
  - $Y_p$ = rcvd pkt proc time (rcvd pkt processing term)
  - $Y_j$ = join, leave proc time (join, leave processing term)
  - $Y_n$ = NAK proc time (NAK processing term)
  - $Y_t$ = timer proc time (timer processing term)
- determined various proc times by instrumenting Linux kernel
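The per-packet receiver cost on this slide is easy to evaluate numerically. The sketch below simply codes the slide's expression for E[Y]; the function name is hypothetical and the processing times in the example are made-up placeholders, not the measured Linux values.

```python
def expected_receiver_cost(p, y_p, y_j, y_n, y_t):
    """E[Y] = E[Yp] + p*E[Yj] + p*E[Yn]/(1-p) + p^2*E[Yt]/(1-p).

    p is the loss probability; y_p, y_j, y_n, y_t are the mean processing
    times for a received pkt, a join/leave pair, a NAK, and a timer.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("loss probability must satisfy 0 <= p < 1")
    return y_p + p * y_j + p * y_n / (1.0 - p) + p ** 2 * y_t / (1.0 - p)


# Placeholder processing times (arbitrary units), for illustration only.
print(expected_receiver_cost(0.05, y_p=1.0, y_j=2.0, y_n=1.5, y_t=0.5))
```

With no loss (p = 0) the cost reduces to the received-packet processing time alone, which is a quick sanity check on the expression.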

Receiver Processing Cost Reduction
- considerable reduction in rcvr processing costs by using infinite channels
- example: when R = 1000, p = 0.05, processing cost reduces by approx. 65%
- similar behavior observed for protocols that multicast NAKs for NAK suppression

Finite # of Retransmission Channels
- recycle G retransmission channels, retransmit pkt k on A(k mod G)
- example, G = 3
[Figure: timeline of transmissions and retransmissions for G = 3. Pkt 1 (lost at r1) and pkt 4 (lost at r2) are both retransmitted on channel A1, so the retransmission of pkt 4 is also received at r1.]
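The channel recycling rule can be made concrete with a few lines. This is a sketch of the k mod G mapping only; the helper names are hypothetical.

```python
def retransmission_channel(k, num_channels):
    """Index of the retransmission channel used for packet k (k mod G)."""
    return k % num_channels


def packets_per_channel(seq_numbers, num_channels):
    """Group packet sequence numbers by the channel they would share."""
    groups = {}
    for k in seq_numbers:
        groups.setdefault(retransmission_channel(k, num_channels), []).append(k)
    return groups


print(packets_per_channel(range(1, 6), 3))
```

For G = 3, packets 1 and 4 map to the same channel, so a receiver joined for packet 1 may also receive a retransmission of packet 4 that it does not need.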

Finite # of Retransmission Channels
- find #unwanted pkts, U, at receiver due to using G channels only
- model
  - same as before
  - transmit with interval D
  - retransmit with interval D' (if pending NAK)
- U depends upon G, p, R, D/D'
- receiver processing cost, $E[Y'] = E[Y] + E[U]\,E[Y_p]$, where the added term is the unwanted pkt processing

How many channels do we need?
- find minimum #channels s.t. increase in cost within 1%
- small #channels for wide range of p, D/D', R
- #channels
  - <= 10 when D/D' >= 0.5
  - sensitive to low D/D'

Summary (part 1)
- use of multiple multicast channels reduces receiver processing
- small to moderate #channels achieve almost perfect retransmission scoping
- implementation using router support also saves network bandwidth
- sender still bottleneck, no improvement in protocol performance

Local Recovery
- server and/or other receivers aid in loss recovery
- distribution of loss recovery burden
- possible reduction in
  - network bandwidth
  - recovery latency
- retransmission scoping
[Figure: sender, transmission, loss, server, local domains]


Repair Server Based Local Recovery
- repair servers co-located with routers at strategic locations
- placement of application level repair service in routers
- repair servers cache recent pkts
- receivers, repair servers recover lost pkts from upper level repair servers, sender
[Figure: repair server, receivers]
Kasera, Kurose, Towsley, IEEE INFOCOM ‘98

Issues
- how much is performance improved over traditional local recovery approaches?
  - SRM: dynamically elect receiver for every loss
  - RMTP, LBRM: designated receiver, logger for supplying repairs
- where to place repair servers?
- what are repair server resource requirements?

System Model
- based on [YKT ‘97]
- loss free backbone, sites
- loss at source link, tails
- temporally independent loss, probability p
[Figure: sender, source link, backbone, tail, site, receivers, local domain]

System Model (with designated receiver)
- based on [YKT ‘97]
- loss free backbone, sites
- loss at source link, tails
- temporally independent loss, probability p
[Figure: sender, source link, backbone, tail, site, receivers, local domain, designated receiver]

System Model (with repair server)
- based on [YKT ‘97]
- loss free backbone, sites
- loss at source link, tails
- temporally independent loss, probability p
[Figure: sender, source link, backbone, repair server, tail, site, receivers, local domain]

Performance Evaluation
- metrics
  - throughput = 1/max(sender processing time, receiver processing time)
  - bandwidth usage = total bytes transmitted over all links per correct transmission
- analysis: similar approach as in previous problem (optimistic bounds for SRM)
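The two metrics can be written down directly from their definitions. This is a minimal sketch; the function names, argument shapes, and sample numbers are assumptions for illustration, not the thesis's analysis code.

```python
def throughput(sender_proc_time, receiver_proc_time):
    """Packets per unit time, limited by the slower of sender and receiver."""
    return 1.0 / max(sender_proc_time, receiver_proc_time)


def bandwidth_usage(bytes_per_link, correct_transmissions):
    """Total bytes sent over all links, per correctly delivered packet."""
    return sum(bytes_per_link) / correct_transmissions


# Placeholder numbers: sender needs 2 ms per pkt, receiver 1 ms per pkt.
print(throughput(0.002, 0.001))
print(bandwidth_usage([1000, 1000, 500], 2))
```

Because throughput depends on the maximum of the two processing times, reducing receiver cost alone does not help once the sender is the bottleneck.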

Performance Comparison
repair server-based (RSB) compared to
- SRM: throughput up to 2.5 times higher, bandwidth reduction 60%
- DR-based (DRB): throughput up to 4 times higher, bandwidth reduction 35%

Insufficient Repair Servers
- additional sender retransmission required if some domains without repair servers
- place repair servers in high loss domains first
- homogeneous loss: high % domains require repair server
- 20% tail loss in 20% domains, 1% tail loss in 80% domains

Repair Server Buffer Requirements (per session)
- theoretically: infinite
- realistically: allot finite buffers
  - replace pkts when buffers full
  - if required, replaced pkts recovered upstream
- size depends upon
  - amount of upstream recovery
  - pkt arrival process, buffer holding time
  - replacement policy
- example: when p = 0.05, 15 buffers ensure almost perfect local recovery

Buffer Replacement Policies
- examine three policies
  - FIFO, LRU
  - FIFO-MH: FIFO with minimum buffer holding time = one retransmission interval
- FIFO-MH shows little improvement over FIFO
- LRU performs better than FIFO only when #buffers large
- example:
  - arrival rate = 128 pkts/sec
  - retransmission interval from round trip time traces
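The difference between FIFO and LRU replacement in a repair server's packet cache can be sketched as follows. This is an illustrative model only: the `RepairBuffer` class is a hypothetical name, FIFO-MH is not modeled, and a miss simply returns `None` to stand for "recover upstream".

```python
from collections import OrderedDict


class RepairBuffer:
    """Finite per-session packet cache with FIFO or LRU replacement."""

    def __init__(self, capacity, policy="FIFO"):
        if capacity < 1 or policy not in ("FIFO", "LRU"):
            raise ValueError("capacity must be >= 1 and policy FIFO or LRU")
        self.capacity = capacity
        self.policy = policy
        self._pkts = OrderedDict()  # oldest (or least recently used) first

    def store(self, seq, payload):
        if seq in self._pkts:
            return
        if len(self._pkts) >= self.capacity:
            self._pkts.popitem(last=False)  # evict the entry at the front
        self._pkts[seq] = payload

    def lookup(self, seq):
        """Return the cached payload, or None if it must be recovered upstream."""
        if seq not in self._pkts:
            return None
        if self.policy == "LRU":
            self._pkts.move_to_end(seq)  # a repair request refreshes the entry
        return self._pkts[seq]


buf = RepairBuffer(capacity=2, policy="LRU")
buf.store(1, b"pkt1")
buf.store(2, b"pkt2")
buf.lookup(1)          # pkt 1 was just requested, so pkt 2 is now the LRU entry
buf.store(3, b"pkt3")  # evicts pkt 2 under LRU (FIFO would evict pkt 1)
print(buf.lookup(1), buf.lookup(2))
```

Under FIFO the eviction order depends only on arrival order, which is why it needs no bookkeeping on repair requests.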

Summary (part 2)
- repair server-based approach exhibits superior performance over traditional approaches
- repair server placement: above loss, higher loss domains first
- buffer requirement
  - several 10s of buffers (per session)
  - simple FIFO replacement policy sufficient
- how to make repair server approach dynamic?


Active Repair Service
- repair server functionality as active repair service
- design repair service-based protocol, AER
- locate, invoke repair services using source path messages (SPMs)
- minimal router support required for interception of SPM, subcast
- SPMs multicast but intercepted
- NAKs take reverse path
[Figure: sender S, repair servers RS1 and RS2, SPMs]

Thesis Contributions
- scoping retransmissions using multiple multicast channels
- server-based local recovery
  - performance benefits
  - resource requirements
- “active” repair services
  - signaling for locating, invoking, revoking services
  - router support

Future Directions
- model cost of additional network resources
- buffer requirements
  - multiple sessions
  - other applications (e.g., web caching)
- composable multicast services
- other multicast research
  - revisit IP multicast service model
  - congestion control
  - pricing

Composable Multicast Services (Work in Progress)
- identify performance enhancing services, examples
  - feedback aggregation
  - selective forwarding
  - repair, rate conversion, log services
- invoke/revoke services based on
  - application requirements
  - network conditions
- issues:
  - implementing composability
  - signaling mechanism (SPM++)
  - measurement-based infrastructure
[Figure: sender protocol, rate conversion, feedback aggregation, rcvr protocols]