INF 5071 Performance in Distributed Systems: Peer-to-Peer Systems
INF 5071 – Performance in distributed systems: Peer-to-Peer Systems, 27/10 & 10/11 2006, Carsten Griwodz & Pål Halvorsen
Client-Server
§ Traditional distributed computing
§ Successful architecture, and will continue to be so (adding proxy servers)
§ Tremendous engineering necessary to make server farms scalable and robust
[Figure: backbone network connecting local distribution networks]
Distribution with proxies
§ Hierarchical distribution system
  § E.g. proxy caches that consider popularity
§ Completeness of available content at the root servers
§ Popular videos replicated and kept close to clients
§ Unpopular ones close to the root servers
§ Popular videos are replicated more frequently
[Figure: hierarchy of root servers, regional servers, local servers and end-systems]
Peer-to-Peer (P2P)
§ Really an old idea: a distributed system architecture
  § No centralized control
  § Nodes are symmetric in function
§ Typically many nodes, but unreliable and heterogeneous
[Figure: backbone network connecting local distribution networks]
Overlay networks
[Figure: overlay nodes and overlay links mapped onto IP links, IP paths and IP routing across a backbone network and LANs]
P2P
§ Many aspects similar to proxy caches
  § Nodes act as clients and servers
  § Distributed storage
  § Bring content closer to clients
  § Storage limitation of each node
  § Number of copies often related to content popularity
  § Necessary to make replication and de-replication decisions
  § Redirection
§ But
  § No distinguished roles
  § No generic hierarchical relationship (at most a hierarchy per data item)
  § Clients do not know where the content is
    § May need a discovery protocol
  § All clients may act as roots (origin servers)
  § Members of the P2P network come and go
P2P Systems
§ Peer-to-peer systems introduce new considerations for distribution systems
§ Considered here
  § Scalability, fairness, load balancing
  § Content location
  § Failure resilience
  § Routing
    § Application layer routing
    § Content routing
    § Request routing
§ Not considered here
  § Copyright
  § Privacy
  § Trading
Examples: Napster
Napster
§ Program for sharing (music) files over the Internet
§ Approach taken
  § Central index
  § Distributed storage and download
  § All downloads are shared
§ P2P aspects
  § Client nodes also act as file servers
Napster
§ Client connects to Napster with login and password (join)
§ Transmits current listing of shared files to the central index
§ Napster registers the username, maps the username to an IP address and records the song list
Napster
§ Client sends a song request (query) to the Napster server
§ Napster checks the song database in the central index
§ Returns matched songs (answer) with usernames and IP addresses (plus extra stats)
Napster
§ User selects a song; the download request (get file) is sent straight to the owning user
§ That machine is contacted directly if available
Napster: Assessment
§ Scalability, fairness, load balancing
  § Replication to querying nodes: number of copies increases with popularity
  § Large distributed storage
  § Unavailability of files with low popularity
  § Network topology is not accounted for at all; latency may be increased
§ Content location
  § Simple, centralized search/location mechanism
  § Can only query by index terms
§ Failure resilience
  § No dependencies among normal peers
  § Index server as single point of failure
Examples: Gnutella
Gnutella
§ Program for sharing files over the Internet
§ Approach taken
  § Purely P2P, nothing centralized
  § Dynamically built overlay network
  § Query for content by overlay broadcast
  § No index maintenance
§ P2P aspects
  § Peer-to-peer file sharing
  § Peer-to-peer querying
  § Entirely decentralized architecture
  § Many iterations to fix poor initial design (lack of scalability)
Gnutella: Joining
§ Connect to one known host and send a broadcast ping
  § Can be any host; hosts are learned through word-of-mouth or host caches
§ Use an overlay broadcast ping through the network with a TTL of 7
[Figure: ping propagating through the overlay with decreasing TTL 4, 3, 2, 1]
Gnutella: Joining
§ Hosts that are not overwhelmed respond with a routed pong
§ Gnutella caches the IP addresses of replying nodes as neighbors
§ In the example, the grey nodes do not respond within a certain amount of time (they are overloaded)
Gnutella: Query
§ Query by broadcasting in the overlay
§ Send the query to all overlay neighbors
§ Overlay neighbors forward the query to all their neighbors
§ Up to 7 layers deep (TTL 7)
[Figure: query flooding through the overlay, TTL decreasing from 7]
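The TTL-limited flooding above can be sketched in a few lines. This is a hypothetical illustration, not Gnutella's wire protocol: the overlay is a plain dict mapping a node to its neighbour list, and duplicate suppression stands in for Gnutella's message-GUID tracking.

```python
# Sketch of Gnutella-style query flooding with a TTL.
# `overlay` maps node -> list of neighbours (an assumed toy representation).

def flood_query(overlay, start, ttl):
    """Return the set of nodes a query reaches when every node
    forwards it to all neighbours until the TTL is exhausted."""
    reached = {start}
    frontier = [start]
    while ttl > 0 and frontier:
        next_frontier = []
        for node in frontier:
            for neighbour in overlay.get(node, []):
                if neighbour not in reached:   # do not re-forward a seen query
                    reached.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
        ttl -= 1
    return reached
```

With a chain A-B-C-D, a TTL of 2 reaches only A, B and C, which mirrors why less popular files outside the TTL horizon are unreachable.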
Gnutella: Query
§ Responses are routed back to the overlay node that was the source of the broadcast query
§ The querying client receives several responses
§ The user receives a list of files that matched the query and a corresponding IP address
[Figure: responses routed back along the query paths]
Gnutella: Transfer
§ The file transfer protocol is not part of the Gnutella specification
§ The requested file is downloaded using direct communication
Gnutella: Assessment
§ Scalability, fairness, load balancing
  § Replication to querying nodes: number of copies increases with popularity
  § Large distributed storage
  § Unavailability of files with low popularity
  § Bad scalability: uses a flooding approach
  § Network topology is not accounted for at all; latency may be increased
§ Content location
  § No limits to query formulation
  § Less popular files may be outside the TTL horizon
§ Failure resilience
  § No single point of failure
  § Many known neighbors
  § Assumes quite stable relationships
Examples: Freenet
Freenet
§ Program for sharing files over the Internet
§ Focus on anonymity
§ Approach taken
  § Purely P2P, nothing centralized
  § Dynamically built overlay network
  § Query for content by hashed query and best-first search
  § Caching of hash values and content
  § Content forwarding in the overlay
§ P2P aspects
  § Peer-to-peer file sharing
  § Peer-to-peer querying
  § Entirely decentralized architecture
  § Anonymity
Freenet: Nodes and Data
§ Nodes
  § Routing tables contain the IP addresses of other nodes and the hash values they hold (resp. held)
§ Data is indexed with a hash value
  § "Identifiers" are hashed
  § Identifiers may be keywords, author ids, or the content itself
  § The Secure Hash Algorithm (SHA-1) produces a "one-way" 160-bit key
  § Content-hash key (CHK) = SHA-1(content)
§ Typically stores blocks
Freenet: Storing and Retrieving Data
§ Storing data: data moves to a server with arithmetically close keys
  1. The key and data are sent to the local node
  2. The key and data are forwarded to the node with the nearest key
  Repeat 2 until the maximum number of hops is reached
§ Retrieving data: best-first search
  1. An identifier is hashed into a key
  2. The key is sent to the local node
  3. If the data is not in the local store, the request is forwarded to the best neighbor
  Repeat 3 with the next best neighbor until the data is found or the request times out
  4. If the data is found, or the hop count reaches zero, return the data or an error along the chain of nodes (if the data is found, intermediary nodes create entries in their routing tables)
Freenet: Best-First Search
§ Heuristics for selecting the direction
  § >RES: returned most results
  § <TIME: shortest satisfaction time
  § <HOPS: min hops for results
  § >MSG: sent us most messages (all types)
  § <QLEN: shortest queue
  § <LAT: shortest latency
  § >DEG: highest degree
Freenet: Routing Algorithm
Freenet: Assessment
§ Scalability, fairness, load balancing
  § Caching in the overlay network
  § Access latency decreases with popularity
  § Large distributed storage
  § Fast removal of files with low popularity
  § A lot of storage wasted on highly popular files
  § Network topology is not accounted for
§ Content location
  § Search by hash key: limited ways to formulate queries
  § Content placement changes to fit the search pattern
  § Less popular files may be outside the TTL horizon
§ Failure resilience
  § No single point of failure
Examples: FastTrack, Morpheus, OpenFT
FastTrack, Morpheus, OpenFT
§ Peer-to-peer file sharing protocol
§ Three different node roles
  § USER: normal nodes
  § SEARCH: keep an index of "their" normal nodes; answer search requests
  § INDEX: keep an index of search nodes; redistribute search requests
FastTrack, Morpheus, OpenFT
[Figure: three-level structure of USER, SEARCH and INDEX nodes]
FastTrack, Morpheus, OpenFT
[Figure: a USER query (?) is answered (!) by SEARCH nodes, with INDEX nodes redistributing the request]
FastTrack, Morpheus, OpenFT: Assessment
§ Scalability, fairness, load balancing
  § Large distributed storage
  § Avoids broadcasts
  § Load concentrated on supernodes (index and search)
  § Network topology is partially accounted for
  § Efficient structure development
§ Content location
  § All indexed files are reachable
  § Can only query by index terms
§ Failure resilience
  § No single point of failure, but the overlay networks of index servers (and search servers) reduce resilience
  § Relies on very stable relationships: content is registered at search nodes
  § Relies on a partially static infrastructure
Examples: BitTorrent
BitTorrent
§ Distributed download system
§ Content is distributed in segments
§ Tracker
  § One central download server per content
  § Approach to fairness (tit-for-tat) per content
§ No approach for finding the tracker
§ No content transfer protocol included
BitTorrent: Segment download operation
Ø The tracker tells the peer the source and number of the segment to get
Ø The peer retrieves the content in pull mode
Ø The peer reports availability of the new segment to the tracker
[Figure: tracker coordinating segment exchange among peers]
BitTorrent
§ No second input stream: the peer has not contributed enough (tit-for-tat)
§ Rarest-first strategy
[Figure: tracker-coordinated segment exchange]
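The rarest-first strategy can be sketched as a piece-selection function. This is a minimal illustration under assumed inputs (our own segment set and a map of which segments each known peer holds), not the actual BitTorrent client logic, which also handles random first pieces and endgame mode.

```python
from collections import Counter

def rarest_first(have, peers_have):
    """Pick the next segment to request: among segments we still need,
    choose the one held by the fewest peers (rarest first).
    `have` is our set of segments; `peers_have` maps peer -> set of segments."""
    counts = Counter()
    for segments in peers_have.values():
        counts.update(segments)        # how many peers hold each segment
    needed = [s for s in counts if s not in have]
    if not needed:
        return None                    # nothing left to fetch from known peers
    return min(needed, key=lambda s: counts[s])
```

Requesting rare segments first keeps copies of every segment spread across the swarm, so the download does not stall when the peers holding a rare segment leave.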
BitTorrent
§ No second input stream: the peer has not contributed enough
§ All nodes: max 2 concurrent streams in and out
[Figure: tracker-coordinated segment exchange]
BitTorrent
[Figures: further steps of the tracker-coordinated segment exchange]
BitTorrent: Assessment
§ Scalability, fairness, load balancing
  § Large distributed storage
  § Avoids broadcasts
  § Transfers content segments rather than complete content
  § Does not rely on clients staying online after download completion
  § Contributors are allowed to download more
§ Content location
  § Central server approach
§ Failure resilience
  § Tracker is a single point of failure
  § Content holders can lie
Comparison
§ Napster: routing information held in one central server; lookup cost O(1)
§ Gnutella: routing information is the neighbour list; scalability limited by flooding; lookup cost O(#nodes)
§ FreeNet: uses caching; lookup cost O(log(#nodes))
§ FastTrack: routing information held in index servers; lookup cost O(1); physical locality by search server assignment
§ BitTorrent: one tracker per file, separate overlays per file; lookup cost O(#blocks)
Comparison
§ Load balancing
  § Napster, Gnutella: many replicas of popular content
  § FreeNet: content placement changes to fit the search pattern
  § FastTrack: load concentrated on supernodes
  § BitTorrent: rarest-first copying
§ Content location
  § Napster: all files reachable; search by index term via the index server
  § Gnutella: uses flooding; unpopular files may be outside the TTL horizon
  § FreeNet: search by hash; unpopular files may be outside the TTL horizon
  § FastTrack: all files reachable; search by hash
  § BitTorrent: external issue
§ Failure resilience
  § Napster: index server as single point of failure
  § Gnutella: no single point of failure
  § FreeNet: no single point of failure
  § FastTrack: overlay network of index servers
  § BitTorrent: tracker as single point of failure
Peer-to-Peer Systems: Distributed Directories
Examples: Chord
Chord
§ Approach taken
  § Only concerned with efficient indexing
  § Distributed index: decentralized lookup service
  § Inspired by consistent hashing: SHA-1 hash
  § Content handling is an entirely external problem
    § No relation to content
    § No included replication or caching
§ P2P aspects
  § Every node must maintain keys
  § Adaptive to membership changes
  § Client nodes also act as file servers
Lookup Based on Hash Tables
§ insert(key, data): the hash function maps the key to a position (pos) in the hash table; the data is stored in that hash bucket
§ lookup(key): hashing the key again yields the same position and returns the data
[Figure: key, hash function, positions 0, 1, 2, ..., N in the hash table]
Distributed Hash Tables (DHTs)
§ A distributed application calls insert(key, data) and lookup(key) against a hash table spread over many nodes
  § Nodes are the hash buckets
  § The key identifies the data uniquely
  § The DHT balances keys and data across nodes
§ Design goals
  § Define a useful key nearness metric
  § Keep the hop count small
  § Keep the routing tables the "right size"
  § Stay robust despite rapid changes in membership
Chord IDs & Consistent Hashing
§ m-bit identifier space for both keys and nodes
  § Key identifier = SHA-1(key), e.g. key "LetItBe" hashes to ID 54
  § Node identifier = SHA-1(IP address), e.g. IP "198.10.1" hashes to ID 123
§ Both are uniformly distributed
§ Identifiers are ordered on a circle modulo 2^m
§ A key is mapped to the first node whose id is equal to or follows the key id
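The key-to-node mapping above can be sketched directly. A small m is assumed here purely for illustration; Chord uses the full 160-bit SHA-1 space.

```python
import hashlib
from bisect import bisect_left

M = 6                                  # assumed small identifier space (2**6) for illustration

def chord_id(name: str) -> int:
    """SHA-1 based identifier, reduced to the m-bit ring."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** M)

def successor(node_ids, key):
    """Map a key to the first node whose id is equal to or
    follows the key id on the ring (modulo 2**m)."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key)
    return ring[i % len(ring)]         # wrap around past the highest id
```

For example, on a ring with nodes 10, 32 and 54, key 54 maps to node 54, while key 55 wraps around to node 10.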
Routing: Everyone-Knows-Everyone
§ Every node knows every other node: requires global information
§ Routing tables are large: N entries
§ Example: Where is "LetItBe"? Hash("LetItBe") = K54; any node can answer directly which node holds K54 (here N56)
§ Requires O(1) hops
Routing: All Know Their Successor
§ Every node only knows its successor on the ring
§ Small routing table: 1 entry
§ Example: Where is "LetItBe"? Hash("LetItBe") = K54; the query is passed from successor to successor until it reaches the node holding K54 (here N56)
§ Requires O(N) hops
Routing: "Finger Tables"
§ Every node knows m other nodes on the ring
§ Distance to them increases exponentially
§ Finger i points to the successor of n + 2^i
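A finger table can be computed as a short sketch. The ring is given as a plain list of node ids (an assumed toy representation), and the finger convention follows the slide: finger i of node n points to the successor of n + 2^i on the 2^m ring.

```python
def finger_table(n, node_ids, m):
    """Finger i of node n points to successor(n + 2**i) on the 2**m ring."""
    ring = sorted(node_ids)

    def succ(k):
        for node in ring:
            if node >= k:
                return node
        return ring[0]                 # wrap around the ring

    return [succ((n + 2 ** i) % (2 ** m)) for i in range(m)]
```

Because finger targets double in distance, a lookup can always halve the remaining distance to the key, which is where the logarithmic hop count comes from.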
Joining the Ring
§ Three-step process:
  1. Initialize all fingers of the new node, by asking another node for help
  2. Update the fingers of existing nodes
  3. Transfer keys from the successor to the new node
Handling Failures
§ Failure of nodes might cause incorrect lookups
  § Example: lookup(90) travels N120, N113, N102, N85; N80 does not know its correct successor, so the lookup fails
§ One approach: successor lists
  § Each node knows its r immediate successors
  § After a failure, find the first known live successor
  § Increased routing table size
Chord Assessment
§ Scalability, fairness, load balancing
  § Large distributed index
  § Logarithmic search effort
  § Network topology is not accounted for
  § Routing tables contain log(#nodes) entries
  § Quick lookup in large systems, low variation in lookup costs
§ Content location
  § Search by hash key: limited ways to formulate queries
  § All indexed files are reachable
  § log(#nodes) lookup steps
  § Not restricted to file location
§ Failure resilience
  § No single point of failure, but no resilience in the basic approach
  § Successor lists allow the use of the neighbors of failed nodes
  § Salted hashes allow multiple indexes
  § Relies on well-known relationships, but fast awareness of disruption and rebuilding
Examples: Pastry
Pastry
§ Approach taken
  § Only concerned with efficient indexing
  § Distributed index: decentralized lookup service
  § Uses DHTs
  § Content handling is an entirely external problem
    § No relation to content
    § No included replication or caching
§ P2P aspects
  § Every node must maintain keys
  § Adaptive to membership changes
  § Leaf nodes are special
  § Client nodes also act as file servers
Pastry
§ DHT approach
  § Each node has a unique 128-bit nodeId
    § Assigned when the node joins
    § Used for routing
  § Each message has a key
  § NodeIds and keys are in base 2^b
    § b is a configuration parameter with typical value 4 (base = 16, hexadecimal digits)
  § A Pastry node routes the message to the node with the nodeId closest to the key
  § The number of routing steps is O(log N)
  § Pastry takes network locality into account
§ Each node maintains
  § A routing table, organized into log_{2^b}(N) rows with 2^b - 1 entries each
  § A neighborhood set M: nodeIds and IP addresses of the M closest nodes, useful to maintain locality properties
  § A leaf set L: the set of L nodes with the closest nodeIds
Pastry Routing
§ Example with b = 2, so nodeIds are base 4 (16 bits); local nodeId 10233102
§ Leaf set: the nodes numerically closest to the local node (SMALLER and LARGER halves)
§ Routing table: log_{2^b}(N) rows with 2^b - 1 entries per row
  § Entries in the nth row share the first n-1 digits with the current node (common prefix, next digit, rest)
  § Entries in the mth column have m as their nth digit
  § Entries with no suitable nodeId are left empty
§ Neighborhood set: the nodes closest to the local node according to the proximity metric
[Figure: leaf set, routing table and neighborhood set of node 10233102]
Pastry Routing
§ Per-hop procedure:
  1. Search the leaf set for an exact match
  2. Search the routing table for an entry sharing at least one more prefix digit with the key
  3. Otherwise, forward the message to a leaf-set node with an equally long shared prefix but a numerically closer nodeId
[Figure: example route from source 2331 via 1331 and 1223 to node 1221, showing the routing table row and leaf set consulted at each hop]
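One routing step of the procedure above can be sketched as follows. This is a simplified illustration under assumed inputs: nodeIds are base-4 strings as in the slide's b = 2 example, and the leaf set and routing table are merged into one `known` list, so steps 2 and 3 become "longer prefix if possible, else numerically closer at equal prefix length".

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits two ids have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(local, key, known):
    """One Pastry-style routing step over base-4 id strings."""
    p = shared_prefix_len(local, key)
    # step 2: prefer a node sharing a strictly longer prefix with the key
    longer = [n for n in known if shared_prefix_len(n, key) > p]
    if longer:
        return max(longer, key=lambda n: shared_prefix_len(n, key))
    # step 3: fall back to an equal-prefix node that is numerically closer
    closer = [n for n in known
              if shared_prefix_len(n, key) == p
              and abs(int(n, 4) - int(key, 4)) < abs(int(local, 4) - int(key, 4))]
    if closer:
        return min(closer, key=lambda n: abs(int(n, 4) - int(key, 4)))
    return local                       # no better node known: deliver locally
```

Each hop fixes at least one more digit of the key (or gets numerically closer), which is why the route converges in O(log N) steps.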
Pastry Assessment
§ Scalability, fairness, load balancing
  § Distributed index of arbitrary size
  § Support for physical locality and locality by hash value
  § Stochastically logarithmic search effort
  § Network topology is partially accounted for, given an additional metric for physical locality
  § Stochastically logarithmic lookup in large systems, variable lookup costs
§ Content location
  § Search by hash key: limited ways to formulate queries
  § All indexed files are reachable
  § Not restricted to file location
§ Failure resilience
  § No single point of failure
  § Several possibilities for backup routes
Examples: Tapestry
Tapestry
§ Approach taken
  § Only concerned with self-organizing indexing
  § Distributed index: decentralized lookup service
  § Uses DHTs
  § Content handling is an entirely external problem
    § No relation to content
    § No included replication or caching
§ P2P aspects
  § Every node must maintain keys
  § Adaptive to changes in membership and value change
Routing and Location
§ Namespace (nodes and objects)
  § SHA-1 hash: 160 bits in length
  § Each object has its own hierarchy rooted at RootID = hash(ObjectID)
§ Prefix routing [JSAC 2004]
  § The router at the hth hop shares a prefix of length >= h digits with the destination
  § Local tables at each node (neighbor maps)
  § Route digit by digit: 4*** to 42** to 42A* to 42AD
  § Neighbor links in levels
Routing and Location
§ Suffix routing [tech report 2001]
  § The router at the hth hop shares a suffix of length >= h digits with the destination
  § Example: 5324 routes to 0629 via 5324, 2349, 1429, 7629, 0629
§ Tapestry routing
  § Cache pointers to all copies
  § Caches are soft-state
  § UDP heartbeat and TCP timeout to verify route availability
  § Each node has 2 backup neighbors
  § Failing primary neighbors are kept for some time (days)
§ Multiple root nodes possible, identified via hash functions
  § Search for a value at a root if its hash is that of the root
§ Choosing a root node
  § Choose a random address
  § Route towards that address
  § If no route exists, choose a surrogate deterministically
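The suffix-routing example above can be reproduced with a short sketch. This is a hypothetical model: all node ids are given in one flat list, and each hop greedily picks a node that matches exactly the next required suffix level, which is enough to recreate the 5324-to-0629 route from the slide.

```python
def suffix_match_len(a: str, b: str) -> int:
    """Number of trailing digits two ids have in common."""
    n = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        n += 1
    return n

def suffix_route(start, dest, nodes):
    """Route hop by hop; the hth hop must share a suffix of at
    least h digits with the destination (suffix routing)."""
    path, current = [start], start
    for h in range(1, len(dest) + 1):
        if suffix_match_len(current, dest) >= h:
            continue                   # current node already matches this level
        candidates = [n for n in nodes if suffix_match_len(n, dest) >= h]
        if not candidates:
            break                      # no route: a surrogate would be chosen
        # take the weakest sufficient match, one suffix level at a time
        current = min(candidates, key=lambda n: suffix_match_len(n, dest))
        path.append(current)
    return path
```

Run on the slide's example, this yields the hop sequence 5324, 2349, 1429, 7629, 0629: each hop fixes one more trailing digit.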
Routing and Location
§ Object location
  § The root is responsible for storing the object's location (but not the object itself)
  § Publish and search both route incrementally towards the root
§ Locates objects
  § Object: key/value pair, e.g. filename/file
  § Automatic replication of keys
  § No automatic replication of values
§ Values
  § May be replicated
  § May be stored in erasure-coded fragments
Tapestry
[Figure: insert(key K, value V) publishes the mapping (#K, location) along the route to the root]
Tapestry
[Figure: a lookup for K routes towards the root; intermediate nodes cache the (#K, location) mapping and the result is returned to the requester]
Tapestry
[Figure: cached mappings answer later lookups without reaching the root]
Tapestry
[Figure: move(key K, value V) relocates the value and republishes the mapping]
Tapestry
[Figure: after a move, stale cached mappings stay wrong until they time out]
Tapestry Assessment
§ Scalability, fairness, load balancing
  § Distributed index(es) of arbitrary size
  § Limited physical locality of key access by caching and nodeId selection
  § Variable lookup costs
  § Independent of content scalability
§ Content location
  § Search by hash key: limited ways to formulate queries
  § All indexed files are reachable
  § Not restricted to file location
§ Failure resilience
  § No single point of failure
  § Several possibilities for backup routes
  § Caching of key resolutions
  § Use of hash values with several salt values
Comparison
Comparison
§ Routing information
  § Chord: log(#nodes) routing table size
  § Pastry: log(#nodes) x (2^b - 1)
  § Tapestry: at least log(#nodes) routing table size
§ Lookup cost
  § Chord: log(#nodes)
  § Pastry: approx. log(#nodes)
  § Tapestry: variable
§ Physical locality
  § Chord: none
  § Pastry: by neighbor list
  § Tapestry: in mobile Tapestry
§ Failure resilience
  § Chord: no resilience in the basic version; additional successor lists provide resilience
  § Pastry: no single point of failure; several backup routes
  § Tapestry: no single point of failure; several backup routes; alternative hierarchies
Applications
Streaming in Peer-to-peer networks
Peer-to-peer network
Promise
Promise
§ Video streaming in peer-to-peer systems
  § Video segmentation into many small segments
  § Pull operation
  § Pull from several sources at once
§ Based on Pastry and CollectCast
§ Adds rate/data assignment
§ Evaluates
  § Node capabilities
  § Overlay route capabilities
§ Uses topology inference
  § Detects shared path segments, using ICMP similar to traceroute
  § Tries to avoid shared path segments
  § Labels segments with quality (or goodness)
Promise [Hefeeda et al. 03]
§ The receiver:
  § sends a lookup request using the DHT
  § selects some active senders and sends each a control packet
  § receives data as long as no errors/changes occur
  § if a change/error is detected, new active senders may be selected
§ Each active sender:
  § receives a control packet specifying which data segments, data rate, etc.
  § pushes data to the receiver as long as no new control packet is received
§ Thus, Promise is a multiple-sender to one-receiver P2P media streaming system which 1) accounts for different capabilities, 2) matches senders to achieve the best quality, and 3) dynamically adapts to network fluctuations and peer failure
[Figure: receiver served by active senders, with standby senders available]
SplitStream
SplitStream
§ Video streaming in peer-to-peer systems
  § Uses layered video
  § Uses overlay multicast
  § Push operation
  § Builds disjoint overlay multicast trees
§ Based on Pastry
SplitStream [Castro et al. 03]
§ Each movie is split into K stripes, and each stripe is multicast using a separate tree
§ Each node:
  § joins as many multicast trees as there are stripes (K)
  § may specify the number of stripes it is willing to act as a router for, i.e., according to the amount of resources available
§ Thus, SplitStream is a multiple-sender to multiple-receiver P2P system which distributes the forwarding load while respecting each node's resource limitations, but some effort is required to build the forest of multicast trees
[Figure: source with full-quality movie; stripe 1 and stripe 2 multicast over separate trees]
Some References
1. M. Castro, P. Druschel, A.-M. Kermarrec, A. Nandi, A. Rowstron and A. Singh, "SplitStream: High-bandwidth multicast in a cooperative environment", SOSP'03, Lake Bolton, New York, October 2003
2. Mohamed Hefeeda, Ahsan Habib, Boyan Botev, Dongyan Xu and Bharat Bhargava, "PROMISE: Peer-to-Peer Media Streaming Using CollectCast", ACM MM'03, Berkeley, CA, November 2003
3. Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek and Hari Balakrishnan, "Chord: A scalable peer-to-peer lookup service for internet applications", ACM SIGCOMM'01
4. Ben Y. Zhao, John Kubiatowicz and Anthony Joseph, "Tapestry: An Infrastructure for Fault-tolerant Wide-area Location and Routing", UCB Technical Report CSD-01-1141, 2001
5. John Kubiatowicz, "Extracting Guarantees from Chaos", Comm. ACM, 46(2), February 2003
6. Antony Rowstron and Peter Druschel, "Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems", Middleware'01, November 2001
Adaptive Multi-Source Streaming in Heterogeneous Peer-to-Peer Networks
Vikash Agarwal, Reza Rejaie
Computer and Information Science Department, University of Oregon
http://mirage.cs.uoregon.edu
January 19, 2005
Introduction
§ P2P streaming is becoming increasingly popular
  § Participating peers form an overlay to cooperatively stream content among themselves
  § The overlay-based approach is the only way to efficiently support multiparty streaming apps without multicast
§ Two components:
  § Overlay construction
  § Content delivery
§ Each peer desires to receive the maximum quality that can be streamed through its access link
§ Peers have asymmetric & heterogeneous bandwidth connectivity
Ø Each peer should receive content from multiple parent peers => multi-source streaming
Ø Multi-parent overlay structure rather than a tree
Benefits of Multi-source Streaming
§ Higher bandwidth to each peer => higher delivered quality
§ Better load balancing among peers
§ Less congestion across the network
§ More robust to the dynamics of peer participation
Ø Multi-source streaming introduces new challenges …
Multi-source Streaming: Challenges
§ Congestion-controlled connections from different parent peers exhibit
  § independent variations in bandwidth
  § different RTT, bandwidth, loss rate
§ Aggregate bandwidth changes over time
Ø The streaming mechanism should be quality adaptive
  § A static "one-layer-per-sender" approach is inefficient
§ There must be a coordination mechanism among senders in order to
  § efficiently utilize the aggregate bandwidth
  § gracefully adapt delivered quality to bandwidth variations
§ This paper presents a receiver-driven coordination mechanism for multi-source streaming called PALS
Previous Studies
§ Congestion control was often ignored
§ Server/content placement for streaming MD content [Apostolopoulos et al.]
§ Resource management for P2P streaming [Cui et al.]
§ Multi-sender streaming [Nguyen et al.], but they assumed the aggregate bandwidth is more than the stream bandwidth
§ RLM is receiver-driven, but
  § RLM tightly couples coarse quality adaptation with congestion control
  § PALS only determines how the aggregate bandwidth is used
§ P2P content distribution mechanisms cannot accommodate "streaming" apps
  § e.g. BitTorrent, Bullet
Overall Architecture
§ Overall architecture for P2P streaming
  § PRO: bandwidth-aware overlay construction (identifying good parents in the overlay)
  § PALS: multi-source adaptive streaming (streaming content from the selected parents)
  § Distributed multimedia caching
§ Decoupling overlay construction from delivery provides a great deal of flexibility
§ PALS is a generic multi-source streaming protocol for non-interactive applications
Assumptions & Goals
§ Assumptions:
  § All peers/flows are congestion controlled
  § Content is layered encoded
  § All layers are CBR with the same consumption rate*
  § All senders have all layers (relaxed later)*
  § A limited window of future packets is available at each sender
  § Live but non-interactive
§ Goals:
  § Fully utilize the aggregate bandwidth to dynamically maximize delivered quality
  § Deliver the maximum number of layers
  § Minimize variations in quality
* Not requirements
P2P Adaptive Layered Streaming (PALS)
§ Receiver: periodically requests an ordered list of packets/segments from each sender
§ Sender: simply delivers the requested packets in the given order at the CC rate
§ Benefits of ordering the requested list:
  § provides flexibility for the receiver to closely control delivered packets
  § graceful degradation in quality when bandwidth suddenly drops
§ Periodic requests => stability & less overhead
Basic Framework
§ Receiver passively monitors the EWMA BW from each sender
Ø EWMA aggregate BW
Ø Estimate the total no. of packets to be delivered during the next window (K)
§ Allocate the K packets among active layers (quality adaptation)
  § controlling bw0(t), bw1(t), …
  § controlling the evolution of the buffer state
§ Assign a subset of packets to each sender (packet assignment)
  § allocating each sender’s BW among active layers
[Figure: receiver with a demux and decoder fed by per-layer buffers buf0–buf3 filling at rates bw0(t)–bw3(t); sending peers connected over the Internet at congestion-controlled bandwidths BW0–BW2]
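The receiver-side bookkeeping above (a per-sender EWMA bandwidth estimate, then an estimate of the K packets deliverable in the next window) can be sketched as follows. The packet size and smoothing factor are illustrative assumptions, not values from the paper:

```python
# Sketch of the receiver's bandwidth bookkeeping, assuming fixed-size
# packets and an EWMA smoothing factor alpha (both are assumptions).

PKT_SIZE = 1000      # bytes per packet (assumption)
ALPHA = 0.25         # EWMA weight of the newest sample (assumption)

class SenderState:
    def __init__(self):
        self.ewma_bw = 0.0   # smoothed bandwidth estimate (bytes/s)

    def update(self, sample_bw):
        # Standard EWMA: the new estimate leans mostly on history.
        self.ewma_bw = (1 - ALPHA) * self.ewma_bw + ALPHA * sample_bw

def packets_next_window(senders, window_sec):
    """Estimate K, the number of packets the aggregate bandwidth
    can deliver during the next window."""
    agg_bw = sum(s.ewma_bw for s in senders)   # aggregate bytes/s
    return int(agg_bw * window_sec / PKT_SIZE)

# Usage: two senders measured at 40 kB/s and 80 kB/s, 4 s window.
s1, s2 = SenderState(), SenderState()
s1.ewma_bw, s2.ewma_bw = 40_000.0, 80_000.0
print(packets_next_window([s1, s2], 4.0))   # -> 480
```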
Key Components of PALS
§ Sliding Window (SW): to keep all senders busy & loosely synchronized with the receiver playout time
§ Quality Adaptation (QA): to determine the quality of the delivered stream, i.e., the required packets for all layers during one window
§ Packet Assignment (PA): to properly distribute the required packets among senders
Sliding Window
§ Buffering window: range of timestamps for packets that must be requested in one window
§ The window is slid forward in a step-like fashion
§ Requested packets per window can be from
  1) the playing window (loss recovery)
  2) the buffering window (main group)
  3) future windows (buffering)
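The three request groups above can be illustrated with a small classifier; the window length, function names, and half-open interval boundaries are illustrative assumptions:

```python
# Illustrative sketch of the step-wise sliding window: the window
# boundaries and the three groups a requested packet can fall into.

WINDOW = 4.0  # window length in seconds (assumption)

def slide(window_start):
    """Advance the buffering window by one full step."""
    return window_start + WINDOW

def classify(ts, playout_time, window_start):
    """Classify a packet timestamp relative to the current window."""
    if playout_time <= ts < window_start:
        return "playing"    # loss recovery for the window being played
    if window_start <= ts < window_start + WINDOW:
        return "buffering"  # main group of requested packets
    if ts >= window_start + WINDOW:
        return "future"     # opportunistic buffering ahead
    return "expired"        # already past its playout point

# Usage: playout at t=10 s, buffering window starting at t=14 s.
print(classify(15.0, 10.0, 14.0))  # -> buffering
print(classify(11.0, 10.0, 14.0))  # -> playing
```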
Sliding Window (cont’d)
§ Window size determines the tradeoff between smoothness & signaling overhead on the one hand and responsiveness on the other
§ Should be a function of RTT, since RTT specifies the timescale of variations in BW
  § a multiple of the max smoothed RTT among senders
§ Receiver might receive duplicates
  § re-requesting a packet that is in flight!
  § the ratio of duplicates is very low and can be reduced by increasing the window size
Coping with BW variations
§ The sliding window alone is insufficient
§ Coping with a sudden drop in BW by
  § overwriting requests at senders
  § ordering requested packets
§ Coping with a sudden increase in BW by
  § requesting extra packets
Quality Adaptation
§ Determining required packets from future windows
§ Coarse-grained adaptation: add/drop layer
§ Fine-grained adaptation: controlling bw0(t), bw1(t), …
§ Loosely controlling the evolution of the receiver buffer state/distribution
Ø The buffer distribution determines what degree of BW variations can be smoothed
§ What is a proper buffer distribution?
[Figure: same receiver diagram as before — per-layer buffers buf0–buf3, demux, and decoder at the receiver, fed by the sending peers]
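A heavily simplified sketch of the coarse-grained side of quality adaptation: allocating the K packets of one window among active layers, filling lower layers first so the base layer is protected. The paper's actual mechanism also shapes the receiver buffer distribution, which this sketch omits; all names and values are illustrative:

```python
# Simplified allocation of the K packets for the next window among
# active layers. Per-layer demand is uniform here because all layers
# are CBR at the same rate; the paper's skewed buffer-distribution
# policy is not reproduced.

def allocate(k, n_layers, per_layer_demand):
    """Return packets granted to each layer, base layer first."""
    grants = [0] * n_layers
    for layer in range(n_layers):         # layer 0 = base layer
        take = min(k, per_layer_demand)
        grants[layer] = take
        k -= take
        if k == 0:
            break
    return grants

# Coarse-grained adaptation: a layer whose grant is zero is dropped
# for this window; a layer is added only when all lower layers are full.
print(allocate(10, 4, 4))  # -> [4, 4, 2, 0]
```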
Buffer Distribution
§ Impact on delivered quality
  § a conservative buffer distribution achieves long-term smoothing
  § an aggressive buffer distribution achieves short-term improvement
  § PALS leverages this tradeoff in a balanced fashion
§ Window size affects buffering:
  § amount of future buffering
  § slope of the buffer distribution
§ Multiple opportunities to request a packet (see paper)
  § implicit loss recovery
Packet Assignment
§ How to assign an ordered list of selected packets from different layers to individual senders?
§ The number of packets assigned to each sender must be proportional to its BW contribution
§ More important packets should be delivered first
Ø Weighted round-robin packet assignment strategy
§ The strategy is extended to support partially available content at each peer
§ Please see the paper for further details
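A sketch of a weighted round-robin assignment of an importance-ordered packet list to senders, proportional to each sender's bandwidth share. This only illustrates the idea; the paper's exact WRR scheme and its extension to partially available content are more involved:

```python
# Weighted round-robin sketch: each sender accrues credit in
# proportion to its bandwidth share, and each packet (most important
# first) goes to the sender with the most credit.

def assign(packets, sender_bw):
    """packets: list ordered by importance (most important first).
    sender_bw: bandwidth of each sender.
    Returns one ordered request list per sender."""
    total = sum(sender_bw)
    credit = [0.0] * len(sender_bw)
    lists = [[] for _ in sender_bw]
    for pkt in packets:
        # Top up credits in proportion to bandwidth share...
        for i, bw in enumerate(sender_bw):
            credit[i] += bw / total
        # ...then give the packet to the sender with the most credit.
        i = max(range(len(credit)), key=lambda j: credit[j])
        credit[i] -= 1.0
        lists[i].append(pkt)
    return lists

# Usage: 6 packets, two senders with a 2:1 bandwidth ratio.
print(assign(list(range(6)), [200, 100]))  # -> [[0, 2, 3, 5], [1, 4]]
```

Note the invariant: every packet is assigned exactly once, and the list sizes track the bandwidth ratio (here 4:2), so each sender's ordered request list matches what its congestion-controlled rate can carry.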
Performance Evaluation
§ Using ns simulation to control BW dynamics
§ Focused on three key dynamics in P2P systems: BW variations, peer participation, content availability
§ Senders with heterogeneous RTT & BW
§ Decouple the underlying CC mechanism from PALS
§ Performance metrics: BW utilization, delivered quality
§ Two strawman mechanisms with static layer assignment to each sender:
  § Single Layer per Sender (SLS): sender i delivers layer i
  § Multiple Layers per Sender (MLS): sender i delivers layers j < i
Necessity of Coordination
§ SLS & MLS exhibit high variations in quality
  § no explicit loss recovery
  § no coordination
  § inter-layer dependency magnifies the problem
§ PALS effectively utilizes the aggregate BW & delivers stable quality in all cases
Delay-Window Tradeoff
§ Avg. delivered quality depends only on the aggregate BW
§ Higher delay => smoother quality
§ Duplicates decrease exponentially with window size
§ Avg. per-layer buffering increases linearly with delay
§ Increasing the window leads to an even buffer distribution
§ See the paper for more results
[Figure: results for heterogeneous senders]
Conclusion & Future Work
§ PALS is a receiver-driven coordination mechanism for streaming from multiple congestion-controlled senders
§ Simulation results are very promising
§ Future work:
  § further simulations to examine more details
  § a prototype implementation for real experiments
  § integration with the other components of our architecture for P2P streaming
Partially available content
§ Effect of segment size and redundancy
Packet Dynamics