Symphony: Distributed Hashing in a Small World
Gurmeet Singh Manku, Mayank Bawa, Prabhakar Raghavan
Presented by Satpreet Singh

Motivation
GOAL: To maintain a large DHT over a WAN
DESIRED CHARACTERISTICS:
• Scalability (works across a range of network sizes)
• Stability (handles churn)
• Performance (low-latency lookups and low maintenance costs under churn)
• Flexibility (provides design knobs, preferably adjustable at run time)
• Simplicity (easy to understand, code, deploy…)
SOLUTION: Symphony (fuses ideas from Chord and Kleinberg’s Small World greedy-routing result)

Features/Advantages of Symphony
1. Low state maintenance (a consequence of low degree)
• Fewer pings/keep-alives, less (ambient) control traffic
• Distributed locking and coordination overhead is spread over smaller sets of nodes
• Shorter bootstrapping time when a node joins
• Shorter recovery time when a node leaves
2. Smooth out-degree vs. latency tradeoff
• Only protocol that offers this tuning knob even at run time!
• Out-degree is not fixed, either at run time or as a function of network size
3. Flexibility and support for heterogeneity
• Different nodes can have different numbers of links
4. Fault tolerance
• Only short links are reinforced; long links have no backups

Architecture: Overview
• The keyspace is the interval [0, 1), wrapped around a ring as in Chord
• Every node ‘manages’ the subrange from its own id to the id of the next clockwise node (subranges are roughly equal in size)
• Each object hashes to an m-bit key K and is managed by the node whose subrange contains the real number K/2^m
• 2 short links: one to each immediate neighbor
[Figure: A typical Symphony network, showing nodes, long links, and short links]
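The key-to-manager mapping above can be sketched in a few lines. This is a minimal illustration, not code from the paper: the function names `key_to_point` and `manager_of` are hypothetical, and the ring is assumed to be a sorted Python list of node ids in [0, 1).

```python
from bisect import bisect_right

def key_to_point(key: int, m: int) -> float:
    """Map an m-bit hash key K to the real number K / 2^m in [0, 1)."""
    return key / (1 << m)

def manager_of(point: float, node_ids: list[float]) -> float:
    """Return the id of the node whose subrange [id, next clockwise id) holds point.

    node_ids must be sorted in ascending order (an assumption of this sketch).
    """
    # Index of the last node id that is <= point.
    i = bisect_right(node_ids, point) - 1
    # i == -1 means point lies before the first id, so it belongs to the
    # wrap-around subrange of the last node; Python's index -1 selects it.
    return node_ids[i]

ring = [0.10, 0.40, 0.75]
print(manager_of(key_to_point(128, 8), ring))  # key 128 of 256 maps to 0.5
```

Here 0.5 falls in the subrange starting at 0.40, and a point such as 0.05 wraps around to the node with id 0.75.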

Architecture: Overview
• k (≥ 1) long links (unidirectional or bidirectional). To establish each one, a node:
- draws a random number x from a probability density function (PDF)
- contacts the manager of the point at clockwise distance x from itself, using the routing protocol
- establishes the link, provided the manager’s incoming links stay within the 2k limit; otherwise it resamples from the PDF
• The PDF is a harmonic distribution (hence the name Symphony):
p_n(x) = 1 / (x ln n) for x ∈ [1/n, 1], and 0 otherwise
• The value of n used in the PDF is obtained from an estimation protocol
[Figure: A typical Symphony network, showing nodes, long links, and short links]
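One way to draw from the harmonic PDF is inverse-transform sampling: integrating 1/(x ln n) from 1/n to x gives the CDF F(x) = 1 + ln(x)/ln(n), so x = n^(u - 1) for u uniform on [0, 1). The sketch below assumes that approach; the function names are hypothetical, and treating x as a clockwise distance from the node's own id is an assumption of this example. The 2k incoming-link check is omitted.

```python
import random

def sample_link_distance(n: float, rng: random.Random) -> float:
    """Draw x from p_n(x) = 1 / (x ln n) on [1/n, 1] by inverting the CDF."""
    u = rng.random()          # uniform on [0, 1)
    return n ** (u - 1.0)     # F^{-1}(u) = n^(u - 1), so x lies in [1/n, 1)

def long_link_point(node_id: float, n: float, rng: random.Random) -> float:
    """Point on the ring whose manager becomes the long-link target.

    Assumes x is a clockwise distance measured from node_id.
    """
    return (node_id + sample_link_distance(n, rng)) % 1.0

rng = random.Random(42)
print([round(sample_link_distance(1024, rng), 4) for _ in range(5)])
```

Because the density is proportional to 1/x, half of the samples fall below 1/sqrt(n), which is what makes short hops common and long hops rare but present.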

Network Size Estimation Protocol
Goal: to estimate n, the current total number of nodes in the DHT
• Single-node version: if x is the length of the arc a node manages, then 1/x is an estimate of n (idea from Viceroy)
• In general: if S is any set of s distinct nodes and X_s is the sum of the segment lengths they manage, then estimated n = s / X_s
• All s nodes update their estimates
• Experimentally, s = 3 was found to be good enough, so a node simply uses itself and its two immediate neighbors
Fact: The impact of increasing s on average latency is insignificant
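The estimate n = s / X_s is simple enough to show directly. This is a sketch with hypothetical helper names; segment lengths are measured clockwise on the unit ring.

```python
def segment_length(node_id: float, next_id: float) -> float:
    """Clockwise length of the arc from node_id to the next node's id."""
    return (next_id - node_id) % 1.0

def estimate_n(segment_lengths: list[float]) -> float:
    """Estimate the network size as s / X_s for s observed segments."""
    total = sum(segment_lengths)
    if total <= 0.0:
        raise ValueError("segment lengths must sum to a positive value")
    return len(segment_lengths) / total

# s = 3: a node plus its two immediate neighbors, each managing about 1% of the ring.
print(estimate_n([0.012, 0.009, 0.009]))
```

With three segments summing to 0.03 of the ring, the estimate is about 100 nodes.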

Routing Protocol(s)
Kleinberg’s Small World result: A message can be routed to any node by greedy routing in O(log^2 n) hops, in a construction where each node has one link to each of its 4 directional neighbors and a single long-distance link to a node chosen from a suitable PDF.
To look up a hash key x ∈ [0, 1), contact the manager of x:
• Unidirectional Routing Protocol: a node forwards a lookup for x along the (short or long) link that minimizes the clockwise distance to x
• Bidirectional Routing Protocol: a node forwards a lookup for x along the (short or long) link that minimizes the absolute distance to x
In both cases, the expected path length in an n-node network with k = O(1) links is O((1/k) log^2 n) hops.
Bidirectional routing and 1-lookahead reduce latency by about 30% and 40%, respectively.
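The two greedy forwarding rules differ only in the distance they minimize. The sketch below is illustrative: `next_hop` and the distance helpers are hypothetical names, a node's links are given as a plain list of neighbor ids, and the final hand-off to the exact manager in the bidirectional case is simplified away.

```python
def clockwise_distance(a: float, b: float) -> float:
    """Distance travelled going clockwise from a to b on the unit ring."""
    return (b - a) % 1.0

def absolute_distance(a: float, b: float) -> float:
    """Shorter of the clockwise and counter-clockwise distances."""
    d = clockwise_distance(a, b)
    return min(d, 1.0 - d)

def next_hop(current: float, links: list[float], target: float,
             bidirectional: bool = False) -> float:
    """Pick the neighbor closest to target; stay put if none improves on current."""
    dist = absolute_distance if bidirectional else clockwise_distance
    best = min(links, key=lambda node_id: dist(node_id, target))
    return best if dist(best, target) < dist(current, target) else current

links = [0.2, 0.05, 0.6]   # short and long links of the node with id 0.1
print(next_hop(0.1, links, 0.55))                      # unidirectional
print(next_hop(0.1, links, 0.55, bidirectional=True))  # bidirectional
```

In the unidirectional case the node at 0.6 is skipped because it overshoots the target and would have to travel almost the whole ring clockwise; the bidirectional rule can use it because it is only 0.05 away in absolute terms.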

Join/Leave Protocols
JOIN:
• The new node chooses its id x from [0, 1) uniformly at random
• Using the routing protocol, it identifies the current manager of x
• It then runs the estimation protocol using s = 3
• x then uses p_n to establish its long-distance links
Cost = k links × O((1/k) log^2 n) messages per link = O(log^2 n) messages
LEAVE:
• All of x’s outgoing and incoming long-distance links are dropped
• Nodes whose outgoing links to x were just broken re-establish those links with other nodes
• The immediate neighbors of x establish short links between themselves
• The successor of x initiates the estimation protocol over s = 3 neighbors
Cost = O(log^2 n) messages
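The join steps can be strung together in a small single-process simulation. This is a rough sketch under several assumptions that are not in the slides: the ring is a sorted list held in memory (so no messages are actually routed), the function name `join` is hypothetical, link distances are measured clockwise from the new node, and the 2k incoming-link limit is ignored.

```python
import random
from bisect import bisect_right, insort

def join(ring: list[float], k: int, rng: random.Random):
    """Add a node to a sorted ring and pick up to k long-link targets for it.

    Expects at least two existing nodes so that s = 3 distinct segments exist.
    Returns (new id, estimate of n, list of long-link target ids).
    """
    x = rng.random()                 # id chosen uniformly at random from [0, 1)
    insort(ring, x)                  # stand-in for locating the manager of x
    size = len(ring)
    i = ring.index(x)
    # Estimation with s = 3: the new node and its two immediate neighbors.
    trio = [(i - 1) % size, i, (i + 1) % size]
    arc = sum((ring[(j + 1) % size] - ring[j]) % 1.0 for j in trio)
    n_est = 3 / arc
    # Long links: sample harmonic distances, resampling on self or duplicates.
    links: list[float] = []
    for _ in range(10 * k):          # bounded number of attempts
        if len(links) == k:
            break
        distance = n_est ** (rng.random() - 1.0)
        target = ring[bisect_right(ring, (x + distance) % 1.0) - 1]
        if target != x and target not in links:
            links.append(target)
    return x, n_est, links

rng = random.Random(7)
ring = sorted(rng.random() for _ in range(200))
new_id, n_est, links = join(ring, 4, rng)
print(round(n_est), len(links))
```

The estimate printed here comes from only three segments, so it can be well off the true size of 201; the slides note that this has little effect on average latency.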

Re-linking Protocols etc…
RE-LINKING:
• n_x = x’s current estimate of n
• n_x^link = x’s estimate of n when its long-distance links were last established
• When n_x and n_x^link differ, the links were built from a stale estimate
• Re-link only when n_x / n_x^link is outside the range [0.5, 2]
• Re-linking gains are marginal and the cost is high: O(log^2 n) messages
LOOKAHEAD:
• A node can maintain a list of its neighbors’ neighbors
• Improves the choice of neighbor when routing queries
• No extra messages: the lists are piggybacked on the keep-alives of the TCP links
• Cost = O(k^2) space; the number of long links remains unchanged
FAULT TOLERANCE:
• Deleting short links is more detrimental, because it leads to node isolation
• Make f copies of a node’s content on the next f clockwise nodes
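The re-linking rule and the replica placement are both one-liners in code. The sketch below uses hypothetical names (`should_relink`, `replica_holders`) and assumes the ring is a sorted list of node ids.

```python
def should_relink(n_now: float, n_at_link_time: float,
                  low: float = 0.5, high: float = 2.0) -> bool:
    """Re-link only when the ratio of estimates leaves the range [low, high]."""
    ratio = n_now / n_at_link_time
    return not (low <= ratio <= high)

def replica_holders(ring: list[float], owner_index: int, f: int) -> list[float]:
    """Ids of the next f clockwise nodes after the owner (fewer on tiny rings)."""
    size = len(ring)
    count = min(f, size - 1)         # never place a replica on the owner itself
    return [ring[(owner_index + j) % size] for j in range(1, count + 1)]

print(should_relink(2500, 1000))                  # estimate grew 2.5x
print(replica_holders([0.10, 0.40, 0.75], 2, 2))  # wraps around the ring
```

The [0.5, 2] band means a node tolerates the network halving or doubling before it pays the O(log^2 n) cost of rebuilding its long links.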

Experimental Data:
SETUP:
• Large DHT: 2^5 to 2^15 nodes simulated
• Four kinds of test networks: Static, Expanding-Relink & Dynamic
Estimation Protocol Performance: the estimate improves when log(n) neighbors are used, but the impact on average latency is minimal (shown later)

Experimental Data: Routing Protocol Performance: Increasing links beyond 2 has marginal benefits. Bidirectional routing is good (30% reduction in latency)

Experimental Data: Lookahead Performance:
• 1-Lookahead reduces average latency by 40% for small values of k
• It does not entail an increase in the number of long links per node
• Neighbor lists are exchanged periodically, piggybacked on normal routing traffic or keep-alives

Experimental Data: Fault-tolerance motivation:
• [Left] When a random set of links (short + long) is deleted, the fraction of successful lookups drops quickly, because deleting short links soon isolates nodes
• [Right] The impact of removing only long links is not as severe (only the average latency goes up)
• Thus, fortify only the short links, and make f copies of content in the clockwise direction

Comparisons & Conclusions:
• For large DHTs (2^5 to 2^15 nodes), Symphony outperforms others
• Average number of TCP links = 10; latency is about 8 hops
• Lower Join/Leave costs compared to Chord etc.
• Number of neighbors is not fixed at the outset; no backup links

Wrapping up…
Symphony...
• is a simple protocol for managing large DHTs
• supports a dynamic network of hosts with relatively short lifetimes
• scales well
• has low lookup latency
• has low maintenance cost
• requires few neighbors per node
• supports heterogeneity in nodes (run-time knobs)
• provides flexibility in design
Questions…?