Simple Load Balancing for Distributed Hash Tables - Torsha Banerjee
Presentation for Internet & Web Algorithms
Presentation Outline
o What is a DHT (Distributed Hash Table)?
o Why DHTs?
o An example
o Applications
o Load Balancing
o How lookup works
o Paper Idea
o Performance Evaluation
o Conclusion
o References
What is DHT (Distributed Hash Table)?
§ No central server
§ Partitions an ID space among n servers
§ Each distributed server has a partial list of where data is stored in the system
§ Keys are mapped uniformly to data values
§ A "lookup" algorithm is required to locate data
§ put(key, data) and get(key) functions are used to handle data
(Figure adapted from Frans Kaashoek: a distributed application issues put(key, data) and get(key) calls against a distributed hash table spread across nodes)
What is DHT (Distributed Hash Table)? contd.
§ Architectures
CAN (Content Addressable Network) [1]
v keys are hashed into a d-dimensional space
v operations performed are insert(key, value) and retrieve(key)
v chooses the neighbor nearest to the destination peer for storage
Chord [2]
v maps each key to a peer (server) for storage
v maintains routing information as nodes join and leave the system
What is DHT (Distributed Hash Table)? contd.
§ Architectures contd.
PASTRY [3]
v uses a routing table, leaf set, neighborhood set, and file table
v routes messages to the node with the ID closest to the key
v low message overhead but higher latency and failure rate
TAPESTRY [4]
v one of the first DHTs
v routes messages to endpoints (peers); objects are known by name, not location
v creates a routing mesh of neighbors; each peer stores a neighbor map
v complex and hard to maintain
Why DHTs?
§ Incremental scalability of throughput and data capacity as more nodes are added to the cluster
§ Robustness achieved through data replication among multiple cluster nodes
§ Building block for peer-to-peer applications
§ Improved security and robustness
An Example
(Figure: nodes 13, 24, 33, 58, 81, 97, 111, and 127 on a circular ID space)
§ Circular number space 0 to 127
§ Routing rule is to move clockwise until current node ID >= key and last-hop node ID < key
§ Example: key = 42 is routed clockwise to node 58; node 58 "owns" keys in [34, 58]
§ New entries always start with one known member
§ Newcomer searches for "self" in the network
  - hash key = newcomer's node ID
  - search results in a node in the vicinity of where the newcomer needs to be
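The clockwise ownership rule above can be sketched in a few lines of Python, using the peer IDs from the example figure:

```python
def owner(peers, key, space=128):
    # Walk clockwise from `key` on the 0..space-1 ring and stop at the
    # first position occupied by a peer: that peer "owns" the key.
    for step in range(space):
        candidate = (key + step) % space
        if candidate in peers:
            return candidate

peers = {13, 24, 33, 58, 81, 97, 111, 127}
print(owner(peers, 42))  # -> 58, since node 58 owns keys in [34, 58]
```

A linear scan keeps the sketch simple; a real implementation would hold the peer IDs sorted and binary-search for the successor.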
Applications
§ Any application that requires a hash table (database systems, symbol tables in compilers)
§ Storage, archival
§ Web serving, caching
§ Content distribution
§ Query & indexing
§ Naming systems
§ Communication primitives
§ Chat services
§ Application-layer multicasting
§ Event notification services
§ Publish/subscribe systems
Load Balancing
§ Distributing processing and communications activity evenly across a computer network
§ Avoids a single point of failure
§ Important for networks where it is difficult to predict the number of requests that will be issued to a server
§ Two components of load balancing
v Consistent hashing over a 1-D space
v Some kind of indexing topology used to navigate the space
Load Balancing contd.
§ Consistent hashing:
Ø Both keys and peers are hashed onto a 1-D ring (e.g. the Chord ring)
Ø Keys are assigned to the nearest peer in the clockwise direction
Ø Overlay edges exist for faster search along the ring
Ø Load imbalance may occur with increasing arc length
Ø Maximum arc length is O(log n / n) with high probability
§ Chord [2] solves this problem to some extent by introducing virtual peers
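The consistent-hashing step can be sketched as follows; the hash construction and peer names here are illustrative, not from the paper:

```python
import hashlib

RING = 2 ** 16  # small ring size for illustration

def h(name: str) -> int:
    # stable position of a peer or key on the ring
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % RING

def assign(key: str, peers: list) -> str:
    # a key belongs to the nearest peer in the clockwise direction,
    # i.e. the peer minimizing the clockwise distance from the key
    kpos = h(key)
    return min(peers, key=lambda p: (h(p) - kpos) % RING)

peers = ["peer-a", "peer-b", "peer-c", "peer-d"]
print(assign("some-item", peers))
```

Removing a peer only remaps the keys it owned; all other assignments are unchanged, which is the defining property of consistent hashing.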
How Lookup works?
§ In an m-bit identifier space, there are 2^m identifiers
§ Identifiers are ordered on an identifier circle modulo 2^m
§ The identifier ring is called the Chord ring
§ Key k is assigned to the first node whose identifier is equal to or follows k in the identifier space
§ Each node maintains a routing table (finger table) with at most m entries
§ The i-th finger of node n is the successor of (n + 2^(i-1)) mod 2^m
How Lookup works? contd.
Example: Chord [2] ring with m = 4 (identifiers 0-15)
Finger Table for Node 2:
start | interval | succ.
  3   | [3, 4)   |  5
  4   | [4, 6)   |  5
  6   | [6, 10)  |  7
 10   | [10, 2)  | 10
successor(6) = 7
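The finger table above can be reproduced with a short sketch; the node set {0, 2, 5, 7, 10} is an assumption consistent with the figure, not stated in the slide:

```python
def successor(nodes, k, m=4):
    # first node whose identifier equals or follows k on the 2^m ring
    size = 2 ** m
    for step in range(size):
        candidate = (k + step) % size
        if candidate in nodes:
            return candidate

def finger_table(n, nodes, m=4):
    # the i-th finger of node n is successor((n + 2^(i-1)) mod 2^m)
    size = 2 ** m
    return [successor(nodes, (n + 2 ** i) % size, m) for i in range(m)]

nodes = {0, 2, 5, 7, 10}      # assumed node set matching the slide
print(finger_table(2, nodes))  # -> [5, 5, 7, 10], as in the table
print(successor(nodes, 6))     # -> 7
```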
Paper Idea
§ Chord cannot totally solve the load balancing issue
v One peer may be responsible for O(log n) times the average number of items
v Number of edges per peer is O(log^2 n) in the worst case, incurring higher messaging cost and storage space
§ Application of the "power of two choices" [5, 6] proposed in the paper [7]
v Each peer represents a point on the circle (no finger table)
v Peers are arranged sequentially
v d >= 2 candidate peers are chosen for an item, and the one with the least load is finally selected for storage
Paper Idea contd.
(Figure: item x hashes to candidate peers with loads 0.46, 0.51, and 0.6)
§ Peer 1 is selected for storage of item x (load 0.46, the least loaded candidate)
§ The maximum load of any peer is log log n / log d + O(1)
§ Four different hash functions are used to select four different peers
Paper Idea contd.
§ If any two chosen peers have the same load, the tie needs to be broken
v Arbitrarily, or
v Vöcking's scheme
Ø Always-go-left scheme
Ø Bins are divided into d groups of size n/d
Ø Groups are numbered from 1 to d
Ø If several locations have the same load, the ball is placed in the location from the leftmost group
§ Choosing the least loaded arc of smallest length gives better results than Vöcking's scheme
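The always-go-left rule described above can be sketched as a toy balls-into-bins simulation (the function name and parameters are illustrative):

```python
import random

def insert_always_go_left(loads, d=3):
    # Bins are split into d contiguous groups of size n/d, numbered
    # left to right; one candidate is drawn uniformly from each group.
    size = len(loads) // d
    picks = [random.randrange(g * size, (g + 1) * size) for g in range(d)]
    # The least loaded candidate wins; on a tie, the candidate from
    # the leftmost group is chosen ("always go left").
    best = min(range(d), key=lambda g: (loads[picks[g]], g))
    loads[picks[best]] += 1
    return picks[best]

random.seed(1)
loads = [0] * 12
for _ in range(120):
    insert_always_go_left(loads, d=3)
print(sum(loads), max(loads))
```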
Paper Idea contd.
Algorithm description
§ h0 is used to map peers onto the ring
§ Hash functions h1(x), …, hd(x) are first calculated by a peer to insert item x
§ Peers p1, …, pd corresponding to h1(x), …, hd(x) are looked up in parallel
§ The pi with the lowest load is selected for storing x
§ For searching an already inserted item x, the same d peers are looked up and the one storing the key-value pair is returned
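A minimal in-memory sketch of this insert/search procedure follows; the hash construction and class name are my own, not from the paper:

```python
import hashlib

def h(x: str, j: int, n_peers: int) -> int:
    # j-th hash function h_j(x), mapped to a peer index (illustrative)
    digest = hashlib.sha1(f"{j}:{x}".encode()).hexdigest()
    return int(digest, 16) % n_peers

class TwoChoiceDHT:
    def __init__(self, n_peers: int, d: int = 2):
        self.d = d
        self.peers = [dict() for _ in range(n_peers)]

    def insert(self, x: str, value) -> None:
        # look up the d candidate peers (in parallel in the real system)
        cands = [h(x, j, len(self.peers)) for j in range(1, self.d + 1)]
        # the least loaded candidate stores the item
        target = min(cands, key=lambda i: len(self.peers[i]))
        self.peers[target][x] = value

    def search(self, x: str):
        # probe all d candidates; return from whichever one stores x
        for i in {h(x, j, len(self.peers)) for j in range(1, self.d + 1)}:
            if x in self.peers[i]:
                return self.peers[i][x]
        return None
```

Load is measured here as the number of stored items per peer, which matches the balls-into-bins view of the scheme.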
Paper Idea contd.
§ Disadvantage is the increase in network traffic by a factor of d
v Solved by redirection pointers
v Every candidate peer pj stores a redirection pointer to pi, where pi is the least loaded peer
v For searching x, some hj(x) is used to find pj
v If pj does not have x, it returns the redirection pointer
v With uniform selection of hj, this extra step is required with probability (d-1)/d
§ A particular peer needs to be active all the time
v [8] solves this problem to some extent
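The redirection-pointer variant can be sketched like this (names are illustrative): a search probes a single randomly chosen candidate and follows at most one pointer instead of contacting all d peers.

```python
import hashlib
import random

N, D = 16, 3
data = [dict() for _ in range(N)]      # items actually stored per peer
redirect = [dict() for _ in range(N)]  # p_j's pointer to the holder p_i

def h(x, j):
    return int(hashlib.sha1(f"{j}:{x}".encode()).hexdigest(), 16) % N

def insert(x, value):
    cands = [h(x, j) for j in range(1, D + 1)]
    target = min(cands, key=lambda i: len(data[i]))
    data[target][x] = value
    for i in cands:
        redirect[i][x] = target        # every candidate learns the holder

def search(x):
    i = h(x, random.randrange(1, D + 1))  # one uniformly chosen candidate
    if x in data[i]:
        return data[i][x]                 # direct hit
    if x in redirect[i]:
        return data[redirect[i][x]][x]    # one extra hop via the pointer
    return None
```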
Paper Idea contd.
§ Static placement of items leads to poor performance
§ Can be solved by letting item x change location when reinserted, if the peer pi previously storing x has become heavily loaded
§ Schemes like load stealing and load shedding are used
§ Load stealing: if pi is an underutilized peer, it takes away load from other peers with heavier loads
(Figure: a lightly loaded peer (load 0.02) steals item x from a heavily loaded peer (load 0.9); a redirection pointer tracks the copy of item x)
Paper Idea contd.
§ Load shedding: if pi is overloaded, it offloads data to lighter peers
§ Out of the peers p1, …, pd, item x can be located at the k least loaded peers, thus replicating data
§ Enables parallel downloading from multiple sources
(Figure: an overloaded peer (load 0.9) offloads item x to a lightly loaded peer (load 0.02); a redirection pointer tracks the copy of item x)
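Choosing the k least loaded of the d candidates for replication can be expressed directly; the peer names and load values below are illustrative:

```python
import heapq

def replica_targets(candidate_loads: dict, k: int) -> list:
    # the k least loaded candidate peers hold copies of the item,
    # enabling parallel download from multiple sources
    return heapq.nsmallest(k, candidate_loads, key=candidate_loads.get)

loads = {"p1": 0.9, "p2": 0.02, "p3": 0.4, "p4": 0.6}
print(replica_targets(loads, 2))  # -> ['p2', 'p3']
```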
Performance Evaluation
§ The three schemes compared with Chord [2] are
v log n virtual peers per peer
v An unbounded number of virtual peers
v The proposed power of two choices scheme
§ Simulation parameters: number of peers, number of items
(Figure: 1st percentile and 99th percentile load for Chord)
Performance Evaluation contd.
§ Maximum load is the key metric
v The highest loaded peers have a greater probability of failing
v A cascading effect occurs, leading to poor performance at neighboring peers
§ Performance of virtual peers is similar to that of Chord
v Maximum load is high, of the order of 350
v Increased topology maintenance traffic due to virtual peers
§ Load balancing is best in the unlimited case
v Less variation in load compared to the "power of two choices" case
v Maximum load is similar to the "power of two choices" case, of the order of 150
Performance Evaluation contd.
§ The unlimited resource scenario is unrealistic and expensive
§ The "power of two choices" case exhibits greater variation in load compared to the unlimited resource case
v Maximum load is similar to the unlimited resource case
v Better load balancing than the virtual peer case
v Less routing information is shared compared to Chord
Conclusion
§ The "power of two choices" scheme provides multiple storage options compared to Chord
v Increased fault tolerance
v Better performance in highly dynamic environments
v Trade-off is
Ø a small increase in the amount of static storage at each peer
Ø a small additive constant in search length
References
[1] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker, "A Scalable Content-Addressable Network," Proceedings of ACM SIGCOMM 2001.
[2] Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan, "Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications," Proceedings of ACM SIGCOMM 2001.
[3] A. Rowstron and P. Druschel, "Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems," IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), 2001.
References contd.
[4] Ben Y. Zhao, John Kubiatowicz, Anthony D. Joseph, "Tapestry: An Infrastructure for Fault-tolerant Wide-area Location and Routing," SIGCOMM 2001.
[5] Azar, Y., Broder, A., Karlin, A., and Upfal, E., "Balanced Allocations," SIAM Journal on Computing 29, 1 (1999), 180-200.
[6] Mitzenmacher, M., Richa, A., and Sitaraman, R., "The Power of Two Choices: A Survey of Techniques and Results," Kluwer Academic Publishers.
[7] John Byers, Jeffrey Considine, and Michael Mitzenmacher, "Simple Load Balancing for Distributed Hash Tables," Proceedings of IPTPS 2003.