A Scalable Content-Addressable Network
Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker
Presented by Greg Nims
Introduction
• The objective is to create a scalable indexing mechanism for large-scale peer-to-peer systems
• Content-Addressable Networks (CANs) are presented as a scalable, fault-tolerant, and completely self-organizing peer-to-peer overlay network
• Indexing is accomplished with a distributed hash table that maps keys to values
Design
• Multi-dimensional coordinate space with d dimensions (a d-torus)
• Each node owns a zone in the space
• A zone is a section of the hash table
• So each node stores a section of the table
Distributed Hash Table
• A uniform hash function maps a key K to a point P in the coordinate space
• This creates a table of key-value pairs (K, V)
• For any point P, the corresponding (K, V) is stored at the node N that owns the zone containing P
• An entry is retrieved by using the same hash function to map K to P and fetching the entry from the node that owns the zone containing P
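The key-to-point mapping above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the use of SHA-256, the four-quadrant zone layout, and the names `key_to_point`, `owner_of`, `put`, and `get` are all assumptions made for the example.

```python
import hashlib

def key_to_point(key: str, d: int = 2) -> tuple:
    """Hash a key to a point in the unit space [0, 1)^d."""
    if not 1 <= d <= 8:
        raise ValueError("this sketch takes 4 digest bytes per axis, so d <= 8")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return tuple(
        int.from_bytes(digest[4 * i:4 * i + 4], "big") / 2**32 for i in range(d)
    )

# Hypothetical layout: four nodes, each owning one quadrant as (lo, hi) corners.
ZONES = {
    "A": ((0.0, 0.0), (0.5, 0.5)),
    "B": ((0.5, 0.0), (1.0, 0.5)),
    "C": ((0.0, 0.5), (0.5, 1.0)),
    "D": ((0.5, 0.5), (1.0, 1.0)),
}
STORES = {name: {} for name in ZONES}  # each node's section of the table

def owner_of(point):
    """Return the node whose zone contains the point."""
    for name, (lo, hi) in ZONES.items():
        if all(l <= x < h for l, x, h in zip(lo, point, hi)):
            return name
    raise LookupError(point)

def put(key, value):
    """Store (K, V) at the node that owns the zone containing P = hash(K)."""
    STORES[owner_of(key_to_point(key))][key] = value

def get(key):
    """Recompute P from K with the same hash and read from the owning node."""
    return STORES[owner_of(key_to_point(key))][key]
```

Because `put` and `get` apply the same hash, a lookup always lands on the node that stored the entry, with no central directory.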
Routing
• Each node stores the IP address and coordinate zone of each adjoining (neighboring) node
• This data makes up the node’s routing table
• Greedy algorithm: if P is within the zone of the current node, return (K, V); otherwise forward the query to the neighbor whose coordinates are closest to P
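The greedy forwarding rule can be sketched on a small example. The sketch assumes a 2-dimensional torus divided into a 4 x 4 grid of equal zones (a real CAN has irregular zones), identifies each node by its grid cell, and measures closeness from the center of a neighbor's zone; `SIDE`, `route`, and the other names are hypothetical.

```python
import math

SIDE = 4  # assumption: 4 x 4 grid of equal zones on the unit 2-torus

def zone_contains(cell, p):
    """True if point p lies inside the zone of the given grid cell."""
    return all(c / SIDE <= x < (c + 1) / SIDE for c, x in zip(cell, p))

def center(cell):
    return tuple((c + 0.5) / SIDE for c in cell)

def torus_dist(a, b):
    """Euclidean distance with wrap-around on every axis."""
    return math.sqrt(
        sum(min(abs(x - y), 1 - abs(x - y)) ** 2 for x, y in zip(a, b))
    )

def neighbors(cell):
    """The 2d = 4 adjoining zones, wrapping around the torus."""
    i, j = cell
    return [
        ((i + 1) % SIDE, j), ((i - 1) % SIDE, j),
        (i, (j + 1) % SIDE), (i, (j - 1) % SIDE),
    ]

def route(start, p, max_hops=64):
    """Forward greedily until the current zone contains p; return the path."""
    path = [start]
    cell = start
    while not zone_contains(cell, p):
        if len(path) > max_hops:
            raise RuntimeError("hop limit exceeded")
        # Each node only consults its own routing table (its neighbors).
        cell = min(neighbors(cell), key=lambda n: torus_dist(center(n), p))
        path.append(cell)
    return path

path = route((0, 0), (0.6, 0.9))
print(path)
```

Each hop uses only local neighbor state, which is what lets the scheme scale without any node knowing the whole topology.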
More Routing
• Conceptually, draw a straight line from a point in the local zone to P
• Follow that line from neighbor to neighbor
• In a d-dimensional space, each node maintains 2d neighbors
• Nodes are self-organizing and make routing decisions dynamically
Node Joining the CAN
• New node N1 first locates a node N2 already in the CAN, typically by using the IP address of a bootstrap node
• N1 generates a random point P in the space
• A JOIN message is routed through the CAN, starting at N2, to the node N3 that owns the zone containing P
• N3 splits its zone in half and assigns one half to N1, sending it the (K, V) pairs that fall in that half along with neighbor information
• N3 informs its neighbors of the space reallocation
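The zone split performed by N3 can be sketched as below. This is an illustration under assumptions: zones are axis-aligned boxes given by `lo` and `hi` corners, the split is made along the longest side (the slides do not say which axis is chosen), and neighbor-table updates are left out. `Node` and `split_for_join` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    lo: tuple                                   # lower corner of the zone
    hi: tuple                                   # upper corner of the zone
    store: dict = field(default_factory=dict)   # point P -> (K, V)

    def contains(self, p):
        return all(l <= x < h for l, x, h in zip(self.lo, p, self.hi))

def split_for_join(owner: Node) -> Node:
    """Halve the owner's zone and return the new node holding the upper half."""
    sides = [h - l for l, h in zip(owner.lo, owner.hi)]
    axis = sides.index(max(sides))              # assumption: split longest side
    mid = (owner.lo[axis] + owner.hi[axis]) / 2

    new_lo = list(owner.lo)
    new_lo[axis] = mid
    newcomer = Node(tuple(new_lo), owner.hi)

    old_hi = list(owner.hi)
    old_hi[axis] = mid
    owner.hi = tuple(old_hi)

    # Hand over every (K, V) pair whose point now lies in the newcomer's zone.
    for p in list(owner.store):
        if newcomer.contains(p):
            newcomer.store[p] = owner.store.pop(p)
    return newcomer

n3 = Node((0.0, 0.0), (1.0, 1.0), {(0.2, 0.3): ("a", 1), (0.8, 0.1): ("b", 2)})
n1 = split_for_join(n3)
print(n3.lo, n3.hi, n1.lo, n1.hi)
```

Only the pairs in the transferred half move, so a join touches one existing node's data rather than the whole table.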
Node Departure
• Explicit departure: the node hands its zone and (K, V) pairs to a neighbor
• If the two zones can be combined into a single valid zone, they are merged; otherwise the neighbor with the smallest zone temporarily handles both zones
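One way to read "valid zone" is that two zones may merge only if their union is again a box: they must agree on every axis but one and abut on that axis. The sketch below encodes that reading; it ignores wrap-around at the torus edge, and `try_merge` and `hand_off` are hypothetical names.

```python
import math

def volume(zone):
    lo, hi = zone
    return math.prod(h - l for l, h in zip(lo, hi))

def try_merge(a, b):
    """Return the merged (lo, hi) box, or None if the union is not a box."""
    (alo, ahi), (blo, bhi) = a, b
    differing = [
        i for i in range(len(alo)) if (alo[i], ahi[i]) != (blo[i], bhi[i])
    ]
    if len(differing) != 1:
        return None
    i = differing[0]
    if ahi[i] != blo[i] and bhi[i] != alo[i]:
        return None                      # the zones do not touch on that axis
    lo, hi = list(alo), list(ahi)
    lo[i], hi[i] = min(alo[i], blo[i]), max(ahi[i], bhi[i])
    return tuple(lo), tuple(hi)

def hand_off(departing, neighbors):
    """neighbors maps name -> zone. Returns (taker, merged_zone_or_None)."""
    for name, zone in neighbors.items():
        merged = try_merge(departing, zone)
        if merged is not None:
            return name, merged          # a single valid zone results
    # No merge is possible: the smallest neighbor holds both zones for now.
    smallest = min(neighbors, key=lambda n: volume(neighbors[n]))
    return smallest, None
```

For example, a departing zone `((0.5, 0.0), (1.0, 0.5))` merges with a neighbor `((0.0, 0.0), (0.5, 0.5))` into `((0.0, 0.0), (1.0, 0.5))`, whereas a neighbor of a different height cannot absorb it cleanly.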
Failures
• Each node sends periodic update messages to each of its neighbors
• A crashed node is detected by its neighbors through the prolonged absence of these updates
• Each neighbor that detects the failure starts a takeover timer
• When its timer expires, a neighbor sends a TAKEOVER message to all of the failed node’s neighbors
• The neighbors thereby agree on the one with the smallest zone volume
• That node takes over the crashed node’s zone
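A rough sketch of failure detection and takeover follows. It assumes, as the CAN paper describes, that each takeover timer is set in proportion to the node's own zone volume, so the smallest neighbor's timer fires first. The message exchange is collapsed into a single function, and the names, the `scale` parameter, and the tie-break by node name are assumptions.

```python
def failed_neighbors(last_update, now, timeout):
    """Neighbors whose most recent update is older than the timeout."""
    return [n for n, t in last_update.items() if now - t > timeout]

def takeover_winner(volumes, scale=1.0):
    """volumes maps each surviving neighbor to its own zone volume.

    Each neighbor's timer is proportional to its volume, so the node with
    the smallest zone fires first and sends TAKEOVER; the others cancel
    their timers when they receive it.
    """
    timers = {n: scale * v for n, v in volumes.items()}
    # Ties are broken by name here purely to keep the sketch deterministic.
    return min(timers, key=lambda n: (timers[n], n))

last_update = {"B": 98.0, "C": 40.0, "D": 99.5}
down = failed_neighbors(last_update, now=100.0, timeout=30.0)
winner = takeover_winner({"B": 0.25, "D": 0.125, "E": 0.125})
print(down, winner)
```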
Design Improvements
• Multiple dimensions
• Multiple realities
• Multiple hash functions
• Overloaded coordinate zones
• Round-trip time (RTT) ratio
• Topologically sensitive construction (landmarking)
• Uniform partitioning
Multiple Dimensions
• Increase the number of dimensions d
• Reduces average path length
• Reduces path latency
• Increases routing table size, because each node has more neighbors
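The trade-off can be made concrete with the estimate given in the CAN paper for a space divided into n equal zones: an average path of (d/4) * n^(1/d) hops, against 2d neighbors per node. The sketch below simply evaluates that estimate; it assumes perfectly equal zones, so measured values from a simulation will differ.

```python
def avg_path_length(n: int, d: int) -> float:
    """Estimated average hops for n equal zones in d dimensions."""
    return (d / 4) * n ** (1 / d)

def neighbors_per_node(d: int) -> int:
    """Routing table size: two neighbors per dimension."""
    return 2 * d

n = 2**18
for d in (2, 3, 6, 9):
    print(d, neighbors_per_node(d), round(avg_path_length(n, d), 1))
```

With n = 2^18, the estimate falls from about 256 hops at d = 2 to about 48 at d = 3, while the routing table grows only from 4 to 6 neighbors.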
Multiple Realities
• Increase the number of realities
• Multiple coordinate spaces exist at the same time; each space is called a reality
• Each node is assigned a different zone in each reality
• Shorter paths, higher fault tolerance
• With three realities, a (K, V) pair mapping to P at (x, y, z) may be stored at three different nodes
Dimensions vs. Realities
• The two improvements with the greatest impact
• Dimensions have a larger effect on reducing path length
• Realities provide stronger fault tolerance and data availability
Multiple Hash Functions
• Multiple hash functions increase data availability and reduce query latency
• Data availability improves by mapping a single key to k points in the coordinate space with k hash functions
• (K, V) is unavailable only when all k nodes holding a copy are down at the same time
• Querying the k nodes in parallel can reduce lookup latency
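A minimal sketch of k-way replication follows. It derives the k hash functions by salting a single SHA-256 hash with the replica index, which is an assumption for illustration; `replica_points`, `lookup`, and the `fetch` callback are hypothetical names, and the replicas are tried one after another rather than in parallel to keep the example short.

```python
import hashlib

def replica_points(key: str, k: int = 3, d: int = 2):
    """Map one key to k points by salting the hash with the replica index."""
    points = []
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
        points.append(tuple(
            int.from_bytes(digest[4 * j:4 * j + 4], "big") / 2**32
            for j in range(d)
        ))
    return points

def lookup(key: str, fetch, k: int = 3, d: int = 2):
    """Try each replica in turn.

    fetch(point) is assumed to return the value stored at the node owning
    that point, or raise ConnectionError if that node is down.
    """
    for point in replica_points(key, k, d):
        try:
            return fetch(point)
        except ConnectionError:
            continue
    raise KeyError(key)  # every one of the k replicas was unreachable
```

The lookup fails only when every replica is unreachable, which is the availability argument made on the slide.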
Overload Coordinate Zones
• Overload the coordinate zones by allowing more than one node to share the same zone
• Reduces the average path length and improves fault tolerance
• No additional neighbors
RTT Ratio
• Reduce per-hop latency by taking round-trip time (RTT) into account when routing
• Each node measures the RTT to each of its neighbors
• Forwarding favors the lower-latency paths
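In the CAN paper this metric forwards a message to the neighbor with the best ratio of progress toward P to measured RTT. The sketch below assumes that rule; it uses plain Euclidean distance between zone centers and ignores torus wrap-around, and `best_next_hop` and the sample figures are hypothetical.

```python
import math

def best_next_hop(current, target, neighbors):
    """Pick the neighbor with the largest progress-to-RTT ratio.

    neighbors maps name -> (zone_center, rtt_ms). Progress is how much
    closer the neighbor is to the target than the current node is.
    """
    here = math.dist(current, target)
    best_name, best_ratio = None, 0.0
    for name, (center, rtt_ms) in neighbors.items():
        progress = here - math.dist(center, target)
        if progress <= 0:
            continue                     # never forward away from the target
        ratio = progress / rtt_ms
        if ratio > best_ratio:
            best_name, best_ratio = name, ratio
    if best_name is None:
        raise LookupError("no neighbor makes progress")
    return best_name

neighbors = {
    "A": ((0.30, 0.10), 80.0),  # most progress, but a slow link
    "B": ((0.10, 0.30), 10.0),  # fast link, but moves away from the target
    "C": ((0.25, 0.15), 20.0),  # slightly less progress, much faster link
}
print(best_next_hop((0.1, 0.1), (0.9, 0.1), neighbors))
```

Plain greedy routing would choose A here because it is closest to the target; weighting by RTT chooses C instead.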
Topologically Sensitive Construction
• Use a well-known set of landmark hosts to guide construction
• Each node measures its RTT to each landmark
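The slide stops at the RTT measurement. In the CAN paper, a node then sorts the landmarks by increasing RTT, and that ordering (one of m! possibilities for m landmarks) selects the portion of the coordinate space in which the node joins, so that nodes close together in the network end up close together in the space. The sketch below assumes the portions are equal slices along the first axis; the function names and landmark labels are hypothetical.

```python
import random
from itertools import permutations

def landmark_ordering(rtts):
    """Landmark names sorted by increasing measured RTT."""
    return tuple(sorted(rtts, key=rtts.get))

def ordering_bin(ordering):
    """Index of this ordering among all m! orderings, plus m! itself."""
    all_orderings = sorted(permutations(sorted(ordering)))
    return all_orderings.index(ordering), len(all_orderings)

def join_point(rtts, d=2, rng=random):
    """Random join point restricted to the slice chosen by the ordering."""
    index, total = ordering_bin(landmark_ordering(rtts))
    x = (index + rng.random()) / total   # assumption: slices along axis 0
    return (x,) + tuple(rng.random() for _ in range(d - 1))

rtts = {"L1": 30.0, "L2": 10.0, "L3": 20.0}
print(landmark_ordering(rtts), ordering_bin(landmark_ordering(rtts)))
```

Two nodes that see the landmarks in the same order are likely to be near each other in the underlying network, and this scheme places them in the same slice.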
Uniform Partitioning
• A form of volume balancing
• When a node receives a JOIN, it compares its own zone volume with the volumes of its neighbors’ zones before accepting
• The node with the largest zone among them accepts the JOIN and splits
• Achieves a load balance among the nodes
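The volume comparison can be sketched as below, assuming each node already knows its neighbors' zones from its routing table. `choose_splitter` and the sample layout are hypothetical, and ties go to the node that received the JOIN.

```python
import math

def volume(zone):
    lo, hi = zone
    return math.prod(h - l for l, h in zip(lo, hi))

def choose_splitter(receiver, zones, neighbors):
    """Return the node that should split: the largest zone among the
    receiver and its neighbors. max() keeps the first maximum, so the
    receiver wins ties."""
    candidates = [receiver] + list(neighbors[receiver])
    return max(candidates, key=lambda n: volume(zones[n]))

zones = {
    "A": ((0.0, 0.0), (0.25, 0.5)),   # volume 0.125
    "B": ((0.25, 0.0), (0.5, 0.5)),   # volume 0.125
    "C": ((0.0, 0.5), (0.5, 1.0)),    # volume 0.25
    "D": ((0.5, 0.0), (1.0, 1.0)),    # volume 0.5
}
neighbors = {"A": ["B", "C"], "B": ["A", "C", "D"]}
print(choose_splitter("A", zones, neighbors))
```

Redirecting the split to the largest nearby zone keeps zone volumes, and with them the share of (K, V) pairs per node, closer to uniform.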
Design Review
• Ran two simulations using 2^18 nodes
• “Bare bones” CAN without improvements
• “Knobs-on-full” CAN using all features except landmarks and multiple hashes
• Biggest gain from the number of dimensions (path length reduced from 198 to 5 hops)
Questions?