Peer-To-Peer Systems Chapter 10 B. Ramamurthy

Different Types of Systems
• Monolithic application
• Simple client-server
• Multi-tier client-server
• Request-response
  – Pull/push mode
  – Tightly/loosely coupled
• Centralized and distributed systems
• Master-slave systems (Hadoop and HDFS)
• Peer-to-peer systems (BitTorrent)
• The concept of overlays
Page 2 B. Ramamurthy 9/6/2021

Introduction
• Peer-to-peer systems represent a paradigm for the construction of distributed systems and applications in which data and computational resources are contributed by many hosts on the Internet.
• Their emergence was precipitated by the explosive growth of networking and the need to share resources.
• Peer-to-peer systems are emerging that have the capacity to share computing resources, storage and data (audio, video, genetic) present at the edges of the Internet on a global scale.
• They are very effective when used to store very large collections of immutable data.
• They typically do not work well for data that is frequently updated.
• Low overhead.

Important Characteristics
• Their design ensures that each user contributes resources to the system.
• The nodes may not all contribute resources at the same level.
• All nodes in a peer-to-peer system have the same functional capabilities and responsibilities.
• Their correct operation does not depend on the existence of any centrally administered systems.
• They can be designed to provide limited anonymity to the providers and users of the resources.

Key Issues
• Algorithms for the placement of data objects across many hosts.
• Access to them in a manner that balances the workload and ensures availability without adding undue overheads.
• Use existing naming, routing, data replication and security techniques.
• Build a reliable resource-sharing layer over an unreliable and untrusted collection of computers and networks.

Four Generations of P2P
• First generation: Napster, a music exchange server
• Second generation: file sharing with greater scalability, anonymity and fault tolerance
  – Freenet, Gnutella, Kazaa, BitTorrent
• Third generation: characterized by the emergence of middleware for application independence
  – Pastry, Tapestry, …
• Fourth generation
  – IPFS, the InterPlanetary File System

Different Types of Routing
• IP vs. overlay routing

IP vs. overlay routing for peer-to-peer applications

| | IP | Application-level routing overlay |
|---|---|---|
| Scale | IPv4 is limited to 2^32 addressable nodes. The IPv6 name space is much more generous (2^128), but addresses in both versions are hierarchically structured and much of the space is pre-allocated according to administrative requirements. | Peer-to-peer systems can address more objects. The GUID name space is very large and flat (>2^128), allowing it to be much more fully occupied. |
| Load balancing | Loads on routers are determined by network topology and associated traffic patterns. | Object locations can be randomized and hence traffic patterns are divorced from the network topology. |
| Network dynamics (addition/deletion of objects/nodes) | IP routing tables are updated asynchronously on a best-efforts basis with time constants on the order of 1 hour. | Routing tables can be updated synchronously or asynchronously with fractions-of-a-second delays. |
| Fault tolerance | Redundancy is designed into the IP network by its managers, ensuring tolerance of a single router or network connectivity failure. n-fold replication is costly. | Routes and object references can be replicated n-fold, ensuring tolerance of n failures of nodes or connections. |
| Target identification | Each IP address maps to exactly one target node. | Messages can be routed to the nearest replica of a target object. |
| Security and anonymity | Addressing is only secure when all nodes are trusted. Anonymity for the owners of addresses is not achievable. | Security can be achieved even in environments with limited trust. A limited degree of anonymity can be provided. |

Curious to know how a GUID is generated?
• Determine the values for the UTC-based timestamp and clock sequence to be used in the UUID.
• For the purposes of this algorithm, consider the timestamp to be a 60-bit unsigned integer and the clock sequence to be a 14-bit unsigned integer. Sequentially number the bits in a field, starting with zero for the least significant bit.
• Set the time_low field equal to the least significant 32 bits (bits zero through 31) of the timestamp in the same order of significance.
• Set the time_mid field equal to bits 32 through 47 from the timestamp in the same order of significance.
• Set the 12 least significant bits (bits zero through 11) of the time_hi_and_version field equal to bits 48 through 59 from the timestamp in the same order of significance.
• Set the four most significant bits (bits 12 through 15) of the time_hi_and_version field to the 4-bit version number corresponding to the UUID version being created (version 1 for time-based UUIDs).
• Set the clock_seq_low field to the eight least significant bits (bits zero through 7) of the clock sequence in the same order of significance.
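
The bit-slicing steps above can be sketched in Python. This is only an illustration of the listed steps, not a full UUID generator: the function name `split_v1_fields` is hypothetical, and the remaining fields (the high clock-sequence bits and the node identifier) are not covered because the slide stops before them. The standard `uuid` module is used only to cross-check the result against a well-known version 1 UUID.

```python
import uuid


def split_v1_fields(timestamp: int, clock_seq: int, version: int = 1) -> dict:
    """Split a 60-bit timestamp and a 14-bit clock sequence into UUID fields.

    Hypothetical helper: it covers only the fields named on the slide.
    """
    if not 0 <= timestamp < (1 << 60):
        raise ValueError("timestamp must fit in 60 bits")
    if not 0 <= clock_seq < (1 << 14):
        raise ValueError("clock sequence must fit in 14 bits")
    return {
        # bits 0..31 of the timestamp
        "time_low": timestamp & 0xFFFFFFFF,
        # bits 32..47 of the timestamp
        "time_mid": (timestamp >> 32) & 0xFFFF,
        # bits 48..59 of the timestamp, with the version in the top 4 bits
        "time_hi_and_version": ((timestamp >> 48) & 0x0FFF) | (version << 12),
        # bits 0..7 of the clock sequence
        "clock_seq_low": clock_seq & 0xFF,
    }


# Cross-check against an existing version 1 UUID parsed by the standard library.
known = uuid.UUID("6ba7b810-9dad-11d1-80b4-00c04fd430c8")
fields = split_v1_fields(known.time, known.clock_seq)
```

Here `known.time` and `known.clock_seq` are the 60-bit timestamp and 14-bit clock sequence that the `uuid` module extracts from the parsed UUID, so feeding them back through the sketch should reproduce the original field values.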

More on GUID
• GUIDs are usually a secure hash of attributes of the resource, so the state of the resource is not expected to change.
• Hash of global clock + resource state + IP address + …; see http://www.famkruithof.net/uuidgen
• Remember that n-fold replication is possible, so many replicas of an object with a given GUID may be present.
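
As a rough illustration of deriving a GUID from a secure hash, the sketch below hashes an object's bytes with SHA-1 and keeps the top 128 bits. The function name `guid_from_content`, the choice of SHA-1 and the 128-bit width are assumptions made for the example, not something the slides prescribe.

```python
import hashlib


def guid_from_content(content: bytes, bits: int = 128) -> int:
    """Derive a GUID from a secure hash of the object's content.

    Assumption: SHA-1 (160 bits) truncated to `bits` most significant bits.
    """
    if not 0 < bits <= 160:
        raise ValueError("bits must be between 1 and 160")
    digest = hashlib.sha1(content).digest()
    return int.from_bytes(digest, "big") >> (160 - bits)


guid_a = guid_from_content(b"track-01 audio bytes")
guid_b = guid_from_content(b"track-01 audio bytes")   # identical content
guid_c = guid_from_content(b"track-02 audio bytes")   # different content
```

Identical content always maps to the same GUID, which is why every replica of an immutable object can share one identifier, while any change to the content yields a different GUID.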

Overlay Routing
• It differs from IP routing but is strongly influenced by it.
• Routing tables may be updated synchronously or asynchronously.
• N-fold replication affects the routing.

Napster: peer-to-peer file sharing with a centralized, replicated index

Napster (contd.)
• Napster demonstrated the feasibility of building a useful large-scale service that depends wholly on individual computers.
• Music files are not updated, so the state of the resource is stable.
• No guarantees about availability, which suited music files.
• Issue 1: one central server for the index
• Issue (?) 2: anonymity of the users

Issue 3: Copyright Issues
• The developers of Napster argued that they were not liable for the infringement of the owners' copyrights because they were not participating in the copying process, which was performed entirely between users' machines.
• Their argument failed because the index servers were centralized and were an essential part of the process.
• Since the index servers were located at well-known addresses, their operators were unable to remain anonymous and so could be targeted in lawsuits.

Lessons Learned from Napster
• Napster demonstrated the feasibility of building a useful large-scale service that depends wholly on data and computing resources owned by ordinary users.
• Napster used locality to keep data transfers from swamping the network.
• It also took advantage of the special characteristics of the application for which it was designed:
  – Music files are not updated, so many replicas are possible.
  – No guarantees are required about the availability of individual files.

Freenet and Gnutella
• Napster maintained a unified index of available files (resources).
• Gnutella and Freenet used partitioned and distributed indexes, with algorithms specific to each system.
• The “file location” problem: systems like the Network File System (NFS) require substantial configuration.

Peer-to-Peer Middleware
• Functional requirements:
  – Enable clients to locate and communicate with any individual resource
  – Add and remove nodes
  – Add and remove resources
  – Simple API
• Non-functional requirements: global scalability, load balancing, accommodating highly dynamic host availability, trust, anonymity, deniability…

Routing Overlay
• A routing overlay locates nodes and objects. It is a middleware layer responsible for routing requests from clients to the hosts that hold the object to which a request is addressed.
• The main difference from IP routing is that the routing is implemented in the application layer (in addition to the IP routing at the network layer).

Figure: Distribution of information in a routing overlay. Nodes A, B, C and D each hold their own partial routing knowledge; the legend distinguishes objects from nodes.

Basic programming interface for a distributed hash table (DHT)
put(GUID, data)
    The data is stored in replicas at all nodes responsible for the object identified by GUID.
remove(GUID)
    Deletes all references to GUID and the associated data.
value = get(GUID)
    The data associated with GUID is retrieved from one of the nodes responsible for it.
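
The three operations can be mimicked with a small in-memory sketch. Everything here is an assumption made for illustration: the class name `ToyDHT`, the use of small integers as node IDs and GUIDs, and the rule that the nodes "responsible" for a GUID are the `replicas` nodes numerically closest to it. A real DHT locates those nodes through a routing overlay rather than a local dictionary.

```python
class ToyDHT:
    """In-memory stand-in for a DHT; node IDs and GUIDs are plain integers."""

    def __init__(self, node_ids, replicas=2):
        # One key/value store per node.
        self.stores = {nid: {} for nid in node_ids}
        self.replicas = replicas

    def _responsible(self, guid):
        # Assumed placement rule: the nodes numerically closest to the GUID.
        ranked = sorted(self.stores, key=lambda nid: abs(nid - guid))
        return ranked[: self.replicas]

    def put(self, guid, data):
        # Store a replica at every responsible node.
        for nid in self._responsible(guid):
            self.stores[nid][guid] = data

    def get(self, guid):
        # Retrieve the data from one of the responsible nodes.
        for nid in self._responsible(guid):
            if guid in self.stores[nid]:
                return self.stores[nid][guid]
        return None

    def remove(self, guid):
        # Delete all references to the GUID and its data.
        for store in self.stores.values():
            store.pop(guid, None)


dht = ToyDHT(node_ids=[10, 50, 90], replicas=2)
dht.put(42, b"immutable blob")
value = dht.get(42)
```

With nodes 10, 50 and 90 and two replicas, GUID 42 lands on nodes 50 and 10, so the object survives the loss of either one.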

Basic programming interface for distributed object location and routing
publish(GUID)
    GUID can be computed from the object (or some part of it, e.g. its name). This function makes the node performing a publish operation the host for the object corresponding to GUID.
unpublish(GUID)
    Makes the object corresponding to GUID inaccessible.
sendToObj(msg, GUID, [n])
    Following the object-oriented paradigm, an invocation message is sent to an object in order to access it. This might be a request to open a TCP connection for data transfer or to return a message containing all or part of the object’s state. The final optional parameter [n], if present, requests the delivery of the same message to n replicas of the object.
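
A minimal sketch of this interface follows, under stated assumptions. The class name `ToyDOLR` is hypothetical, the publishing host is passed explicitly (in the interface above it is implicitly the calling node), and replicas are chosen in publication order rather than by network proximity.

```python
class ToyDOLR:
    """In-memory stand-in for a distributed object location and routing layer."""

    def __init__(self):
        self.hosts = {}   # GUID -> list of host names holding a replica
        self.inbox = {}   # host name -> list of (GUID, message) pairs delivered

    def publish(self, guid, host):
        # Make `host` a host for the object identified by `guid`.
        replicas = self.hosts.setdefault(guid, [])
        if host not in replicas:
            replicas.append(host)

    def unpublish(self, guid, host):
        # Make the replica held by `host` inaccessible.
        replicas = self.hosts.get(guid, [])
        if host in replicas:
            replicas.remove(host)

    def send_to_obj(self, msg, guid, n=1):
        # Deliver `msg` to up to n replicas and report which hosts received it.
        targets = self.hosts.get(guid, [])[:n]
        for host in targets:
            self.inbox.setdefault(host, []).append((guid, msg))
        return targets


dolr = ToyDOLR()
dolr.publish(7, "hostA")
dolr.publish(7, "hostB")
one = dolr.send_to_obj("open connection", 7)        # single replica
two = dolr.send_to_obj("return state", 7, n=2)      # two replicas
```

Unlike the DHT interface, the data stays on the publishing hosts; the layer only records where replicas live and forwards messages to them.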

Overlay case studies: Pastry & Tapestry Routing
• Prefix routing based on GUIDs.
• The search for the next node along the route is narrowed by applying a binary mask that selects an increasing number of hexadecimal digits from the destination GUID after each hop.

Figure 10.7: First four rows of a Pastry routing table
![Figure 10.8: Pastry routing example](http://slidetodoc.com/presentation_image_h2/8cfc2244d3fa5c656957f66dc983eec3/image-24.jpg)
Figure 10.8: Pastry routing example. Based on Rowstron and Druschel [2001].

Figure 10.9: Pastry’s routing algorithm
To handle a message M addressed to a node D (where R[p, i] is the element at column i of row p of the routing table):
1. If (L_-l < D < L_l) { // the destination is within the leaf set or is the current node
       forward M to the element L_i of the leaf set with GUID closest to D, or to the current node A.
   }
2. Else { // use the routing table to dispatch M to a node with a closer GUID
   2a. Find p, the length of the longest common prefix of D and A, and i, the (p+1)th hexadecimal digit of D.
   2b. If (R[p, i] ≠ null) forward M to R[p, i] // route it to a node with a longer common prefix
   2c. Else { // there is no entry in the routing table
           forward M to a node in L or R with common prefix length p but a GUID that is numerically closer.
       }
   }
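
The routing decision can be sketched in Python as below. This is a simplified illustration, not Pastry's implementation: the names `shared_prefix_len` and `next_hop` are hypothetical, GUIDs are short hexadecimal strings of equal length, the routing table is a dictionary keyed by (row p, digit i), and the wrap-around of the circular GUID space is ignored.

```python
from __future__ import annotations


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading hexadecimal digits that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def next_hop(current: str, dest: str, leaf_set: list[str],
             table: dict[tuple[int, int], str]) -> str:
    """Return the GUID to forward to; returning `current` means deliver here."""
    if dest == current:
        return current
    d = int(dest, 16)

    def dist(guid: str) -> int:
        return abs(int(guid, 16) - d)

    # Step 1: destination lies within the range covered by the leaf set.
    if leaf_set:
        leaf_values = [int(g, 16) for g in leaf_set]
        if min(leaf_values) <= d <= max(leaf_values):
            return min(leaf_set + [current], key=dist)

    # Step 2a: p = shared prefix length, i = the (p+1)th digit of the destination.
    p = shared_prefix_len(current, dest)
    i = int(dest[p], 16)

    # Step 2b: use the routing table entry if there is one.
    entry = table.get((p, i))
    if entry is not None:
        return entry

    # Step 2c: otherwise pick a known node that shares at least p digits
    # with the destination and is numerically closer than the current node.
    known = leaf_set + list(table.values())
    closer = [g for g in known
              if shared_prefix_len(g, dest) >= p and dist(g) < dist(current)]
    return min(closer, key=dist, default=current)


# One hop from node 65A1 towards D46A using row 0, column D of the table.
hop = next_hop("65A1", "D46A", ["6598", "65B0"], {(0, 0xD): "D13D"})
```

Each hop that goes through step 2b lengthens the prefix shared with the destination by at least one hexadecimal digit, which is what bounds the number of hops.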
![Figure 10.10: Tapestry routing](http://slidetodoc.com/presentation_image_h2/8cfc2244d3fa5c656957f66dc983eec3/image-26.jpg)
Figure 10.10: Tapestry routing. From [Zhao et al. 2004].

Special Features of Tapestry
• Implements a DHT and routes messages based on GUIDs associated with resources, using prefix routing in a manner similar to Pastry.
• 160-bit identifiers (uniform for resources as well as nodes).
• A periodic “publish” call caches the mapping in the routing table.

Summary
• Napster: central index server: request location, get location, download, update index (why?); the need was for sharing music files (immutable); study the Napster vs. copyright ownership legal issues (p. 403); index servers are not distributed but replicated. Music files are identified by a GUID, a secure hash code generated from the features of the file.
• Next generation: Gnutella and Freenet: distributed index; designed to meet the need for automatic placement and subsequent location of distributed objects; middleware providing an API to the services; a distributed algorithm called a routing overlay takes responsibility for locating nodes and objects.
• Pastry routing algorithm, O(log_16 N): see Figure 10.9 and the related explanation. Host integration, host failure, locality (/redundancy), fault tolerance, dependability. Circular and linear address space. How does this compare with the IP address space?
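
To give a feel for the O(log_16 N) bound, the short sketch below estimates the hop count for a given network size. The function name `expected_hops` is hypothetical, and the result is an order-of-magnitude estimate under the assumption that each hop resolves one more hexadecimal digit of the destination GUID, not an exact count for any real deployment.

```python
import math


def expected_hops(num_nodes: int, digit_base: int = 16) -> int:
    """Rough hop estimate: one GUID digit resolved per hop."""
    if num_nodes < 2:
        return 0
    return math.ceil(math.log(num_nodes, digit_base))


hops_thousand = expected_hops(1_000)        # log_16(1000) is about 2.49
hops_million = expected_hops(1_000_000)     # log_16(10^6) is about 4.98
```

Growing the network a thousandfold, from a thousand to a million nodes, adds only about two hops to the estimate.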

References
• http://web.cs.ucla.edu/classes/cs217/05BitTorrent.pdf