EE 122: Lecture 23 (Peer-to-Peer Networks)
Ion Stoica
November 29, 2001
istoica@cs.berkeley.edu


How Did it Start?
§ A killer application: Napster
  - Free music over the Internet
§ Key idea: share the storage and bandwidth of individual (home) users
[Figure: users connected through the Internet]

Model
§ Each user stores a subset of files
§ Each user has access to (can download) files from all users in the system

Main Challenge
§ Find where a particular file is stored
[Figure: machines storing files A through F; one machine asks "E?"]

Other Challenges
§ Scale: up to hundreds of thousands or millions of machines
§ Dynamicity: machines can come and go at any time

Napster
§ Assume a centralized index system that maps files (songs) to the machines that are alive
§ How to find a file (song)
  - Query the index system, which returns a machine that stores the required file
    • Ideally this is the closest/least-loaded machine
  - ftp the file
§ Advantages:
  - Simplicity; easy to implement sophisticated search engines on top of the index system
§ Disadvantages:
  - Robustness, scalability (?)
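
The centralized index described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Napster's actual protocol: the class `CentralIndex` and its methods `register`, `unregister` and `query` are hypothetical names, and the machines and files are made up.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical central index: maps each file name to the live machines storing it."""

    def __init__(self):
        self._locations = defaultdict(set)

    def register(self, machine, files):
        """A machine joins and announces the files it stores."""
        for name in files:
            self._locations[name].add(machine)

    def unregister(self, machine):
        """A machine leaves: drop it from every entry."""
        for machines in self._locations.values():
            machines.discard(machine)

    def query(self, name):
        """Return the machines currently storing `name` (sorted for determinism)."""
        return sorted(self._locations.get(name, ()))


index = CentralIndex()
for machine, stored in [("m1", "A"), ("m2", "B"), ("m5", "E"), ("m6", "F")]:
    index.register(machine, [stored])

print(index.query("E"))   # machines that store E
index.unregister("m5")
print(index.query("E"))   # empty once m5 has left
```

The single dictionary is what makes search easy to build on top, and also why the index is a single point of failure.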

Napster: Example
[Figure: machines m1 to m6 store files A to F; the index lists m1: A, m2: B, m3: C, m4: D, m5: E, m6: F. A machine asks the index "E?", is told m5, and downloads E from m5.]

Gnutella
§ Distribute file location
§ Idea: multicast the request
§ How to find a file:
  - Send request to all neighbors
  - Neighbors recursively multicast the request
  - Eventually a machine that has the file receives the request, and it sends back the answer
§ Advantages:
  - Totally decentralized, highly robust
§ Disadvantages:
  - Not scalable; the entire network can be swamped with requests (to alleviate this problem, each request has a TTL)

Gnutella: Example
§ Assume: m1’s neighbors are m2 and m3; m3’s neighbors are m4 and m5; …
[Figure: m1 sends the request "E?" to m2 and m3; m3 forwards it to m4 and m5; m5, which stores E, sends E back.]
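
The flooding search with a TTL can be sketched as a breadth-first traversal. This is a simplified model rather than the Gnutella wire protocol: the function `flood`, the link between m5 and m6, and the file placement (m1 to m6 storing A to F) are assumptions made for illustration.

```python
from collections import deque


def flood(neighbors, files, start, wanted, ttl):
    """Flood a request from `start`; return (machines holding `wanted`, machines reached)."""
    reached = {start}
    hits = []
    queue = deque([(start, ttl)])
    while queue:
        node, hops_left = queue.popleft()
        if wanted in files.get(node, ()):
            hits.append(node)          # this machine would send back the answer
        if hops_left == 0:
            continue                   # TTL expired: do not forward further
        for peer in neighbors.get(node, ()):
            if peer not in reached:    # each machine handles a request only once
                reached.add(peer)
                queue.append((peer, hops_left - 1))
    return hits, reached


# Assumed topology and file placement (the m5-m6 link is hypothetical).
NEIGHBORS = {
    "m1": ["m2", "m3"], "m2": ["m1"], "m3": ["m1", "m4", "m5"],
    "m4": ["m3"], "m5": ["m3", "m6"], "m6": ["m5"],
}
FILES = {"m1": {"A"}, "m2": {"B"}, "m3": {"C"}, "m4": {"D"}, "m5": {"E"}, "m6": {"F"}}

print(flood(NEIGHBORS, FILES, "m1", "E", ttl=2))   # m5 is two hops away, so it is found
print(flood(NEIGHBORS, FILES, "m1", "E", ttl=1))   # TTL too small: the request never reaches m5
```

A larger TTL finds more files but reaches more machines, which is the scalability trade-off named on the Gnutella slide.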

FastTrack
§ Use the concept of supernode
§ A combination between Napster and Gnutella
§ When a user joins the network, it joins a supernode
§ A supernode acts like a Napster server for all users connected to it
§ Queries are broadcast among supernodes (like Gnutella)

Freenet
§ Additional goals to file location:
  - Provide publisher anonymity, security
  - Resistant to attacks: a third party shouldn’t be able to deny access to a particular file (data item, object), even if it compromises a large fraction of machines
§ Architecture:
  - Each file is identified by a unique identifier
  - Each machine stores a set of files, and maintains a “routing table” to route the individual requests

Data Structure
§ Each node maintains a common stack with rows (id, next_hop, file)
  - id: file identifier
  - next_hop: another node that stores the file id
  - file: file identified by id being stored on the local node
§ Forwarding:
  - Each message contains the file id it is referring to
  - If the file id is stored locally, then stop
  - If not, search for the “closest” id in the stack, and forward the message to the corresponding next_hop

Query
§ API: file = query(id);
§ Upon receiving a query for document id
  - Check whether the queried file is stored locally
    • If yes, return it
    • If not, forward the query message
§ Notes:
  - Each query is associated with a TTL that is decremented each time the query message is forwarded; to obscure the distance to the originator:
    • TTL can be initialized to a random value within some bounds
    • When TTL=1, the query is forwarded with a finite probability
  - Each node maintains the state for all outstanding queries that have traversed it; this helps to avoid cycles
  - When the file is returned, it is cached along the reverse path

Query Example
[Figure: query(10) issued at n1; the messages are numbered 1, 2, 3, 4, 4', 5. Each node's stack, as (id, next_hop, file) rows:]
  n1: (4, n1, f4), (12, n2, f12), (5, n3, -)
  n2: (9, n3, f9)
  n3: (3, n1, f3), (14, n4, f14), (5, n3, -)
  n4: (14, n5, f14), (13, n2, f13), (3, n6, -)
  n5: (4, n1, f4), (10, n5, f10), (8, n6, -)
§ Note: doesn’t show file caching on the reverse path
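
The query procedure can be sketched as a depth-first search that tries next hops in order of id closeness and backs up on failure. This is a rough sketch under several assumptions: the stacks are an assumed reading of the query example figure, "closest" is taken to mean smallest absolute difference between ids, a shared `seen` set stands in for the per-node state on outstanding queries, and caching on the reverse path is left out. The names `TABLES` and `query` are illustrative.

```python
# (id, next_hop, stored_locally) rows per node; an assumed reading of the example figure.
TABLES = {
    "n1": [(4, "n1", True), (12, "n2", True), (5, "n3", False)],
    "n2": [(9, "n3", True)],
    "n3": [(3, "n1", True), (14, "n4", True), (5, "n3", False)],
    "n4": [(14, "n5", True), (13, "n2", True), (3, "n6", False)],
    "n5": [(4, "n1", True), (10, "n5", True), (8, "n6", False)],
}


def query(node, file_id, ttl, seen=None):
    """Return the list of nodes from `node` to a node storing `file_id`, or None."""
    seen = set() if seen is None else seen
    if ttl == 0 or node in seen:
        return None                      # TTL expired, or a cycle was detected
    seen.add(node)
    table = TABLES.get(node, [])
    if any(i == file_id and stored for i, _, stored in table):
        return [node]                    # file is stored locally: stop
    # Try next hops in order of closeness of their id to the requested id.
    for _, next_hop, _ in sorted(table, key=lambda row: abs(row[0] - file_id)):
        if next_hop == node:
            continue                     # an entry pointing at ourselves cannot help
        path = query(next_hop, file_id, ttl - 1, seen)
        if path is not None:
            return [node] + path
    return None                          # every candidate failed: report failure upstream


print(query("n1", 10, ttl=10))   # ['n1', 'n2', 'n3', 'n4', 'n5']
```

At n4 the closest id is 13, whose next hop n2 has already seen the query, so n4 falls back to id 14 and reaches n5, which stores file 10.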

Insert
§ API: insert(id, file);
§ Two steps
  - Search for the file to be inserted
    • If found, report collision
    • If the number of nodes is exhausted, report failure
  - If not found, insert the file

Insert
§ Searching: like query, but nodes maintain state after a collision is detected and the reply is sent back to the originator
§ Insertion
  - Follow the forward path; insert the file at all nodes along the path
  - A node probabilistically replaces the originator with itself, to obscure the true originator

Insert Example
§ Assume query returned failure along the “gray” path; insert f10
[Figure: insert(10, f10) issued at n1. Each node's stack, as (id, next_hop, file) rows:]
  n1: (4, n1, f4), (12, n2, f12), (5, n3, -)
  n2: (9, n3, f9)
  n3: (3, n1, f3), (14, n4, f14), (5, n3, -)
  n4: (14, n5, f14), (13, n2, f13), (3, n6, -)
  n5: (4, n1, f4), (11, n5, f11), (8, n6, -)

Insert Example
[Figure: n1 adds (10, n1, f10) to the top of its stack and forwards insert(10, f10) to n2 with orig=n1; the stacks of the other nodes are unchanged.]

Insert Example
§ n2 replaces the originator (n1) with itself
[Figure: n2 adds (10, n1, f10) to its stack and forwards insert(10, f10) to n3 with orig=n2; n3 adds an entry for id 10 with next_hop n2.]

Insert Example
§ n2 replaces the originator (n1) with itself
[Figure: the insert continues along the path; n4 adds (10, n2, f10) and n5 adds (10, n4, f10).]

Freenet Properties
§ Newly queried/inserted files are stored on nodes with similar ids
§ New nodes can announce themselves by inserting files
§ Attempts to supplant or discover existing files will just spread the files

Freenet Summary
§ Advantages
  - Provides publisher anonymity
  - Totally decentralized architecture: robust and scalable
  - Resistant against malicious file deletion
§ Disadvantages
  - Does not always guarantee that a file is found, even if the file is in the network

Other Solutions to the Location Problem
§ Goal: make sure that an identified item (file) is always found
§ Abstraction: a distributed hash-table data structure
  - insert(id, item);
  - item = query(id);
  - Note: item can be anything: a data object, document, file, pointer to a file…
§ Proposals
  - CAN (ACIRI/Berkeley)
  - Chord (MIT/Berkeley)
  - Pastry (Rice)
  - Tapestry (Berkeley)

Content Addressable Network (CAN)
§ Associate to each node and item a unique id in a d-dimensional space
§ Properties
  - Routing table size O(d)
  - Guarantees that a file is found in at most d*n^(1/d) steps, where n is the total number of nodes
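
To get a feel for the bound, here is a worked instance. The network size n = 10^6 is an assumed value chosen for easy arithmetic, not a figure from the lecture.

```latex
\[
  \text{steps} \le d \cdot n^{1/d}
\]
\[
  d = 2,\ n = 10^{6}: \quad 2 \cdot (10^{6})^{1/2} = 2 \cdot 10^{3} = 2000
\]
\[
  d = 6,\ n = 10^{6}: \quad 6 \cdot (10^{6})^{1/6} = 6 \cdot 10 = 60
\]
```

Raising d shortens paths at the cost of a larger routing table, since the table size grows as O(d).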

CAN Example: Two Dimensional Space
§ Space divided between nodes
§ All nodes cover the entire space
§ Each node covers either a square or a rectangular area of ratios 1:2 or 2:1
§ Example:
  - Assume space size (8 x 8)
  - Node n1: (1, 2) is the first node that joins; it covers the entire space
[Figure: 8 x 8 grid, both axes labeled 0 to 7, with n1 at (1, 2)]

CAN Example: Two Dimensional Space
§ Node n2: (4, 2) joins; the space is divided between n1 and n2
[Figure: 8 x 8 grid, both axes labeled 0 to 7, with n1 and n2]

CAN Example: Two Dimensional Space
§ Node n3: (3, 5) joins; the space is divided between n1 and n3
[Figure: 8 x 8 grid, both axes labeled 0 to 7, with n1, n2 and n3]

CAN Example: Two Dimensional Space
§ Nodes n4: (5, 5) and n5: (6, 6) join
[Figure: 8 x 8 grid, both axes labeled 0 to 7, with n1 to n5]

CAN Example: Two Dimensional Space
§ Nodes: n1: (1, 2); n2: (4, 2); n3: (3, 5); n4: (5, 5); n5: (6, 6)
§ Items: f1: (2, 3); f2: (5, 1); f3: (2, 1); f4: (7, 5)
[Figure: 8 x 8 grid, both axes labeled 0 to 7, with nodes n1 to n5 and items f1 to f4]

CAN Example: Two Dimensional Space
§ Each item is stored by the node that owns its mapping in the space
[Figure: 8 x 8 grid, both axes labeled 0 to 7, with nodes n1 to n5 and items f1 to f4]

CAN: Query Example
§ Each node knows its neighbors in the d-space
§ Forward query to the neighbor that is closest to the query id
§ Example: assume n1 queries f4
[Figure: 8 x 8 grid, both axes labeled 0 to 7, with nodes n1 to n5 and items f1 to f4]
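
Greedy forwarding in a two-dimensional CAN can be sketched as follows. The zone boundaries are an assumption: they are one plausible result of the joins in the example (each join splits the zone containing the new node in half), since the figure itself is not recoverable. Measuring "closest" by the distance from a zone's center to the target point is also an assumption, and the names `ZONES`, `contains`, `center`, `adjacent` and `route` are illustrative.

```python
import math

# Assumed zones as (x_lo, x_hi, y_lo, y_hi), half-open, in the 8 x 8 space.
ZONES = {
    "n1": (0, 4, 0, 4),
    "n2": (4, 8, 0, 4),
    "n3": (0, 4, 4, 8),
    "n4": (4, 6, 4, 8),
    "n5": (6, 8, 4, 8),
}


def contains(zone, point):
    x_lo, x_hi, y_lo, y_hi = zone
    return x_lo <= point[0] < x_hi and y_lo <= point[1] < y_hi


def center(zone):
    x_lo, x_hi, y_lo, y_hi = zone
    return ((x_lo + x_hi) / 2, (y_lo + y_hi) / 2)


def adjacent(a, b):
    """True if zones a and b share a border segment (not just a corner)."""
    ax0, ax1, ay0, ay1 = a
    bx0, bx1, by0, by1 = b
    share_vertical = (ax1 == bx0 or bx1 == ax0) and ay0 < by1 and by0 < ay1
    share_horizontal = (ay1 == by0 or by1 == ay0) and ax0 < bx1 and bx0 < ax1
    return share_vertical or share_horizontal


def route(start, point):
    """Forward greedily until the current node's zone contains `point`."""
    path = [start]
    while not contains(ZONES[path[-1]], point):
        here = ZONES[path[-1]]
        peers = [n for n, z in ZONES.items() if adjacent(here, z)]
        path.append(min(peers, key=lambda n: math.dist(center(ZONES[n]), point)))
        if len(path) > len(ZONES):
            raise RuntimeError("greedy routing did not converge")
    return path


print(route("n1", (7, 5)))   # n1 queries f4 at (7, 5)
```

Under these assumed zones the query from n1 for f4 travels through n2 to n5, whose zone contains (7, 5).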

Chord
§ Associate to each node and item a unique id in a uni-dimensional space
§ Properties
  - Routing table size O(log(N)), where N is the total number of nodes
  - Guarantees that a file is found in O(log(N)) steps

Data Structure
§ Assume identifier space is 0..2^m
§ Each node maintains
  - Finger table
    • Entry i in the finger table of n is the first node that succeeds or equals n + 2^i
  - Predecessor node
§ An item identified by id is stored on the successor node of id
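
The finger-table rule can be sketched directly from this definition. The sketch assumes that identifiers wrap around modulo 2^m, and it uses a small illustrative ring with m = 3 and node ids 0, 1, 2 and 6; the function names `successor` and `finger_table` are hypothetical.

```python
def successor(node_ids, key, m):
    """First node id that equals or follows `key` on the ring of size 2**m."""
    key %= 2 ** m
    ring = sorted(node_ids)
    return next((n for n in ring if n >= key), ring[0])   # wrap around past the top


def finger_table(n, node_ids, m):
    """Rows (i, (n + 2**i) mod 2**m, successor) for i = 0 .. m-1."""
    return [
        (i, (n + 2 ** i) % 2 ** m, successor(node_ids, n + 2 ** i, m))
        for i in range(m)
    ]


NODES = [0, 1, 2, 6]   # illustrative node ids on a ring of size 8

for row in finger_table(1, NODES, m=3):
    print(row)                      # (0, 2, 2), (1, 3, 6), (2, 5, 6)

print(successor(NODES, 7, m=3))     # 0: the node that stores an item with id 7
```

Each node keeps only m entries, which is where the O(log(N)) routing table size comes from.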

Chord Example
§ Assume an identifier space 0..8
§ Node n1: (1) joins; all entries in its finger table are initialized to itself
Succ. Table (node 1):
  i  id+2^i  succ
  0  2       1
  1  3       1
  2  5       1
[Figure: identifier ring with positions 0 to 7]

Chord Example
§ Node n2: (2) joins
Succ. Table (node 1):
  i  id+2^i  succ
  0  2       2
  1  3       1
  2  5       1
Succ. Table (node 2):
  i  id+2^i  succ
  0  3       1
  1  4       1
  2  6       1
[Figure: identifier ring with positions 0 to 7]

Chord Example
§ Nodes n3: (0), n4: (6) join
Succ. Table (node 0):
  i  id+2^i  succ
  0  1       1
  1  2       2
  2  4       6
Succ. Table (node 1):
  i  id+2^i  succ
  0  2       2
  1  3       6
  2  5       6
Succ. Table (node 6):
  i  id+2^i  succ
  0  7       0
  1  0       0
  2  2       2
Succ. Table (node 2):
  i  id+2^i  succ
  0  3       6
  1  4       6
  2  6       6
[Figure: identifier ring with positions 0 to 7]

Chord Examples
§ Nodes: n1: (1), n2: (2), n3: (0), n4: (6)
§ Items: f1: (7), f2: (2)
Succ. Table (node 0), Items: 7
  i  id+2^i  succ
  0  1       1
  1  2       2
  2  4       6
Succ. Table (node 1), Items: 1
  i  id+2^i  succ
  0  2       2
  1  3       6
  2  5       6
Succ. Table (node 6):
  i  id+2^i  succ
  0  7       0
  1  0       0
  2  2       2
Succ. Table (node 2):
  i  id+2^i  succ
  0  3       6
  1  4       6
  2  6       6
[Figure: identifier ring with positions 0 to 7]

Query
§ Upon receiving a query for item id, node n
§ Checks whether the item is stored at the successor node s, i.e.,
  - id belongs to (n, s)
§ If not, forwards the query to the largest node in its successor table that does not exceed id
[Figure: query(7) on the identifier ring, with the same successor tables and items as in the previous example]
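
The lookup rule can be sketched as a loop over finger tables. This is a simplified model under stated assumptions: the ring has m = 3 with node ids 0, 1, 2 and 6 (illustrative values), identifiers wrap modulo 2^m, the interval test treats the successor end as inclusive, and the query is forwarded to the farthest finger that strictly precedes the id on the ring. All function names are hypothetical.

```python
def successor(node_ids, key, size):
    """First node id that equals or follows `key` on a ring of `size` identifiers."""
    key %= size
    ring = sorted(node_ids)
    return next((n for n in ring if n >= key), ring[0])


def fingers(n, node_ids, m):
    """Successor column of n's finger table: successor(n + 2**i) for i = 0 .. m-1."""
    return [successor(node_ids, n + 2 ** i, 2 ** m) for i in range(m)]


def clockwise(a, b, size):
    """Clockwise distance from a to b, in 1 .. size (a full turn if a == b)."""
    return (b - a) % size or size


def lookup(start, key, node_ids, m):
    """Return (node storing `key`, list of nodes the query visited)."""
    size = 2 ** m
    n, path = start, [start]
    # Stop once `key` falls between n and its immediate successor.
    while clockwise(n, key, size) > clockwise(n, fingers(n, node_ids, m)[0], size):
        table = fingers(n, node_ids, m)
        # Fingers that lie strictly before `key` when walking clockwise from n.
        preceding = [f for f in table if clockwise(n, f, size) < clockwise(n, key, size)]
        n = max(preceding, key=lambda f: clockwise(n, f, size))
        path.append(n)
    return fingers(n, node_ids, m)[0], path


NODES = [0, 1, 2, 6]
print(lookup(1, 7, NODES, m=3))   # (0, [1, 6]): node 1 forwards to node 6, whose successor 0 stores id 7
```

Each hop moves the query strictly closer to the id, so the loop always terminates.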

Discussion
§ Query can be implemented
  - Iteratively
  - Recursively
§ Performance: routing in the overlay network can be more expensive than in the underlying network
  - Because usually there is no correlation between node ids and their locality; a query can repeatedly jump from Europe to North America, though both the initiator and the node that stores the item are in Europe!
  - Solutions: Tapestry takes care of this implicitly; CAN and Chord maintain multiple copies for each entry in their routing tables and choose the closest one in terms of network distance

Discussion (cont’d)
§ Gnutella, Napster, FastTrack can resolve powerful queries, e.g.,
  - Keyword searching, approximate matching
§ Natively, CAN, Chord, Pastry and Tapestry support only exact matching
  - On-going work to support more powerful queries

Discussion
§ Robustness
  - Maintain multiple copies associated to each entry in the routing tables
  - Replicate an item on nodes with close ids in the identifier space
§ Security
  - Can be built on top of CAN, Chord, Tapestry, and Pastry

Conclusions
§ The key challenge of building wide area P2P systems is a scalable and robust location service
§ Solutions covered in this lecture
  - Napster: centralized location service
  - Gnutella: broadcast-based decentralized location service
  - Freenet: intelligent-routing decentralized solution (but correctness not guaranteed; queries for existing items may fail)
  - CAN, Chord, Tapestry, Pastry: intelligent-routing decentralized solution
    • Guarantee correctness
    • Tapestry (Pastry?) provide efficient routing, but more complex