Improving Performance in the Gnutella Protocol
Jonathan Hess, Benjamin Poon
University of California at Berkeley, Department of Computer Science
CS 294-4 Peer-to-Peer Systems

Outline
• Background
• Motivation
• Solution
    • Mirroring
    • Directed Search
• Results
• Possible Future Work

Background
• Gnutella
    • Protocol for distributed search
    • No centralization
    • Searches through query flooding
• Opponents
    • Censorship + threatening of Gnutella users

Motivation
1. Opponents cause decreased participation
2. Decreased participation causes decreased replication of shared files
    • Same files being shared, but not as many copies
3. Decreased replication causes
    • Increased workload for sharing peers
    • Need for deeper query depths
=> Overall decrease in performance

Solution
• Improve performance given decreased participation
    • Mirroring
    • Directed Search

Mirroring – Main Idea
• Achieve more replication by copying file to a willing peer (a mirror)
• Only replicate on demand
• Preserve blame on original sharer of file
    • i.e., mirrors should retain plausible deniability despite sharing the file

Mirroring Request Messages
• Mirror requestor (originator) sends Mirroring Request Message (MRM) to find a client to act as mirror
    • MRM(header, listeningPort, fileIndex)
        • Prevents people from intercepting query traffic to see what the file is
        • Con: originator must stay in network in order for mirroring to occur
• No need to flood
    • Clients pass MRMs only on one randomly chosen outgoing connection
    • MRM TTL should be relatively high
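
The single-path forwarding described on this slide can be sketched as a random walk. This is a minimal illustration, not the authors' implementation: the `MRM` field names follow the slide, but keeping the TTL as a plain `ttl` field, the `willing` set, and the dictionary-based topology are all assumptions made for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class MRM:
    """Mirroring Request Message. The ttl field stands in for the header (assumption)."""
    ttl: int
    listening_port: int
    file_index: int


def forward_mrm(msg, outgoing, rng=random):
    """Pick ONE random outgoing connection; return (next_hop, new_msg) or None if the walk ends."""
    if msg.ttl <= 0 or not outgoing:
        return None
    next_hop = rng.choice(outgoing)
    return next_hop, MRM(msg.ttl - 1, msg.listening_port, msg.file_index)


def walk(topology, start, msg, willing, rng=random):
    """Follow the MRM from `start` until a willing peer is reached or the TTL runs out.

    topology: dict mapping a peer to its list of outgoing neighbours (hypothetical model).
    willing:  set of peers that would accept the mirroring request (hypothetical model).
    """
    node, path = start, [start]
    while True:
        step = forward_mrm(msg, topology.get(node, []), rng)
        if step is None:
            return None, path
        node, msg = step
        path.append(node)
        if node in willing:
            return node, path
```

Because each hop sends exactly one message, a walk costs at most TTL messages, which is why a relatively high TTL is affordable here. On a three-peer ring `{"A": ["B"], "B": ["C"], "C": ["A"]}` with only `"C"` willing, `walk(topology, "A", MRM(15, 6346, 3), {"C"})` returns `("C", ["A", "B", "C"])`.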

Mirroring – Sending MRMs
Procedure per client sharing n files F1…Fn
1. Record demand Di (# uploads) for locally shared file Fi
2. When Di > mirrorThresh_i, request a mirror
    • Send MRM on one random outbound connection
3. mirrorThresh_i += threshIncrement
    • Having a new mirror means we shouldn't create additional mirrors as readily
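
The three steps above can be sketched as follows. This is only an illustration under assumptions: the class names, the initial threshold of 10, and the `send_mrm` callback (standing in for "send an MRM on one random outbound connection") are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass
class SharedFile:
    index: int
    demand: int = 0          # D_i: number of uploads served so far
    mirror_thresh: int = 10  # mirrorThresh_i; the starting value is an assumption


class MirrorRequester:
    def __init__(self, files, thresh_increment, send_mrm):
        self.files = {f.index: f for f in files}
        self.thresh_increment = thresh_increment
        self.send_mrm = send_mrm  # hypothetical callback that emits one MRM

    def record_upload(self, file_index):
        """Count one upload and request a mirror when demand exceeds the threshold."""
        f = self.files[file_index]
        f.demand += 1                                  # step 1: record demand
        if f.demand > f.mirror_thresh:                 # step 2: D_i > mirrorThresh_i
            self.send_mrm(file_index)
            f.mirror_thresh += self.thresh_increment   # step 3: raise the bar
            return True
        return False
```

With a threshold of 2 and an increment of 3, the third and sixth uploads each trigger a request, so mirrors are created progressively less readily as demand grows.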

Mirroring – Receiving MRMs
1. Mirror M sends file transfer request for MRM.fileIndex to originator O
2. O receives request for fileIndex
3. O adds M to its list of mirrors of fileIndex
4. O sends M encrypted file associated with fileIndex
    • Preserves plausible deniability for mirror
    • Con: still a possibility for a client to figure out what original file was – how?
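
A rough sketch of the originator's side of steps 2 to 4 is shown below. The slides do not say which cipher is used or how keys are managed, so everything here is an assumption: `keystream_xor` is a toy XOR keystream built from SHA-256 purely to keep the example self-contained (it is not a vetted encryption scheme), and using one key per file is likewise hypothetical.

```python
import hashlib
import secrets


def keystream_xor(key, data):
    """Toy stream cipher for illustration only: XOR data with SHA-256(key || offset) blocks.

    Applying it twice with the same key returns the original bytes.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


class Originator:
    def __init__(self, files):
        self.files = files   # fileIndex -> plaintext bytes
        self.keys = {}       # fileIndex -> key, known only to the originator
        self.mirrors = {}    # fileIndex -> list of mirror identifiers

    def handle_transfer_request(self, mirror_id, file_index):
        """Steps 2-4: register the mirror, then return the encrypted file for it to store."""
        if file_index not in self.keys:
            self.keys[file_index] = secrets.token_bytes(16)
        self.mirrors.setdefault(file_index, []).append(mirror_id)      # step 3
        return keystream_xor(self.keys[file_index], self.files[file_index])  # step 4
```

The mirror only ever holds the encrypted bytes; in this sketch the key stays with the originator until it is handed to a downloader.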

Mirroring – Using Mirrors
Procedure for originator of MRMs
• If originator has enough bandwidth
    • Serve files
• If not enough bandwidth
    • Check if there are mirrors for fileIndex
    • If no mirrors
        • Proceed according to original Gnutella protocol
    • If has mirrors
        • Multiplex requests over set of mirrors M1...Mx
            • Send QueryHits as if they were from Mi (1 <= i <= x) containing the decryption key
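
The decision procedure above can be sketched as a small dispatcher. The slides say only that requests are "multiplexed" over the mirrors; the round-robin policy, the `"self"` marker, and the class and method names below are assumptions made for illustration.

```python
class MirrorMultiplexer:
    def __init__(self):
        self._mirrors = {}  # fileIndex -> list of mirrors M1..Mx
        self._next = {}     # fileIndex -> round-robin position (assumed policy)

    def add_mirror(self, file_index, mirror):
        self._mirrors.setdefault(file_index, []).append(mirror)

    def respond(self, file_index, has_bandwidth, key=None):
        """Return (responder, key) describing the QueryHit the originator should send.

        ("self", None) means: serve the file directly, as in the original protocol.
        Otherwise the QueryHit is written as if it came from the chosen mirror and
        carries the decryption key.
        """
        mirrors = self._mirrors.get(file_index, [])
        if has_bandwidth or not mirrors:
            return ("self", None)
        i = self._next.get(file_index, 0)
        self._next[file_index] = (i + 1) % len(mirrors)
        return (mirrors[i], key)
```

With mirrors `M1` and `M2` registered and no spare bandwidth, successive calls alternate between `M1` and `M2`; a weighted or load-aware policy could be substituted without changing the interface.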

Directed Search – Motivation
• As the ratio of free-loaders to serving peers increases, search moves towards needle-in-a-haystack
• Flood excels at finding piles of hay
• Much research effort has gone into successive deepening and file indexing
• Directed search is not as well understood

Directed Search – Main Idea
• Pay a one-time, up-front cost for a Bloom filter broadcast
• Nodes within N hops merge the filter into a collection associated with each edge
    • Collection is depth aware
• Upon receiving a query, forward message to the n edges with highest scores
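
One way to picture the per-edge, depth-aware collections is sketched below. The slides do not give the score function or the filter parameters, so the 1/depth weighting, the 384-bit size, the three SHA-256-derived hash positions, and all names are assumptions chosen only to make the example concrete.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=384, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit set stored as a Python integer

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def merge(self, other):
        self.bits |= other.bits  # union of two filters with identical parameters


class EdgeCollection:
    """Depth-aware collection: one merged filter per hop distance behind an edge."""

    def __init__(self):
        self.by_depth = {}

    def merge(self, depth, bloom):
        target = self.by_depth.setdefault(depth, BloomFilter(bloom.size, bloom.num_hashes))
        target.merge(bloom)

    def score(self, keyword):
        # Assumed score: closer matches count more (weight 1/depth, depth >= 1).
        return sum(1.0 / d for d, bf in self.by_depth.items() if keyword in bf)


def pick_edges(collections, keyword, n):
    """Return the n edges whose collections score highest for the keyword."""
    ranked = sorted(collections, key=lambda e: collections[e].score(keyword), reverse=True)
    return ranked[:n]
```

A node would call `pick_edges` on each incoming query and forward only along the returned edges. Bloom filters can report false positives, so a high score is a hint rather than a guarantee that the file lies behind that edge.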

Directed Search
• Query reaches n^queryTTL nodes
• n may be much smaller than out-degree, and queryTTL can be larger than normal TTLs
    • n^queryTTL < (out-degree)^TTL
    • Reach more and better users
    • Avoid free-loaders
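
To make the inequality concrete, the sketch below evaluates both sides using the slide's own fan-out^TTL estimate. The specific numbers (fan-out 2 with TTL 7 for directed search, out-degree 4 with TTL 5 for flooding) are hypothetical values picked for the example, and `reach_estimate` is an illustrative helper name.

```python
def reach_estimate(fanout, ttl):
    """Slide-style estimate of how many nodes a query reaches: fanout ** ttl."""
    return fanout ** ttl


directed = reach_estimate(2, 7)  # n = 2 edges per hop, queryTTL = 7 (hypothetical)
flooded = reach_estimate(4, 5)   # out-degree 4, TTL 5 (hypothetical)
print(directed, flooded)         # 128 1024
```

Under these assumed values the directed query travels two hops deeper yet touches roughly an eighth as many nodes, which is the trade the slide describes.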

Results
• Simulation: BloomNet
    • Models real-world Gnutella network as closely as possible
        • Uses statistics from many previous measurement studies of Gnutella networks
    • File sharing/requesting
        • Master filename list of 5072 files
        • Each client chooses to share a certain number of files from master list
        • Queries generated by taking a random filename at most once from master list according to a modified Zipf distribution (à la "Efficient Search in Peer-to-Peer Networks", B. Yang, H. Garcia-Molina)
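
Query generation of this kind can be sketched as weighted sampling without replacement. The slides do not say how the Zipf distribution was modified, so the code below uses a plain Zipf weighting (weight 1/rank^exponent) together with the at-most-once rule; the function names and the default exponent are assumptions.

```python
import random


def zipf_weights(num_files, exponent=1.0):
    """Unnormalised Zipf weights: the file at rank r gets weight 1 / r**exponent."""
    return [1.0 / rank ** exponent for rank in range(1, num_files + 1)]


def generate_queries(filenames, count, exponent=1.0, rng=None):
    """Draw up to `count` filenames, favouring low ranks, each at most once."""
    rng = rng or random.Random()
    remaining = list(filenames)
    weights = zipf_weights(len(remaining), exponent)
    queries = []
    for _ in range(min(count, len(remaining))):
        i = rng.choices(range(len(remaining)), weights=weights)[0]
        queries.append(remaining.pop(i))  # remove so the name cannot be drawn again
        weights.pop(i)
    return queries
```

For example, `generate_queries([f"file{i}" for i in range(5072)], 100)` yields 100 distinct names skewed towards the front of the master list.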

Results – Overview
• Advantages
    • BloomNet finds hits better than Gnutella
        • BloomNet achieves higher % successful queries than Gnutella
        • Uses approximately 3x less query bandwidth
    • As network size increases
        • Gap in performance increases
        • Uses approximately 3x less query bandwidth
• Disadvantages
    • 20% more total bandwidth used to run BloomNet
        • Can be improved using different Bloom parameters

Results – Query Success
[Figure omitted]

Results – Query Bandwidth
[Figure omitted]

Results – Total Bandwidth
[Figure omitted]

Possible Future Work
• Mirroring
    • More sophisticated demand realization techniques – gossiping protocols?
• Directed Search
    • Only highly-connected peers exchange Bloom filters
    • Better score functions for edge selection
    • Better understanding of filter merging

Questions

Simulation Parameters
    Clients        1024
    Bloom Depth    3-4
    Bloom Size     384-3072
    Ping TTL       5
    Query TTL      5-7
    Mirror TTL     15