A look at Peer-to-Peer File Sharing with Gnutella

A look at Peer-to-Peer File Sharing with Gnutella Prof. Ellis Horowitz November 25, 2002

Outline
• P2P file sharing clients
• Gnutella protocol
• Gnutella network properties
• Gnutella protocol issues
  – topology mismatch
  – scalability
  – free riding
  – query types
  – anonymity
  – security
• Conclusions

Peer-to-Peer File Sharing is all about the trading of copyrighted music and videos without paying anything to the authors
[Kazaa screenshot callouts: query; music category; native Windows application; banner ad; 3 million users online sharing 4 petabytes of data]

Kazaa Survives by Legal Maneuvering
• March 2001: Kazaa is founded in the Netherlands by Niklas Zennstrom and Janus Friis, in a company called Consumer Empowerment
• The software is based upon their FastTrack P2P Stack, a proprietary algorithm for peer-to-peer communication
• Kazaa licenses FastTrack to Morpheus and Grokster
• Oct. 2001: MPAA and RIAA sue Kazaa, Morpheus and Grokster
• Nov. 2001: Consumer Empowerment is sued in the Netherlands by the Dutch music publishing body, Buma/Stemra. The court orders KaZaA to take steps to prevent its users from violating copyrights or else pay a heavy fine.
• Jan. 2002: Zennstrom and Friis sell the Kazaa software and website to Sharman Networks, based in Vanuatu, an island nation in the Pacific, but operating out of Australia
• Feb. 2002: Kazaa cuts off Morpheus clients from FastTrack
• April 2002: Sharman Networks agrees to let Brilliant Digital bundle its own stealth P2P application, called AltNet, within KaZaA. This network would be remotely switched on, allowing KaZaA users to trade Brilliant Digital content throughout FastTrack

Morpheus File Sharing Software
[Morpheus screenshot callouts: shopping, web browser; a Java application; Search Power searches over multiple categories, metadata; banner ad; behind a firewall]
• Morpheus adopts the Jtella version of Gnutella

There are many Gnutella Clients
See http://gnutella.wego.com/

Gnutella History
• Originally conceived of by Justin Frankel, the 21-year-old founder of Nullsoft
• March 2000: Nullsoft posts Gnutella to the web
• A day later AOL removes Gnutella at the behest of Time Warner
• The Gnutella protocol: version 0.4, http://www9.limewire.com/developer/gnutella_protocol_0.4.pdf, and version 0.6, http://rfcgnutella.sourceforge.net/Proposals/Ultrapeers.htm
• There are multiple open source implementations at http://sourceforge.net/, including:
  – Jtella
  – Gnucleus
• Software released under the GNU Lesser General Public License (LGPL)
• The Gnutella protocol has been widely analyzed

Gnutella Protocol Messages
• Broadcast Messages
  – Ping: initiating message (“I’m here”)
  – Query: search pattern and TTL (time-to-live)
• Back-Propagated Messages
  – Pong: reply to a ping, contains information about the peer
  – Query response: contains information about the computer that has the needed file
• Node-to-Node Messages
  – GET: return the requested file
  – PUSH: push the file to me
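
As a rough illustration of how such messages look on the wire, the sketch below packs and parses the 23-byte descriptor header of the v0.4 protocol: a 16-byte descriptor ID, a payload type byte, TTL, hops, and a 4-byte payload length. The function names and the dictionary layout are illustrative assumptions, not taken from any client. The little-endian payload length follows what deployed clients and the later v0.6 draft use; treat the byte order as something to verify against the specification.

```python
import struct
import uuid

# Descriptor header: 16-byte ID, payload type, TTL, hops, payload length.
# "<" selects little-endian with no padding (byte order is an assumption here).
HEADER = struct.Struct("<16sBBBI")

# Payload descriptor codes as listed in the v0.4 specification.
PAYLOAD_TYPES = {0x00: "Ping", 0x01: "Pong", 0x40: "Push",
                 0x80: "Query", 0x81: "QueryHit"}


def pack_header(payload_type, ttl, hops, payload_length, descriptor_id=None):
    """Build a header; a random 16-byte ID is generated when none is given."""
    if descriptor_id is None:
        descriptor_id = uuid.uuid4().bytes
    return HEADER.pack(descriptor_id, payload_type, ttl, hops, payload_length)


def unpack_header(data):
    """Parse the first 23 bytes of a message into a dictionary."""
    descriptor_id, payload_type, ttl, hops, length = HEADER.unpack(data[:HEADER.size])
    return {
        "id": descriptor_id,
        "type": PAYLOAD_TYPES.get(payload_type, "Unknown"),
        "ttl": ttl,
        "hops": hops,
        "payload_length": length,
    }


raw = pack_header(0x80, ttl=7, hops=0, payload_length=12)
header = unpack_header(raw)
print(len(raw), header["type"], header["ttl"])
```

Generating the ID with `uuid.uuid4()` is only a convenient way to get 16 random bytes; real servents may construct their IDs differently.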

Gnutella Search Mechanism
[Figure: animation over a seven-node overlay (nodes 1 to 7); the query for A spreads outward from node 2, replies labeled A:5 and A:7 travel back, and node 2 then downloads A]
Steps:
• Node 2 initiates search for file A
• Sends message to all neighbors
• Neighbors forward message
• Nodes that have file A initiate a reply message
• Query reply message is back-propagated
• File download
• Note: file transfer is not possible when both clients are behind firewalls; if only one client, X, is behind a firewall, Y can request that X push the file to Y
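
The flooding and back-propagation steps can be sketched as a breadth-first traversal. This is a minimal simulation, not client code: the seven-node topology, the placement of file A on nodes 5 and 7, and the function name `flood_query` are assumptions chosen for illustration, since the wiring of the original figure is not recoverable from the text.

```python
from collections import deque


def flood_query(graph, files, origin, wanted, ttl=7):
    """Flood a query from origin; return (holder, reply_path) pairs.

    reply_path lists the nodes a reply visits on its way back to origin,
    retracing the route on which the query first arrived.
    """
    came_from = {origin: None}      # neighbor each node first heard the query from
    queue = deque([(origin, ttl)])
    hits = []
    while queue:
        node, remaining = queue.popleft()
        if node != origin and wanted in files.get(node, ()):
            # Back-propagate: walk the reverse path to the origin.
            path = [node]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            hits.append((node, path))
        if remaining == 0:
            continue                # TTL exhausted: stop forwarding
        for neighbor in graph[node]:
            if neighbor not in came_from:   # duplicates are not forwarded again
                came_from[neighbor] = node
                queue.append((neighbor, remaining - 1))
    return hits


# Hypothetical overlay: node -> list of neighbors.
graph = {1: [2, 7], 2: [1, 3, 4], 3: [2, 5], 4: [2, 6],
         5: [3, 6], 6: [4, 5], 7: [1]}
files = {5: {"A"}, 7: {"A"}}

hits = flood_query(graph, files, origin=2, wanted="A")
print(hits)
```

With this topology both holders are two hops from node 2, so a TTL of 1 finds nothing, while any TTL of 2 or more returns both replies.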

Other Gnutella Issues
• GUID: short for Globally Unique Identifier, a randomized string that is used to uniquely identify a host or message on the Gnutella network. This prevents duplicate messages from being sent on the network.
• GWebCache: a distributed system for helping servents connect to the Gnutella network, thus solving the "bootstrapping" problem. Servents query any of several hundred GWebCache servers to find the addresses of other servents. GWebCache servers are typically web servers running a special module.
• Host Catcher: Pong responses allow servents to keep track of active Gnutella hosts
• On most servents, the default port for Gnutella is 6346

Network growth statistics
Growth factors:
§ DSL and cable modem nodes grew substantially
§ Multiple client implementations became available
There was significant growth in the Gnutella network in 2001:
§ 5,000 nodes in February 2001
§ 10,000 nodes on March 19, 2001
§ 20,000 nodes on May 12, 2001
§ 40,000 nodes on May 29, 2001
Statistics due to Matei Ripeanu, see http://people.cs.uchicago.edu/~matei/PAPERS/gnutella-rc.pdf
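
To make the pace of growth concrete, the short calculation below measures how many days each doubling took, using only the three fully dated counts above (the February figure has no day and is left out). The variable names are illustrative.

```python
from datetime import date

# (date, node count) pairs taken from the dated statistics above.
samples = [
    (date(2001, 3, 19), 10_000),
    (date(2001, 5, 12), 20_000),
    (date(2001, 5, 29), 40_000),
]

# Days elapsed between consecutive samples; each step is one doubling.
doubling_days = [
    (later - earlier).days
    for (earlier, _), (later, _) in zip(samples, samples[1:])
]
print(doubling_days)
```

On these figures the second doubling took roughly a third of the time of the first, which suggests growth was accelerating in spring 2001.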

Limewire Count of Gnutella Hosts in 2002
[Graph: the green line represents unique hosts]

Growth invariants (1): avg. node connectivity
§ 3.4 links per node on average
[Graph due to Matei Ripeanu]

Growth invariants (2): network diameter
§ Node-to-node distance maintains a similar distribution
§ Average node-to-node distance grew 25% while the network grew 50 times over 6 months
[Graph due to Matei Ripeanu]

Is Gnutella a power-law network?
Power-law networks: the number of links per node follows a power-law distribution
Examples:
§ the Internet
§ in/out links to/from HTML pages
§ citation networks
§ US power grid
Implications: high tolerance to random node failure but low reliability when facing an ‘intelligent’ adversary
[Graph: data from November 2000; graph due to Matei Ripeanu]

Total Generated Traffic
Ripeanu has determined that Gnutella traffic totals 1 Gbps (or 330 TB/month)!
– Compare to 15,000 TB/month in the US Internet backbone (Dec. 2000)
– This estimate excludes actual file transfers
Reasoning:
§ QUERY and PING messages are flooded. They form more than 90% of generated traffic
§ Predominant TTL = 7
§ >95% of nodes are less than 7 hops away
§ Measured traffic at each link is about 6 kbps
§ Network with 50k nodes and 170k links
Statistics due to Matei Ripeanu
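
The headline numbers follow from the per-link figure by simple arithmetic, reproduced below as a check. The 30-day month and decimal units (1 kbps = 1,000 bit/s, 1 TB = 10^12 bytes) are assumptions made for this sketch.

```python
LINKS = 170_000                 # overlay links in the measured network
PER_LINK_BPS = 6_000            # about 6 kbps measured on each link
SECONDS_PER_MONTH = 30 * 24 * 3600   # assumes a 30-day month

total_bps = LINKS * PER_LINK_BPS                          # aggregate bit rate
tb_per_month = total_bps / 8 * SECONDS_PER_MONTH / 1e12   # bits -> bytes -> TB
backbone_share = tb_per_month / 15_000                    # vs. 15,000 TB/month

print(f"{total_bps / 1e9:.2f} Gbps, {tb_per_month:.0f} TB/month, "
      f"{backbone_share:.1%} of backbone traffic")
```

Under these assumptions the overhead traffic alone comes to roughly 2% of the quoted US backbone volume.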

Mapping between Gnutella Network and Internet Infrastructure
[Figure: overlay nodes A to H laid over the physical network]
• Perfect mapping

Mismatch between Gnutella Network and Internet Infrastructure
[Figure: overlay nodes A to H laid over the physical network]
• Inefficient mapping
• Link D-E needs to support six times higher traffic

Topology mismatch
The overlay network topology doesn’t match the underlying Internet infrastructure topology!
§ 40% of all nodes are in the 10 largest Autonomous Systems (AS)
§ Only 2-4% of all TCP connections link nodes within the same AS
§ Largely ‘random wiring’
• Most Gnutella-generated traffic crosses AS borders, making the traffic more expensive
• May cause ISPs to change their pricing scheme

Scalability
• Whenever a node receives a message (ping/query), it sends copies out to all of its other connections.
• Existing mechanisms to reduce traffic:
  – TTL counter
  – Nodes cache information about messages they have received, so that they don't forward duplicates.
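
The two traffic-reduction mechanisms can be combined in a single forwarding decision, sketched below. The function name `forward_targets` and the use of a plain set as the duplicate cache are assumptions for illustration; a real servent would also need to expire old cache entries, which this sketch omits.

```python
def forward_targets(seen_ids, message_id, ttl, sender, neighbors):
    """Return (neighbor, new_ttl) pairs this node would forward a message to."""
    if message_id in seen_ids:
        return []                   # duplicate: already handled, drop it
    seen_ids.add(message_id)
    new_ttl = ttl - 1               # decrement before passing the message on
    if new_ttl <= 0:
        return []                   # TTL exhausted: do not forward
    # Send copies to every connection except the one the message came from.
    return [(n, new_ttl) for n in neighbors if n != sender]


seen = set()
neighbors = ["B", "C", "D"]
first = forward_targets(seen, "msg-1", 7, "B", neighbors)
duplicate = forward_targets(seen, "msg-1", 7, "C", neighbors)
expired = forward_targets(seen, "msg-2", 1, "B", neighbors)
print(first, duplicate, expired)
```

Even with both checks in place, every new message still reaches each node within the TTL horizon once, so total traffic grows with the number of links rather than with the number of useful replies.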

Free Riding on Gnutella
• 70% of Gnutella users share no files
• 90% of users answer no queries
• Those who have files to share may limit the number of connections or upload speed, resulting in a high download failure rate
• If only a few individuals contribute to the public good, these few peers effectively act as centralized servers
See Adar and Huberman at http://www2.cs.cmu.edu/~kunwadee/research/p2p/gnutella.html

Free Riding on Gnutella
• More than 25% of Gnutella clients share no files; 75% share 100 files or fewer
• Conclusion: Gnutella has a high percentage of free riders
* Statistics due to S. Gribble

Anonymity
• Gnutella provides for anonymity by masking the identity of the peer that generated a query.
• However, IP addresses are revealed at various points in its operation: HITS packets include the URL for each file, revealing the IP addresses
• Clients claim that they have no control, but...
  – they support bootstrapping
  – they may control message flow
  – they may control metadata searches
  – they may control program updates

Query Expressiveness
• Format of query not standardized
• No standard format or matching semantics for the QUERY string. Its interpretation is completely determined by each node that receives it.
• String literal vs. regular expression
• Directory name, filename, or file contents
• Malicious users may even return files unrelated to the query
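
A small example shows why unspecified matching semantics matter: the same QUERY string yields different results depending on whether a node treats it as a literal substring or as a regular expression. The two matcher functions and the file names are hypothetical and do not describe any particular client.

```python
import re


def match_literal(query, filenames):
    """Case-insensitive substring match: the query is taken verbatim."""
    q = query.lower()
    return [name for name in filenames if q in name.lower()]


def match_regex(query, filenames):
    """Case-insensitive regular-expression match on the same query string."""
    pattern = re.compile(query, re.IGNORECASE)
    return [name for name in filenames if pattern.search(name)]


shared = ["Blue Moon.mp3", "moon_blue_live.mp3", "blue.moon.remix.ogg"]
query = "blue.moon"

literal_hits = match_literal(query, shared)
regex_hits = match_regex(query, shared)
print(literal_hits)
print(regex_hits)
```

Here the regular-expression node also returns "Blue Moon.mp3", because "." matches any character, so two peers sharing identical files can answer the same query differently.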

Gnutella Queries
• "The popularity of Gnutella queries and its implications on scalability", Kunwadee Sripanidkulchai, see http://www2.cs.cmu.edu/~kunwadee/research/p2p/gnutella.html
• Examining over 5 million queries

Security
• Recently, P2P viruses and worms have been constructed
  – the Benjamin virus uses Kazaa to spread itself, see http://www.viruslist.com/eng/viruslist.html?id=49790
• Kazaa now includes virus checking software that is applied before upload and after download
• There have been several Gnutella worms: Gnutella.worm, VBS/GWV.a, VBS_GNUTELWORM, VBS.Gnut.A, VBS/Gnu
• A Gnutella worm spreads by making a copy of itself in the Gnutella program directory, then making that directory available for sharing files on the Gnutella network.

Conclusions
§ Gnutella is a self-organizing, large-scale P2P application that produces an overlay network on top of the Internet; it appears to work
§ Growth is hindered by the volume of generated traffic and inefficient resource use
§ Since there is no central authority, the open source community must commit to making any changes
§ Changes have been suggested in:
  – Peer-to-Peer Architecture Case Study: Gnutella Network, by Matei Ripeanu
  – Improving Gnutella Protocol: Protocol Analysis and Research Proposals, by Igor Ivkovic

Legal Questions
• Do US courts have jurisdiction over P2P companies?
• Do P2P companies really contribute to copyright infringement (cf. the Sony Betamax case)?
• Do P2P companies affect file sharing?
• If Kazaa, Grokster and Morpheus are stopped, will that stop file sharing or copyright infringement?

Some References
• [1] Eytan Adar and Bernardo A. Huberman, Free Riding on Gnutella. http://www.firstmonday.dk/issues/issue5_10/adar/
• [2] Igor Ivkovic, Improving Gnutella Protocol: Protocol Analysis and Research Proposals. http://www9.limewire.com/download/ivkovic_paper.pdf
• [3] Jordan Ritter, Why Gnutella Can't Scale. No, Really. http://www.monkey.org/~dugsong/mirror/gnutella.html
• [4] Matei Ripeanu, Peer-to-Peer Architecture Case Study: Gnutella Network. http://www.cs.uchicago.edu/%7Ematei/PAPERS/gnutella-rc.pdf
• [5] The Gnutella Protocol Specification v0.4. http://www9.limewire.com/developer/gnutella_protocol_0.4.pdf