Characterizing Unstructured Overlay Topologies in Modern P2P File-Sharing Systems

Daniel Stutzbach – University of Oregon
Reza Rejaie – University of Oregon
Subhabrata Sen – AT&T Labs

Internet Measurement Conference
Berkeley, CA, USA
October 19th, 2005

Motivation
• P2P file-sharing systems are very popular in practice.
  • Several million simultaneous users collectively.
  • 60% of all Internet traffic [CacheLogic Research 2005]
• Most use an unstructured overlay.
• Understanding overlay properties is important:
  • Understanding how existing P2P systems function
  • Developing and evaluating new systems
• Unstructured overlays are not well-understood.
• We studied overlay properties in Gnutella.
  • Size: one of the largest P2P systems; more than 1 million users
  • Mature: In use for several years; older studies for comparisons
  • Open: No reverse-engineering needed
October 19th, 2005   http://mirage.cs.uoregon.edu/P2P   IMC 2005   Slide 2/18

Defining the Problem
• Gnutella uses a two-tier overlay.
  • Improves scalability.
  • Ultrapeers form an unstructured mesh.
  • Leaf peers connect to the ultrapeers.
• eDonkey and FastTrack are similar.
• Studying the overlay requires snapshots.
  • Snapshots capture the overlay as a graph.
  • Individual snapshots reveal graph properties.
  • Consecutive snapshots reveal dynamics.
• However, capturing accurate snapshots is difficult.
[Figure: two-tier overlay; labels: "Ultrapeer", "Leaf", "Top-level overlay"]

Challenges in Capturing Accurate Snapshots
• Snapshots are captured iteratively by a crawler.
• An ideal snapshot is instantaneous.
• But the overlay is large and rapidly changing.
• Therefore, captured snapshots are distorted.
• Sampling:
  • Partial snapshots are less distorted, but may be unrepresentative.
  • For some types of analysis, the whole graph is needed.
• Previous studies capture either:
  • Complete snapshots slowly, or
  • Partial snapshots.
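
To make "captured iteratively" concrete, here is a minimal sketch of a breadth-first crawl. It is not Cruiser's implementation: `query_neighbors` is a hypothetical callback standing in for whatever protocol exchange returns a peer's neighbor list, and the toy `overlay` dictionary is invented for illustration.

```python
from collections import deque


def crawl(seed_peers, query_neighbors):
    """Iteratively discover an overlay starting from a few seed peers.

    query_neighbors(peer) is a hypothetical callback: it returns the
    neighbor list reported by `peer`, or None if the peer is unreachable.
    """
    seen = set(seed_peers)
    edges = set()
    queue = deque(seed_peers)
    while queue:
        peer = queue.popleft()
        neighbors = query_neighbors(peer)
        if neighbors is None:
            continue  # peer departed or did not respond
        for neighbor in neighbors:
            edges.add(frozenset((peer, neighbor)))  # undirected edge
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen, edges


# Toy overlay (invented): each peer maps to the neighbors it reports.
overlay = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
nodes, edges = crawl(["a"], overlay.get)
```

Every call to `query_neighbors` takes time in a real network, so peers contacted late are observed at a later instant than peers contacted early. That gap is the source of the distortion described above.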

Cruiser: a Fast Gnutella Crawler
• Features:
  • Distributed, highly parallelized implementation
  • Dynamic adaptation to bandwidth and CPU constraints
• Cruiser is orders of magnitude faster.
  • Captures one million nodes in around 7 minutes
  • 140,000 peers/min, compared to 2,500 peers/min [Saroiu 02]
• We investigated the effects of speed on distortion.
  • Daniel Stutzbach and Reza Rejaie, "Capturing Accurate Snapshots of the Gnutella Network", the Global Internet Symposium, March 2005.
  • 4% node distortion
  • 15% edge distortion

Data Set
• More than 80,000 snapshots, over the past year.
• To examine static properties, we focus on four:

  Date       Total Nodes    Leaves   Ultrapeers   Top-level Edges
  9/27/04        725,120   614,912      110,208         1,212,772
  10/11/04       779,535   662,568      116,967         1,244,219
  10/18/04       806,948   686,719      120,229         1,331,745
  2/2/05       1,031,471   873,130      158,345         1,964,121

• To examine dynamic properties, we use slices:
  • Each slice is 2 days of ~500 back-to-back snapshots
  • Captured starting 10/14/04, 10/21/04, 11/25/04, 12/21/04, and 12/27/04

Summary of Characterizations
• Graph Properties
  • Implementation heterogeneity
  • Degree Distribution:
    • Top-level degree distribution
    • Ultrapeer-leaf connectivity
    • Degree-distance correlation
  • Reachability:
    • Path lengths
    • Eccentricity
  • Small world properties
  • Resiliency
• Dynamic Properties
  • Existence of stable core:
    • Uptime distribution
    • Biased connectivity
  • Properties of stable core:
    • Largest connected component
    • Path lengths
    • Clustering coefficient

Top-level Degree
• This is the degree distribution among ultrapeers.
• There are obvious peaks at 30 and 70 neighbors.
• A substantial number of ultrapeers have fewer than 30.
• What happened to the power-law seen in prior studies?
[Figure: top-level degree distribution; annotations: "Max 30 in most clients", "Max 75 in some clients", "Custom"]
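
As a rough illustration of how a degree distribution is tabulated from a snapshot, the sketch below counts how many nodes have each degree. The edge list and the function name `degree_histogram` are invented for the example; they are not taken from the authors' tooling.

```python
from collections import Counter


def degree_histogram(edges):
    """Map each degree value to the number of nodes having that degree."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


# Toy top-level edge list (invented).
edges = [("u1", "u2"), ("u1", "u3"), ("u1", "u4"), ("u2", "u3")]
hist = degree_histogram(edges)
```

Plotted on log-log axes, a power-law distribution would appear as a roughly straight line, whereas peaks at the clients' configured maximum degrees show up as spikes.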

What happened to power-law?
• When a crawl is slow, many short-lived peers report long-lived peers as neighbors.
• However, those neighbors are not all present at the same time.
• Degree distribution from a slow crawl resembles prior results.
[Figure: degree distribution; label: [Ripeanu 02 ICJ]]
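
The effect can be illustrated with a toy example. Assume, purely for illustration, a long-lived peer "L" that never has more than two neighbors at once, while its short-lived neighbors are replaced over time. A fast crawl hears from the two current neighbors; a slow crawl hears from every neighbor that held a slot while the crawl was running. All names and data below are invented.

```python
def observed_degree(peer, reports):
    """Count the distinct reporters that listed `peer` as a neighbor."""
    return len({reporter for reporter, neighbors in reports if peer in neighbors})


# Each report is (reporting peer, set of neighbors it listed).
# Fast crawl: both reports are collected at nearly the same instant.
fast_reports = [("s1", {"L"}), ("s2", {"L"})]

# Slow crawl: by the time s3..s6 are contacted, s1 and s2 have left
# and other short-lived peers have taken over L's two neighbor slots.
slow_reports = fast_reports + [
    ("s3", {"L"}), ("s4", {"L"}), ("s5", {"L"}), ("s6", {"L"}),
]

fast_degree = observed_degree("L", fast_reports)
slow_degree = observed_degree("L", slow_reports)
```

Here the slow crawl reports a degree of 6 for a peer whose true degree never exceeded 2, which is how a long tail can appear in the measured distribution.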

Shortest-Path Distances
• Distribution of distances among ultrapeers and among all peers
• In the top-level, 70% of distances are exactly 4 hops.
• Across all peers, most distances are 5 or 6 hops.
  • Shows the effect of the two-tier overlay with multiple parents
• Despite large size, distances are short.
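
A distance distribution of this kind can be computed by running a breadth-first search from every node. The sketch below does so on a small invented graph; it is an illustration only, and on a million-node snapshot one would need a far more scalable approach than this all-pairs loop.

```python
from collections import Counter, deque


def distance_distribution(adj):
    """Return {hops: fraction of ordered reachable pairs at that distance}."""
    counts = Counter()
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for node, hops in dist.items():
            if node != source:
                counts[hops] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {hops: count / total for hops, count in sorted(counts.items())}


# Toy graph (invented): a simple path a - b - c - d.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
distribution = distance_distribution(adj)
```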

Is Gnutella a Small World?
• Small worlds arise naturally in many places.
  • Movie actors, power grid, co-authors of papers
  • They have short distances, but significant clustering, compared to a similar random graph.

              Mean Distance   Clustering Coefficient
  Gnutella    4.2             0.018
  Random      3.8             0.00038

• Conclusion: Gnutella is a small world.
  • Very high clustering adversely affects flooding queries.
  • But Gnutella isn't clustered enough to affect performance.
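
The slide does not say which definition of clustering coefficient was used. The sketch below assumes the common mean local clustering coefficient (for each node, the fraction of pairs of its neighbors that are themselves connected, averaged over all nodes, with degree-0 and degree-1 nodes counted as zero). The toy graph is invented.

```python
from itertools import combinations


def mean_clustering(adj):
    """Mean local clustering coefficient of an undirected graph.

    adj maps each node to the set of its neighbors. This definition is
    an assumption; the source does not state which one was used.
    """
    total = 0.0
    for node, neighbors in adj.items():
        k = len(neighbors)
        if k < 2:
            continue  # no neighbor pairs; contributes 0
        links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


# Toy graph (invented): triangle a-b-c with a pendant node d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
coefficient = mean_clustering(adj)
```

Using the values in the table, Gnutella's clustering coefficient is about 47 times that of the comparable random graph (0.018 / 0.00038), while its mean distance is only slightly longer (4.2 versus 3.8).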

Resiliency to Node Failure
• After removing nodes, this figure shows how many remain connected.
• The Gnutella topology is extremely resilient to random node failure.
• It's resilient even when the highest-degree nodes are removed first.
• Complex algorithms are not necessary for ensuring resilience.
[Figure: nodes remaining connected after removal; curves: "Random", "Highest degree first"]
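
A resiliency experiment of this shape can be sketched as follows: pick a set of nodes to remove, either at random or highest degree first, then measure the largest connected component that survives. The metric and helper names are assumptions for illustration, and the star graph is a deliberately fragile toy, not a model of Gnutella.

```python
import random


def largest_component_size(adj, removed):
    """Size of the largest connected component after deleting `removed`."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


def pick_highest_degree(adj, count):
    """Choose the `count` nodes with the most neighbors."""
    return set(sorted(adj, key=lambda n: len(adj[n]), reverse=True)[:count])


def pick_random(adj, count, seed=0):
    """Choose `count` nodes uniformly at random (seeded for repeatability)."""
    return set(random.Random(seed).sample(sorted(adj), count))


# Toy graph (invented): a star with hub "h" and five leaves.
leaves = ["l1", "l2", "l3", "l4", "l5"]
adj = {"h": set(leaves), **{leaf: {"h"} for leaf in leaves}}

intact = largest_component_size(adj, set())
after_targeted = largest_component_size(adj, pick_highest_degree(adj, 1))
after_random = largest_component_size(adj, pick_random(adj, 1))
```

In the star, removing the hub shatters the graph, which is the opposite of what the slide reports for Gnutella; running the same measurement on a real snapshot is what distinguishes a resilient topology from a fragile one.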

What about Dynamic Properties?
• Prior work suggests many peers are short-lived while others are very long-lived.
• How do these nodes interact?
• Methodology:
  • Capture a long series of back-to-back snapshots
  • Annotate the last snapshot with the uptime of each peer
  • Examine the properties of the annotated topology
[Figure: group peers by uptime; labels: "Present for 5 snapshots", "Present for 2 snapshots", "Departed peer", "Newly arrived peer"; axis: Time]
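
The annotation step can be sketched as counting, for each peer in the last snapshot, how many consecutive snapshots it has been present in. This sketch assumes peers carry a stable identifier across snapshots and treats a peer missing from any snapshot as having departed; both are simplifying assumptions, and the data is invented.

```python
def annotate_uptime(snapshots):
    """Return {peer: consecutive snapshots present, ending at the last one}.

    snapshots is a list of sets of peer identifiers, oldest first.
    Peers absent from the last snapshot are not annotated.
    """
    uptime = {}
    for peer in snapshots[-1]:
        count = 0
        for snapshot in reversed(snapshots):
            if peer not in snapshot:
                break  # peer was absent here, so its current session started later
            count += 1
        uptime[peer] = count
    return uptime


# Toy series (invented): "b" departs, "c" and "d" arrive later.
snapshots = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"a", "c", "d"}]
uptime = annotate_uptime(snapshots)
```

Multiplying a count by the interval between snapshots gives an approximate uptime, which can then be compared against a minimum-uptime threshold.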

Stable Core
• Most peers are recent arrivals.
• Other peers have been around for a long time.
• We can select a set of peers based on a minimum uptime threshold. We call this the stable core.
• Does the longevity of a peer affect who its neighbors are?
[Figure: peers grouped by uptime; labels: "> 20 h", "> 10 h"]

Biased Connectivity
• Hypothesis: long-lived nodes tend to be more connected to other long-lived nodes.
  • Rationale: Once connected, they stay connected.
  • The longer they're around, the more opportunities they have to become neighbors.
• Approach: Check for biased connectivity
  • Randomize the edges to create a graph without biased connectivity
  • Then compare
  • Are there more edges in the observed stable core compared to random?
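
The slide does not specify how the edges were randomized. One common choice, assumed here, is repeated double-edge swaps, which shuffle who is connected to whom while leaving every node's degree unchanged. The sketch then counts edges whose endpoints both lie in the stable core; the graph, core membership, and function names are invented for illustration.

```python
import random


def randomize_edges(edges, swaps, seed=0):
    """Shuffle an undirected edge list with degree-preserving double-edge swaps."""
    rng = random.Random(seed)
    edge_list = [tuple(edge) for edge in edges]
    present = {frozenset(edge) for edge in edge_list}
    for _ in range(swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if len({a, b, c, d}) < 4:
            continue  # swap would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # swap would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list


def core_edge_count(edges, core):
    """Number of edges with both endpoints inside the stable core."""
    return sum(1 for u, v in edges if u in core and v in core)


# Toy data (invented): peers a, b, c are assumed to form the stable core.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("f", "a")]
core = {"a", "b", "c"}

observed = core_edge_count(edges, core)
shuffled = randomize_edges(edges, swaps=100)
randomized = core_edge_count(shuffled, core)
```

Comparing `observed` against `randomized`, averaged over many seeds on a real snapshot, indicates whether the stable core holds more internal edges than chance alone would produce.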

Stable Core Edges
• 20% to 40% more edges in the stable core compared to random.
• There is an onion-like bias where long-lived peers are more likely to be connected to other long-lived peers.
• We examined other properties of the stable core.
• Despite high churn, there is a relatively stable "backbone".

Summary
• Characterizations of recent and accurate snapshots
• Graph properties:
  • The degree distribution in Gnutella is not power law.
  • Gnutella exhibits small world characteristics.
  • Gnutella is resilient.
• Dynamic properties:
  • There is a stable core within the topology.
  • Peer churn causes the stable core to have an onion-like shape.
  • This effect is likely to occur in any unstructured system.

Future Work
• Examining long-term trends in Gnutella using many snapshots.
• Characterizing churn
• Characterizing properties of other widely deployed P2P systems
  • Kad (a DHT with more than 1 million users)
  • BitTorrent
• Developing sampling techniques for P2P