Hybrid Search Schemes for Unstructured Peer-to-Peer Networks Random

“Hybrid Search Schemes for Unstructured Peer-to-Peer Networks”
“Random Walks in Peer-to-Peer Networks”
Christos Gkantsidis, Milena Mihail, Amin Saberi
Presented by Paul Bogdan, February 28th, 2007

“Hybrid Search Schemes for Unstructured Peer-to-Peer Networks”
Christos Gkantsidis, Milena Mihail, Amin Saberi

Outline
• Random Graph Models
• Flooding and Normalization
• Random Walks and Replication
• Generalized Search Schemes
• Experimental Evaluation

Motivation
• Flooding with a small time-to-live (TTL) performs well in regular graphs
• Performance metric: number of messages exchanged per distinct response
• Flooding performance degrades as the TTL increases and in irregular networks
• Random walks perform better than flooding
  – scalability, granularity
• Hybrid and generalized search schemes:
  – random walks with look-ahead, random walks with 1-step replication

Contribution
• Random walks (RW) with shallow flooding offer good performance (analytic justification)
  – R1: In a random graph model with O(n) nodes of constant degree and O(n^(1/2)) nodes of degree O(n^(1/2)), the expected time to discover Ω(n) nodes is O(n^(1/2)).
  – R2: Random walks with look-ahead 1 or 1-step replication perform better when node degrees in the underlying topology are uneven.
• Normalized Flooding (NF)
  – R3: NF achieves performance comparable to flooding in regular graphs.
  – R4: NF with 1-step replication achieves performance comparable to RW.

Random Graph Models
• Random regular graphs: G_{n,d} denotes a graph with n nodes, each of degree d. Its sum of degrees is D = nd.
• Random graphs with super-nodes: for constants α and β, G_{n,d,α,β} denotes a graph with αn^(1/2) nodes of degree βn^(1/2) (large vertices) and the remaining nodes of degree d (small vertices). Its sum of degrees is D = (αβ + d)n, up to lower-order terms.
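
The degree bookkeeping for the super-node model can be checked numerically. The sketch below is illustrative only: the function name and the rounding of αn^(1/2) and βn^(1/2) to integers are assumptions, not part of the slides. It compares the exact sum of degrees with the (αβ + d)n figure.

```python
import math

def supernode_degrees(n, d, alpha, beta):
    """Degree sequence for the super-node model: about alpha*sqrt(n) large
    vertices of degree about beta*sqrt(n); every other vertex has degree d.
    Rounding to integers is an assumption made for this sketch."""
    root = math.sqrt(n)
    n_large = round(alpha * root)
    large_degree = round(beta * root)
    return [large_degree] * n_large + [d] * (n - n_large)

n, d, alpha, beta = 10_000, 3, 1, 2
degrees = supernode_degrees(n, d, alpha, beta)
exact_sum = sum(degrees)                 # alpha*beta*n + d*(n - alpha*sqrt(n))
slide_sum = (alpha * beta + d) * n       # (alpha*beta + d) * n
relative_gap = abs(exact_sum - slide_sum) / slide_sum
```

The gap between the two sums comes from the d·αn^(1/2) term, which is of lower order than n.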

Flooding and Normalization
• Theorem 3.1: Consider a random regular graph G_{n,d} and flooding from a node v with time-to-live τ. Let S be the set of distinct nodes queried by the flooding, with |S| ≤ |V| / 2.
• Claims: (1), (2), (3)

Proof of claim (1):

Proof of claim (2):

Proof of claim (3):

Flooding and Normalization
• Theorem 3.2: Let G_{n,d,α,β} be a random graph with super-nodes, and consider flooding from a node v of degree d with time-to-live τ.
• Claim: For some τ = O(log n), the number of distinct responses is Ω(n).
• Proof: Consider flooding with τ = c·log_{d-1}(log n) + 1 and the vertices visited with TTL τ - 1. Assume that this set of visited nodes contains no large-degree vertex. From the d-regular case, this set contains at least (d - 1)^(τ-1) edges.

Flooding and Normalization
• Theorem 3.3: Let G_{n,d,α,β} be a random graph with super-nodes, and consider normalized flooding from a node v with TTL τ. Then the number of distinct responses is Ω((d - 1)^(τ-1)) and the number of messages per response is O(1).
• Proof: By Theorem 3.1, the number of minigroups seen is (d - 1)^(τ-1). The expected number of small vertices is Q = d·(d - 1)^(τ-1) / (d + αβ). Let X_i, i = 1, …, N, be random variables with P[X_i = 1] = p_i and P[X_i = 0] = 1 - p_i.

Random Walks and Replication
• Random walk with look-ahead:
  – a random walk with shallow flooding at each step of the walk
  – RW with look-ahead 1 visits Ω(n) nodes with response time O(n^(1/2))
• Theorem 4.2: Let G_{n,d,α,β} be a random graph with super-nodes and consider a random walk from a node v. Then, in the 1-step replication scenario, the expected number of messages and response time to obtain distinct responses is
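
A random walk with look-ahead 1, as described in the bullets above, can be sketched as follows. This is a minimal illustration under stated assumptions: the graph is an adjacency-list dictionary, the function name is hypothetical, and each query to a neighbor and each walker hop is counted as one message.

```python
import random

def walk_with_lookahead(adj, start, steps, seed=0):
    """Random walk that queries all neighbors of each node it stands on.
    Returns (set of discovered nodes, number of messages sent)."""
    rng = random.Random(seed)
    current = start
    discovered = {start}
    messages = 0
    for _ in range(steps):
        # Look-ahead 1: shallow flood to every neighbor of the current node.
        discovered.update(adj[current])
        messages += len(adj[current])
        # Then move the walker to a uniformly random neighbor.
        current = rng.choice(adj[current])
        messages += 1
    return discovered, messages

# Star graph: hub 0 with leaves 1..5. One step on the hub discovers everyone.
star = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0], 5: [0]}
found, sent = walk_with_lookahead(star, start=1, steps=2)
```

On the star graph the walk reaches the hub after one hop, and the look-ahead from the hub then discovers every leaf, which is the intuition for why high-degree nodes help this scheme.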

• Theorem 4.3: Let G_{n,d,α,β} be a random graph with super-nodes and consider normalized flooding from v with TTL τ ≈ (log n) / (2·log(d - 1)). Then, in the 1-step replication scenario, the number of distinct responses is at least and the number of messages is at most
• Proof: The number of minigroups seen is (d - 1)^(τ-1) and, by the Chernoff bounds, there will be minigroups corresponding to large vertices.
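
The TTL in Theorem 4.3 can be read as the value at which the flood reaches on the order of n^(1/2) minigroups. The worked check below uses only quantities already on the slides. The split into large and small minigroups reuses the fraction d / (d + αβ) from the proof of Theorem 3.3; it is an expectation offered as a sanity check, not the bound stated by the theorem.

```latex
\tau \approx \frac{\log n}{2\log(d-1)}
\;\Longrightarrow\;
(d-1)^{\tau} \approx n^{1/2},
\qquad
(d-1)^{\tau-1} \approx \frac{n^{1/2}}{d-1}.

\mathbb{E}[\text{number of large minigroups}]
\approx \frac{\alpha\beta}{d+\alpha\beta}\,(d-1)^{\tau-1}
\approx \frac{\alpha\beta}{(d+\alpha\beta)(d-1)}\; n^{1/2}.
```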

Generalized Search Schemes
• Searching procedure:
  – A node of degree d initiates a search with a budget k (budget = number of messages that are propagated in the network)
  – Among its d neighbors, the node picks quantities k_1, k_2, …, k_d such that k_1 + k_2 + … + k_d = k
  – To every neighbor i, the initiating node forwards the message with budget k_i (for k_i = 0 the message is not transmitted)
  – Each neighbor i reduces its budget by 1 unit and repeats the process as long as the budget is greater than 0
  – Every node that receives the message for the second time from another neighbor forwards the message with the corresponding budget
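
The budgeted procedure above can be sketched in a few lines. This is one possible reading of the slide, not the authors' implementation: the function names, the `split` callback signature, and the two example policies are hypothetical. Each transmitted message consumes one unit of budget, so the total number of messages equals the initial budget whenever every node has at least one neighbor.

```python
import random

def budget_search(adj, source, budget, split, seed=0):
    """Propagate a query with a message budget. split(neighbors, k, rng)
    returns one non-negative share per neighbor, summing to k.
    Returns (set of nodes reached, number of messages sent)."""
    rng = random.Random(seed)
    reached = {source}
    messages = 0
    pending = [(source, budget)]          # (node, budget it may distribute)
    while pending:
        node, k = pending.pop()
        neighbors = adj[node]
        if k <= 0 or not neighbors:
            continue
        for neighbor, share in zip(neighbors, split(neighbors, k, rng)):
            if share == 0:
                continue                  # a zero share is not transmitted
            messages += 1
            reached.add(neighbor)
            # The receiver spends one unit on the message it just received;
            # repeat visits still forward with their own budget.
            pending.append((neighbor, share - 1))
    return reached, messages

def even_split(neighbors, k, rng):
    """Hypothetical policy: divide k as evenly as possible (flooding-like)."""
    base, extra = divmod(k, len(neighbors))
    shares = [base + 1] * extra + [base] * (len(neighbors) - extra)
    rng.shuffle(shares)
    return shares

def single_split(neighbors, k, rng):
    """Hypothetical policy: hand the whole budget to one random neighbor
    (random-walk-like)."""
    shares = [0] * len(neighbors)
    shares[rng.randrange(len(neighbors))] = k
    return shares

ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
flood_like = budget_search(ring, 0, 7, even_split)
walk_like = budget_search(ring, 0, 7, single_split)
```

Choosing the split policy is what moves the scheme between flooding-like and random-walk-like behavior while keeping the message count fixed.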

Experimental Evaluation
• Methodology
  – Performance metrics
    • Median and mean number of distinct peers discovered (hits)
    • Minimum, maximum, and standard deviation of the number of hits
    • Number of messages
    • Granularity of the number of messages
    • Response time
  – Topologies
    • Random d-regular graphs
    • Power-law graphs
    • Bimodal topologies
    • Clustered topologies

Normalized Flooding (NF)
• Mean number of unique peers discovered as a function of the initial TTL
• NF and Standard Flooding behave similarly in regular graphs
• NF controls the number of messages and provides higher efficiency

Normalized Flooding (NF)
• With NF, the number of unique peers discovered increases exponentially with the TTL
• In topologies with high degrees, the number of peers discovered increases faster than exponentially with the TTL
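
The slides do not spell out how NF limits fan-out. The sketch below assumes the common formulation in which every node forwards to at most a fixed number of randomly chosen neighbors (for example the minimum degree of the graph), while standard flooding forwards to all neighbors except the sender. The function name and the `fanout` parameter are hypothetical.

```python
import random

def flood(adj, start, ttl, fanout=None, seed=0):
    """Return (visited nodes, messages sent). fanout=None is standard
    flooding; an integer fanout forwards to at most that many randomly
    chosen neighbors (the assumed normalized-flooding rule)."""
    rng = random.Random(seed)
    visited = {start}
    frontier = [(start, None)]          # (node, neighbor it heard from)
    messages = 0
    for _ in range(ttl):
        next_frontier = []
        for node, parent in frontier:
            targets = [v for v in adj[node] if v != parent]
            if fanout is not None and len(targets) > fanout:
                targets = rng.sample(targets, fanout)
            for v in targets:
                messages += 1
                if v not in visited:    # duplicates are received but dropped
                    visited.add(v)
                    next_frontier.append((v, node))
        frontier = next_frontier
    return visited, messages

# Star graph: hub 0 with six leaves; the minimum degree is 1.
star = {0: [1, 2, 3, 4, 5, 6], **{i: [0] for i in range(1, 7)}}
standard = flood(star, start=1, ttl=2)
normalized = flood(star, start=1, ttl=2, fanout=1)
```

On this graph standard flooding spends most of its messages at the hub, while the capped variant sends a bounded number per node, which matches the slide's point that NF controls the number of messages.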

Random Walk with 1-step replication

Random Walk with Look-Ahead (RWLA)
• RWLA performance is similar to that of a long RW without look-ahead (in terms of unique peers discovered)
• RWLA response time is much shorter than that of a standard RW

Edge Criticality & Searching with Weights
• Generalized Searching performs similarly to Standard Flooding in regular graphs
• Generalized Searching behaves similarly to Standard Flooding in other topologies if normalized edge criticality is used

Conclusions
• Normalized Flooding (NF) could replace Standard Flooding in irregular graphs
• RW with 1-step replication performs better than RW and NF in irregular graphs
• Open for improvement:
  – Generalized schemes (analytic investigation)
  – Quantifying directional flooding

“Random Walks in Peer-to-Peer (P2P) Networks”
Christos Gkantsidis, Milena Mihail, Amin Saberi

Outline
• Motivation
• Statistical Estimation and Random Walks (RW)
• Searching
• Methodology and importance of topologies
• Construction and Summary

Motivation
• Random walks (RW) were proposed for constructing searching and topology maintenance protocols in P2P networks
• RW improve searching performance compared to flooding (Cao et al., 2002)
• An RW approach to constructing and maintaining unstructured topologies provides good connectivity properties (i.e. constant degree, constant expansion)
• Claim: the RW approach is a good candidate
  – to simulate uniform sampling
  – the number of simulation steps required can be as low as the number of samples in independent uniform sampling
• Searching and overlay topology construction
  – RW searching performs better than flooding for the same number

Statistical Estimation & Random Walks
• Coupon collection and Chernoff bounds
  – n types of coupons; each draw yields one coupon, uniformly distributed over the types
  – T_n: time by which coupons of all n types have been drawn
  – T_{αn}: time by which αn distinct types have been encountered, 0 < α < 1
  – X_1, …, X_k: independent Bernoulli trials with P[X_i = 1] = p_i and P[X_i = 0] = 1 - p_i
  – p: probability that a randomly drawn object has a particular property
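
For independent uniform draws, the expectations of T_n and T_{αn} have a closed form: after i distinct types have been seen, the next new type takes n / (n - i) draws on average. The sketch below sums these terms; the function name is an illustrative choice. The standard approximations are E[T_n] ≈ n ln n and E[T_{αn}] ≈ n ln(1 / (1 - α)).

```python
import math

def expected_draws(n, k):
    """Expected number of uniform draws needed to see k distinct coupon
    types out of n (classical coupon-collector sum)."""
    return sum(n / (n - i) for i in range(k))

all_types = expected_draws(4, 4)        # 4/4 + 4/3 + 4/2 + 4/1
half_types = expected_draws(100, 50)    # alpha = 0.5
half_approx = 100 * math.log(1 / (1 - 0.5))
```

Collecting a constant fraction αn of the types needs only a number of draws linear in n, whereas collecting all n types needs on the order of n log n draws. That gap is the baseline against which random-walk sampling is compared.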

Statistical Estimation & Random Walks
• Random walks (RW), convergence, and cover time
  – G = (V, E): undirected graph with |V| = n; d_i: degree of vertex i
  – A_ij: adjacency matrix; P: transition matrix, which satisfies
  – f: V → {0, 1}, which satisfies
  – Convergence rate metric: the rate at which the RW approaches the stationary distribution
  – Cover time metric: the time by which all nodes have been visited
  – Trajectory sample average: the rate at which the value of
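
The defining formula for P did not survive on the slide. The sketch below assumes the simple random walk, P_ij = A_ij / d_i, which is an assumption and may differ from the walk used in the paper. For that walk on a connected, non-bipartite undirected graph, the distribution converges to the stationary distribution d_i / (2|E|), which the example checks on a small hypothetical graph.

```python
def transition_matrix(adj_matrix):
    """Row-normalise an adjacency matrix: P[i][j] = A[i][j] / d_i."""
    return [[a / sum(row) for a in row] for row in adj_matrix]

def step(dist, P):
    """One step of the walk: new_dist[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 2 (non-bipartite).
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
P = transition_matrix(A)
dist = [1.0, 0.0, 0.0, 0.0]             # the walk starts at vertex 0
for _ in range(200):
    dist = step(dist, P)
degrees = [sum(row) for row in A]
stationary = [d / sum(degrees) for d in degrees]   # d_i / (2|E|)
```

How quickly `dist` approaches `stationary` is what the convergence rate metric measures.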

Statistical Estimation & Random Walks
• Convergence rate is related to the second eigenvalue of P (1)
  – y_t: the vertex that the RW visits at time t
• Cover time (2)
• Trajectory sample average (3)
• References: (1): [11]; (2): [12, 13]; (3): [3, 4, 5, 6]

Statistical Estimation & Random Walks
• Second eigenvalue, expansion, and conductance
  – S: a subset of V; C(S): the cutset of S (i.e. the edges with one endpoint in S and the other in V \ S); vol(S): the sum of the degrees of the vertices in S
  – Expansion
  – Conductance
  – Known bound [11, 14, 15, 16, 17, 18, 19]
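
The formulas for expansion and conductance are missing from the slide. The sketch below assumes the common per-set definitions, |C(S)| / |S| for expansion and |C(S)| / vol(S) for conductance, with the graph-level values taken as minima over sets S with |S| ≤ |V| / 2. Other normalisations exist, so treat the function names and definitions as assumptions.

```python
def cut_size(adj, S):
    """|C(S)|: number of edges with exactly one endpoint in S."""
    S = set(S)
    return sum(1 for u in S for v in adj[u] if v not in S)

def volume(adj, S):
    """vol(S): sum of the degrees of the vertices in S."""
    return sum(len(adj[u]) for u in S)

def expansion(adj, S):
    return cut_size(adj, S) / len(S)

def conductance(adj, S):
    return cut_size(adj, S) / volume(adj, S)

# 6-cycle: the arc {0, 1, 2} is separated from the rest by two edges.
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
arc = [0, 1, 2]
arc_expansion = expansion(cycle, arc)       # 2 / 3
arc_conductance = conductance(cycle, arc)   # 2 / 6
```

Cycles are poor expanders because a long arc always has a cutset of size 2. The expander constructions later in the deck aim for cutsets that grow in proportion to |S|.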

Searching
• Performance metrics for Flooding and RW
  – average number of distinct copies of an item located in the search
  – number of messages used by the searching algorithm
• RW performs better than flooding in the case of
  – multiple search requests for the same item with slowly changing topology
  – peer clustering (see [20, 21, 22, 23, 24, 25] for details)
• Searching analysis
  – Methodology
  – Flat topologies with uniformly distributed content
  – Topologies with peer clustering
  – Re-issuing the same query

Searching - Methodology
• Performance metrics
  – mean number of distinct copies found (Mean)
  – discrepancy around the mean (Std) and the failure probability
• Cost
  – number of messages or queries performed during the search
• Peer-to-peer topologies (≈ 1 million nodes)
  – flat regular expanders, two-tier topologies with clustering, power-law graphs, samples from real topologies
• Dynamic topologies
  – rewiring
• Content placement

Searching – Flat Topologies
• Experiment:
  – one request in a network of 500K peers
• Mean hits, minimum number of hits, and Std are similar for Flooding and RW
• The entire distribution of hits is similar for Flooding and RW

Searching - Topologies with Peer Clustering
• The cluster topology consists of
  – 5 flat regular graphs of size 40K; from each one, 1000 randomly picked nodes are used to construct another flat regular graph
• The number of hits for RW is more concentrated around the mean than for Flooding

Searching - Reissuing the Same Query
• Experiment setup: repeat the following procedure 4 times
  – each peer sends a request and waits for the response
  – between requests, 2% of the links are rewired
  – each peer initiates a new search
• RW perform better than Flooding
  – mean hits and failure probability

Searching - Reissuing the Same Query
• Performance of successive searches depends on the number of topology changes between consecutive searches
• Performance of Flooding improves as the rate of topological changes increases
• RW performance remains the same for small variations

Searching – Real Topologies
• The number of hits for RW is more concentrated around the mean than for Flooding
• P2P topologies have good expansion properties

Construction
• P2P network construction is concerned with:
  – peers arriving at and leaving the network dynamically
  – strong and weak decentralization
  – low network overhead per addition or deletion

Baseline Construction of Expander Graphs
• A_BASE (undirected graph) consists of:
  – n vertices, each of which chooses d vertices at random
  – total number of edges = nd and expected vertex degree = 2d
• Theorem 4.1: Let G(V, E) be a graph constructed by A_BASE. Then G is an expander with high probability, and for a positive constant α < 1
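
The A_BASE construction described above is easy to simulate. The sketch below makes two assumptions the slide does not state: each vertex picks its d targets without replacement among the other vertices, and parallel edges between a pair of vertices are kept. The function name is hypothetical.

```python
import random

def build_abase(n, d, seed=0):
    """Each vertex picks d other vertices uniformly at random; every pick
    becomes an undirected edge. Returns (edge list, degree per vertex)."""
    rng = random.Random(seed)
    edges = []
    for u in range(n):
        candidates = [v for v in range(n) if v != u]   # no self-loops
        for v in rng.sample(candidates, d):
            edges.append((u, v))
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return edges, degree

edges, degree = build_abase(n=50, d=3)
mean_degree = sum(degree) / len(degree)
```

Every vertex contributes d edges of its own and receives d picks on average from the others, which is where the nd edge count and the expected degree of 2d come from.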

Baseline Construction of Expander Graphs with Constant Overhead in Random Bits
• A'_BASE construction algorithm:
  – start a RW at a random vertex of H (a constant-degree expander graph)
  – whenever A_BASE needs a random number, it is taken from the RW on H
• Theorem 4.2: Let G(V, E) be a graph constructed by A'_BASE. There are positive constants α and 0 < β < 0.5 such that any subset S of at least β|V| and at most 0.5|V| vertices has cutset

Distributed Construction of Expanders with Constant Overhead on Network Resources
• A'_H construction
  – d daemons, one for each Hamilton cycle
  – a newly arriving node contacts the daemon associated with the i-th Hamilton cycle
  – after c steps, it attaches between the peer that currently hosts daemon i and one of that peer's neighbors in cycle i

Distributed Construction of Expanders with Constant Overhead on Network Resources
• A'_M construction
  – d daemons, one for each Hamilton cycle
  – a newly arriving peer is represented by two nodes, X and Y; X and Y contact the central server to discover the location of the d daemons
  – X becomes the neighbor of daemon i and Y the neighbor of the initial daemon's neighbor

Summary
• For searching
  – Random walks (RW) are superior to Flooding
• For construction
  – RW add new peers with constant overhead
• Open problems
  – A strongly decentralized construction algorithm
  – Can deletions and the expansion of small sets be handled better?
  – How do P2P network parameters (e.g. capacities) affect the performance of RW?