Estimating PageRank on Graph Streams
- Slides: 29
Estimating PageRank on Graph Streams • Atish Das Sarma (Georgia Tech) • Sreenivas Gollapudi, Rina Panigrahy (Microsoft Research)
PageRank • Determines a ranking of nodes in graphs • Typically large graphs: WWW, social networks • Run daily by commercial search engines
PageRank Computation (figure: node u with nodes a, b, c)
PageRank Computation (figure: node u with nodes a, b, c) • Our approach: no matrix-vector multiplication!
Our Result • Many random walk samples, efficiently • Approximate PageRank of u
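A minimal sketch of why random walk samples suffice, with no matrix-vector multiplication. It assumes the standard random-surfer model, a reset probability `eps` of 0.15 (an assumption, the slides give no value), and an in-memory adjacency dict instead of a stream. The name `estimate_pagerank` and its parameters are illustrative, not from the talk.

```python
import random
from collections import Counter

def estimate_pagerank(adj, walks_per_node=200, eps=0.15, seed=0):
    """Estimate PageRank from the endpoints of random walks with reset.

    adj: dict mapping node -> list of out-neighbours (assumed in memory).
    Every walk stops with probability eps before each step; walks started
    from uniformly spread nodes then end at v with probability PageRank(v).
    """
    rng = random.Random(seed)
    nodes = list(adj)
    ends = Counter()
    for start in nodes:
        for _ in range(walks_per_node):
            u = start
            while rng.random() > eps:        # continue with probability 1 - eps
                out = adj[u]
                # Dangling node (assumption): jump to a uniform random node.
                u = rng.choice(out) if out else rng.choice(nodes)
            ends[u] += 1
    total = len(nodes) * walks_per_node
    return {v: ends[v] / total for v in nodes}
```

Usage: `estimate_pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})` returns a dict of scores that sum to 1, with "c" ranked highest.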
Other Results from Random Walks (figure: graph G, node u) • We can estimate: mixing time, conductance • Using streams
Streaming • Input is a “stream” e_1, e_2, e_3, e_4, e_5, e_6, e_7, … • Few passes • Small RAM working memory • Bit streams (e.g. 010001011): frequency moments, quantiles • Graphs: edges, arbitrary order
Related Work • Sparsifiers (Benczur-Karger 96, Spielman-Teng 01, Spielman-Srivastava 08) – Given an undirected graph, produce a sparse one – Approximately preserve x’Lx – Can be used to compute sparse cuts • Streaming version of BK 96 (Ahn, Guha 09) – Sparse cuts in 1 pass and Õ(n) space • Accelerated PageRank (McSherry 08) – Heuristics
Key Idea (figure: walk of length l from u to v) • One walk from u of length l, efficiently • Later extend to many walks
Single Random Walk: Naive Algo (figure: walk from s) • One step with every pass! • Constant space • One pass per step, so l passes for a walk of length l
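A sketch of the naive algorithm under stated assumptions: the stream is modelled as a hypothetical zero-argument callable `edge_stream` that returns a fresh iterator of directed `(u, v)` edges per pass, and the next step is chosen by size-1 reservoir sampling over the current node's out-edges. The returned `path` is kept only for illustration; the walk itself needs just the current node and one candidate edge.

```python
import random

def naive_stream_walk(edge_stream, start, length, seed=0):
    """Random walk that takes exactly one step per pass over the stream."""
    rng = random.Random(seed)
    u = start
    path = [u]
    for _ in range(length):
        chosen, seen = None, 0
        for a, b in edge_stream():              # one full pass
            if a == u:
                seen += 1
                # Reservoir sampling: keep the i-th out-edge with prob. 1/i.
                if rng.randrange(seen) == 0:
                    chosen = b
        if chosen is None:                      # no out-edge: walk ends early
            break
        u = chosen
        path.append(u)
    return path
```

Each step is uniform over the out-edges of the current node regardless of stream order, at the cost of one pass per step.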
Second Naive Algo (figure: walk from s) • Single pass • Sample sufficient edges! • If […], then sample 2 out-edges from each node (store order)
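A sketch of the single-pass alternative, assuming that "sufficient edges" means `length` independent out-edge samples per node (an assumption, since the condition on the slide did not survive). The function name and the slot layout are illustrative. Samples are stored in order and consumed one per visit, so a walk of `length` steps can never run out.

```python
import random
from collections import defaultdict

def single_pass_walk(edges, start, length, seed=0):
    """One pass: keep `length` independent out-edge samples for every node,
    then walk entirely from memory. Space grows as (number of nodes) * length."""
    rng = random.Random(seed)
    samples = defaultdict(lambda: [None] * length)
    outdeg = defaultdict(int)
    for a, b in edges:                          # the single pass
        outdeg[a] += 1
        slots = samples[a]
        for i in range(length):                 # one reservoir per slot
            if rng.randrange(outdeg[a]) == 0:
                slots[i] = b
    used = defaultdict(int)                     # samples consumed per node
    u, path = start, [start]
    for _ in range(length):
        if u not in samples:                    # node has no out-edges
            break
        nxt = samples[u][used[u]]
        used[u] += 1
        u = nxt
        path.append(u)
    return path
```

This trades the l passes of the first naive algorithm for memory proportional to n times the walk length.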
Comparison (figure: walk of length l from u) • Naive (single walk): […] • Our result: […] • In fact, automatically: […] walks!
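Insight: Merge Short Walks (figure: walk from s through centers a, b, with segments labelled w) • Sample a fraction of nodes (centers) • Perform short walks of length w from every center, in w passes • Merge and extend short walks! • Two problems: (1) end up at a node a second time, (2) end up at a non-sampled node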
Stuck Nodes • Stuck again. And again... Slow? • Sample an edge from each stuck node • If this reaches new nodes, good progress in […] passes!
Stuck Nodes • Stuck on the same nodes? • Must include previously seen centers in the set • Sample s edges from each • Either s steps of progress OR a new node!
Summary • Perform short walks from sampled centers • Concatenate walks until stuck • Sample edges from stuck nodes • Make local progress until a new node is reached • Local progress = s • New node: center with prob. […] • Amortized progress […], every pass
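The steps above can be sketched as follows, in simplified form. Assumptions: `edge_stream` is a hypothetical callable returning a fresh edge iterator per pass, `center_prob` is an illustrative name for the sampling fraction, and a stuck node is handled by one naive step per pass rather than by sampling s edges from the whole stuck set as on the slides. Each short walk is used at most once, so the stitched result is still a true random walk.

```python
import random

def advance_all(edge_stream, tips, rng):
    """One pass: independently sample one out-neighbour for every walk.
    tips: dict walk_id -> current node. Returns dict walk_id -> next node."""
    at = {}
    for wid, node in tips.items():
        at.setdefault(node, []).append(wid)
    seen, nxt = {}, {}
    for a, b in edge_stream():
        if a in at:
            seen[a] = seen.get(a, 0) + 1
            for wid in at[a]:                   # separate reservoir per walk
                if rng.randrange(seen[a]) == 0:
                    nxt[wid] = b
    return nxt

def stitched_walk(edge_stream, all_nodes, start, length, w, center_prob, seed=0):
    rng = random.Random(seed)
    centers = {v for v in all_nodes if rng.random() < center_prob} | {start}
    short = {c: [c] for c in centers}
    for _ in range(w):                          # w passes build all short walks
        tips = {c: p[-1] for c, p in short.items()}
        for c, v in advance_all(edge_stream, tips, rng).items():
            short[c].append(v)
    path = [start]
    while len(path) - 1 < length:
        u = path[-1]
        if u in short:                          # unused short walk: up to w steps
            seg = short.pop(u)[1:]
        else:                                   # stuck: one naive step, one pass
            nxt = advance_all(edge_stream, {0: u}, rng)
            seg = [nxt[0]] if 0 in nxt else []
        if not seg:                             # dead end, stop early
            break
        path.extend(seg)
    return path[:length + 1]
```

On a graph where few nodes are revisited, most of the walk is covered w steps at a time without extra passes; the slides' s-edge sampling is what bounds the cost when the walk keeps returning to used centers.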
Summary • Total number of passes: […] • Total space: […]
Summary • Set […] • Number of passes = […] • Space = […]
Many Walks • Naive space bound: […] • We show: […] • Observation: many short walks are not used in a single RW
Many Random Walks • Consider the probability that a given node’s short walk is used in a single RW • If these probabilities were known: save a lot of space! • Perform K random walks • Total number of short walks required is about […] • The probabilities are not known, but they can be estimated
Estimating the Usage Probabilities (figure: walk of length l from u) • Run K = (log n) walks of length […] • Gives a crude estimate • Sufficient to double K • Continue doubling K • Gives K walks in […] space • […] passes
Distributions • Samples: […] • Space: […] • Passes: […] • Distribution of the walk from u: […]
Mixing Time, Conductance • Undirected graphs: compare the walk distribution with the steady state • Estimating the difference: […] samples [Batu et al. ’01] – approximate mixing time • Directed graphs: walk until the distribution “stabilizes”: […] samples • Conductance: […] • Recall space for walks: […]
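A small sketch of the comparison for undirected graphs, assuming the walk endpoints have already been collected and that every endpoint appears in `degree`. It uses the plain empirical L1 distance to the stationary distribution deg(v)/(2m); this is a simple plug-in estimate, not the tester of Batu et al., which is designed to need fewer samples. The function name is illustrative.

```python
from collections import Counter

def l1_to_stationary(endpoints, degree):
    """L1 distance between the empirical endpoint distribution and the
    stationary distribution deg(v) / (2m) of an undirected graph.

    endpoints: list of end nodes of independent walks of the same length.
    degree: dict node -> degree (the sum of degrees equals 2m).
    """
    two_m = sum(degree.values())
    counts = Counter(endpoints)
    k = len(endpoints)
    return sum(abs(counts[v] / k - degree[v] / two_m) for v in degree)
```

Repeating this for increasing walk lengths and stopping once the distance falls below a chosen threshold gives an approximate mixing time for the start node.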
Results Recap • Mixing time for undirected graphs: […] • Quadratic approximation to conductance • PageRank to accuracy […]
Open Questions? • Improve passes for random walks. In particular, sub-linear space and constant passes. • Graph Cuts and Graph Sparsification for directed graphs • Better (streaming) algorithms for computing eigenvectors
Thank You!
Summary • Perform short walks from sampled centers • Concatenate walks until stuck • Sample edges from stuck nodes • Make local progress until a new node is reached • Local progress = s • New node = […] nodes gives a center • Amortized […], every pass
Analysis • Total number of passes: […] • Total space: […] • Set […] • Number of passes = […] • Space = […]