Estimating PageRank on Graph Streams

  • Slides: 29

Atish Das Sarma (Georgia Tech), Sreenivas Gollapudi, Rina Panigrahy (Microsoft Research)

PageRank
• PageRank determines a ranking of the nodes in a graph
• Typically run on large graphs: the WWW, social networks
• Run daily by commercial search engines

PageRank Computation
[Figure: example graph on nodes a, b, c, and u]
• Our approach: no matrix-vector multiplication!

Our Result
• Many random walk samples, obtained efficiently
• The samples approximate the PageRank of a node u
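
How walk samples can stand in for matrix-vector multiplication is easiest to see on a small in-memory graph. The sketch below is a standard Monte Carlo estimate, not the streaming algorithm of the talk: it starts walks uniformly from every node, stops each walk with probability `reset_prob` per step, and counts where walks stop. The function name, the parameter names, and the uniform jump at dangling nodes are assumptions made for illustration.

```python
import random
from collections import Counter


def pagerank_by_walks(out_edges, walks_per_node=2000, reset_prob=0.15, seed=0):
    """Estimate PageRank from the stopping points of random walks.

    out_edges maps each node to the list of its out-neighbours.
    (Hypothetical helper; names and defaults are illustrative.)
    """
    rng = random.Random(seed)
    nodes = list(out_edges)
    hits = Counter()
    for start in nodes:
        for _ in range(walks_per_node):
            node = start
            # Continue with probability 1 - reset_prob; the node where the
            # walk stops is one sample from the PageRank distribution.
            while rng.random() > reset_prob:
                nbrs = out_edges[node]
                # Assumption: a dangling node jumps to a uniform random node.
                node = rng.choice(nbrs) if nbrs else rng.choice(nodes)
            hits[node] += 1
    total = sum(hits.values())
    return {v: hits[v] / total for v in nodes}


if __name__ == "__main__":
    graph = {"a": ["u"], "b": ["u"], "c": ["u"], "u": ["a"]}
    print(pagerank_by_walks(graph))
```

On the four-node example, u collects far more stopping points than b or c, since every other node links to it. No matrix is ever formed; only walk endpoints are counted.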

Other Results from Random Walks
[Figure: graph G with start node u]
Using streams, we can estimate:
• Mixing time
• Conductance

Streaming
• Input is a “stream”: e1, e2, e3, e4, e5, e6, e7, …
• Few passes
• Small RAM (working memory)
• Bit streams (010001011, 011101011, 0100110111): frequency moments, quantiles
• Graphs: edges, in arbitrary order

Related Work
• Sparsifiers (Benczur-Karger 96, Spielman-Teng 01, Spielman-Srivastava 08)
  – Given an undirected graph, produce a sparse one
  – Approximately preserves x’Lx
  – Can be used to compute sparse cuts
• Streaming version of BK 96 (Ahn, Guha 09)
  – Sparse cuts in 1 pass and Õ(n) space
• Accelerated PageRank (McSherry 08)
  – Heuristics

Key Idea
[Figure: a walk of length l from u to v]
• Perform one walk of length l from u efficiently
• Later extend to many walks

Single Random Walk - Naive Algo.
• One step with every pass!
• Constant space, l passes
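
The one-step-per-pass idea can be sketched as follows. Each pass picks a uniformly random out-edge of the current node by reservoir sampling, so the working state is a single candidate edge and a counter. The name `naive_stream_walk` and the convention that `edge_stream()` returns a fresh iterator for each pass are assumptions for illustration.

```python
import random


def naive_stream_walk(edge_stream, start, length, seed=0):
    """Take one walk step per pass over the edge stream.

    edge_stream is a zero-argument callable returning a fresh iterator of
    (source, target) pairs; each call stands in for one pass.
    """
    rng = random.Random(seed)
    walk = [start]  # kept only to return the walk; the state is walk[-1]
    for _ in range(length):
        current, chosen, seen = walk[-1], None, 0
        for src, dst in edge_stream():  # one full pass over the stream
            if src == current:
                seen += 1
                # Reservoir sampling of size 1: keep the i-th candidate
                # with probability 1/i, which yields a uniform out-edge.
                if rng.randrange(seen) == 0:
                    chosen = dst
        if chosen is None:  # no out-edge: the walk stops early
            break
        walk.append(chosen)
    return walk


if __name__ == "__main__":
    edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")]
    print(naive_stream_walk(lambda: iter(edges), "a", 10))
```

A walk of length `length` costs `length` passes here, which is the cost the later slides set out to reduce.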

Second Naive Algo
• Single pass
• Sample sufficient edges!
• If [condition missing], then sample 2 out-edges from each node (store order)
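
A single-pass variant can be sketched by pre-sampling, for every node, an ordered list of out-edges and then replaying the walk from memory. The condition on the slide did not survive extraction, so this sketch makes its own assumption: it stores `length` samples per node, enough for a walk that returns to the same node at every step. The names `single_pass_walk`, `samples`, and `visits` are illustrative.

```python
import random
from collections import defaultdict


def single_pass_walk(edges, start, length, seed=0):
    """Sample out-edges for every node in one pass, then walk offline."""
    rng = random.Random(seed)
    # samples[v][i] is the out-neighbour used on the i-th visit to v.
    samples = defaultdict(lambda: [None] * length)
    seen = defaultdict(int)
    for src, dst in edges:  # the only pass over the stream
        seen[src] += 1
        slots = samples[src]
        for i in range(length):
            # `length` independent size-1 reservoirs per node, so the
            # stored samples are uniform and independent of each other.
            if rng.randrange(seen[src]) == 0:
                slots[i] = dst
    walk, visits = [start], defaultdict(int)
    for _ in range(length):
        node = walk[-1]
        if seen[node] == 0:  # no out-edge was seen: the walk stops early
            break
        walk.append(samples[node][visits[node]])  # consume samples in order
        visits[node] += 1
    return walk


if __name__ == "__main__":
    edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")]
    print(single_pass_walk(edges, "a", 10))
```

The trade-off is the opposite of the first naive algorithm: one pass, but memory grows with the number of nodes times the walk length.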

Comparison
[Figure: a walk of length l from u]
• Naive (single walk):
• Our result:
• In fact, automatically: [count missing] walks!

Insight: Merge Short Walks
[Figure: short walk segments labeled w, starting from sampled centers]
• Sample a fraction of the nodes (centers)
• Perform short walks of length w from the centers, in w passes
• Merge and extend short walks!
• Two problems:
  – End up at a node a second time
  – End up at a non-sampled node
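
The merging step and its two failure cases can be sketched on an in-memory graph. This is a simplification, not the streaming algorithm: the short walks are generated directly from adjacency lists rather than in passes over a stream, and the function stops and reports the stuck node instead of recovering from it. The names `short_walk`, `stitch_until_stuck`, `center_frac`, and `short_len` are hypothetical.

```python
import random


def short_walk(out_edges, start, length, rng):
    """Plain random walk of at most `length` steps from `start`."""
    walk = [start]
    for _ in range(length):
        nbrs = out_edges[walk[-1]]
        if not nbrs:
            break
        walk.append(rng.choice(nbrs))
    return walk


def stitch_until_stuck(out_edges, start, target_len, center_frac, short_len, seed=0):
    """Concatenate pre-computed short walks; return (walk, stuck_node)."""
    rng = random.Random(seed)
    # Sample centers; the start node is always treated as a center.
    centers = {v for v in out_edges if rng.random() < center_frac} | {start}
    short = {c: short_walk(out_edges, c, short_len, rng) for c in centers}
    walk, used = [start], set()
    while len(walk) - 1 < target_len:
        end = walk[-1]
        # Stuck: the endpoint is not a center, or its short walk was
        # already consumed (reusing it would break independence).
        if end not in centers or end in used:
            return walk, end
        used.add(end)
        segment = short[end]
        if len(segment) == 1:  # dead end: nothing to append
            return walk, end
        walk.extend(segment[1:])
    return walk[: target_len + 1], None


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [2, 3], 2: [3, 0], 3: [0, 1]}
    print(stitch_until_stuck(graph, 0, target_len=12, center_frac=0.5, short_len=3))
```

A returned stuck node is exactly one of the two problems listed on the slide; the following slides describe how sampled edges at stuck nodes restore progress.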

Stuck Nodes
[Figure: walk segments labeled w and s]
• Stuck again. And again... Slow?
• Sample an edge from each stuck node
• If new nodes are reached, good in [bound missing] passes!

Stuck Nodes
[Figure: walk segments labeled w and s]
• Stuck on the same nodes?
• Must include the set of previously seen centers
• Sample s edges from each
• Either s steps of progress OR a new node!

Summary
[Figure: walk segments labeled w and s]
• Perform short walks from sampled centers
• Concatenate walks until stuck
• Sample edges from stuck nodes
• Make local progress until a new node is reached
• Local progress = s
• New node: center with prob. [value missing]
• Amortized progress, every pass

Summary
[Figure: walk segments labeled w and s]
• Total number of passes:
• Total space:

Summary
[Figure: walk segments labeled w and s]
• Set [parameter choice missing]
• Number of passes =
• Space =

Many Walks
[Figure: walk segments labeled w and s]
• Naive space bound:
• We show:
• Observation: many short walks are not used in a single random walk.

Many Random Walks
• [symbol missing]: probability that a node’s short walk is used in a single random walk
• If known: saves a lot of space!
• Perform K random walks
• Total number of short walks required is about [expression missing]
• We don’t know these probabilities, but we can estimate them.

Estimating [symbol missing]
[Figure: a walk of length l from u]
• Run K = (log n) walks of length [value missing]
• Gives a crude estimate of [symbol missing]
• Sufficient to double K
• Continue doubling K
• Gives K walks in space [bound missing]
• Passes:

Distributions
[Figure: distribution of a walk from u; the bounds on samples, space, and passes are missing]

Mixing Time, Conductance
• Undirected graphs: compare the distribution with the steady state.
• Estimating the difference: [count missing] samples [Batu et al. ’01], giving an approximate mixing time.
• Directed graphs: run until the distribution “stabilizes”: [count missing] samples.
• Conductance:
• Recall space for walks:
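
For undirected graphs, the comparison with the steady state can be sketched directly, because the stationary distribution of a simple random walk on a connected undirected graph is proportional to node degree. The sketch below uses the plain empirical L1 distance on an in-memory graph; it is not the sample-efficient test of Batu et al., and the name `l1_distance_to_stationary` and its parameters are illustrative.

```python
import random
from collections import Counter


def l1_distance_to_stationary(adj, start, walk_len, num_walks, seed=0):
    """Empirical L1 distance between walk endpoints and deg(v) / 2m.

    adj maps each node to its neighbour list (undirected, so every edge
    appears in both lists; every node is assumed to have a neighbour).
    """
    rng = random.Random(seed)
    ends = Counter()
    for _ in range(num_walks):
        node = start
        for _step in range(walk_len):
            node = rng.choice(adj[node])
        ends[node] += 1
    total_degree = sum(len(nbrs) for nbrs in adj.values())  # equals 2m
    return sum(
        abs(ends[v] / num_walks - len(adj[v]) / total_degree) for v in adj
    )


if __name__ == "__main__":
    k4 = {v: [u for u in range(4) if u != v] for v in range(4)}
    for t in (0, 1, 5, 20):
        print(t, l1_distance_to_stationary(k4, 0, t, 4000))
```

The smallest walk length at which this distance falls below a chosen threshold gives a rough empirical proxy for the mixing time from `start`.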

Results recap
• Mixing time for undirected graphs:
• Quadratic approximation to conductance
• PageRank to accuracy [value missing]

Open Questions?
• Improve the number of passes for random walks; in particular, sub-linear space and constant passes.
• Graph cuts and graph sparsification for directed graphs
• Better (streaming) algorithms for computing eigenvectors

Thank You!


Summary
• Perform short walks from sampled centers
• Concatenate walks until stuck
• Sample edges from stuck nodes
• Make local progress until a new node is reached
• Local progress = s
• New node = [count missing] nodes gives a center
• Amortized, every pass


Analysis
• Total number of passes:
• Total space:
• Set [parameter choice missing]
• Number of passes =
• Space =