Stochastic Streams: Sample Complexity vs. Space Complexity
David Woodruff, IBM Almaden
Joint work with Michael Crouch, Andrew McGregor, and Greg Valiant

Motivation
• (Well-studied) Statistics question: how many samples from a distribution are needed to estimate a property of the distribution?
• (Well-studied) Streaming question: for a given fixed stream of samples, how much space is needed to estimate a property of the distribution?
• Our work: understand the tradeoff between the sample complexity and the space complexity

Model
• Example stream: 4 3 7 3 1 1 2 …
• The algorithm sees a stream of i.i.d. samples from a distribution
• The algorithm makes only one pass over the samples
• Goal: understand the tradeoff between the number t of samples needed to solve a problem and the space s used by the algorithm
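A minimal sketch of the model, assuming samples come from a finite support with known weights. The stream is a Python iterator that can be read only once. An algorithm's space is whatever state it keeps between samples. The names `iid_stream` and `frequency_estimate` are hypothetical. The toy estimator is not one of the talk's problems. It only illustrates the one-pass interface and how space is counted.

```python
import random
from typing import Iterable, Iterator, Sequence


def iid_stream(values: Sequence[int], weights: Sequence[float], t: int,
               rng: random.Random) -> Iterator[int]:
    """Yield t i.i.d. samples; values[j] is drawn with probability proportional to weights[j]."""
    for _ in range(t):
        yield rng.choices(values, weights=weights)[0]


def frequency_estimate(stream: Iterable[int], target: int) -> float:
    """Toy one-pass algorithm: reads each sample once and keeps two counters,
    so its space is O(log t) bits regardless of how long the stream is."""
    hits = seen = 0
    for x in stream:
        seen += 1
        if x == target:
            hits += 1
    return hits / seen if seen else 0.0


if __name__ == "__main__":
    rng = random.Random(1)
    stream = iid_stream([1, 2, 3, 4], [0.1, 0.2, 0.3, 0.4], t=50_000, rng=rng)
    print(frequency_estimate(stream, target=4))  # should be close to 0.4
```

Here, more samples (larger t) tighten the estimate while the space stays fixed. The problems in the talk are ones where the two resources trade off against each other.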

Problems •

Talk Outline
• Sample/Space Tradeoff for Collision Probability Estimation
• Sample/Space Tradeoff for Deciding Connectivity
• Sample/Space Tradeoff for Determining if a Subspace is Full Rank

Collision Probability •

Collision Probability Algorithm
• Break the t samples into t/w contiguous groups of w samples
[Figure: the sample stream divided into consecutive blocks labeled Group 1, Group 2, Group 3, …]

Collision Probability Algorithm •
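The later steps of the algorithm are not recoverable from these slides. One natural reading, assumed for this sketch, is to count the colliding pairs inside each group of w samples and divide by the total number of within-group pairs. Two i.i.d. samples collide with probability Σ_i p_i², so this ratio is an unbiased estimate of the collision probability. Holding one group needs a table of at most w distinct values. The function name `estimate_collision_probability` is hypothetical. Counting pairs in a final partial group is also our own choice.

```python
import random
from collections import Counter
from typing import Iterable


def estimate_collision_probability(stream: Iterable[int], w: int) -> float:
    """Estimate sum_i p_i^2 in one pass by counting collisions inside
    contiguous groups of w samples (a sketch under assumed details)."""
    collisions = 0     # colliding pairs found so far, over all groups
    pairs = 0          # within-group pairs examined so far
    group = Counter()  # value -> multiplicity in the current group
    size = 0           # samples in the current group
    for x in stream:
        # x collides with every earlier copy of x in the same group
        collisions += group[x]
        group[x] += 1
        size += 1
        if size == w:  # group complete: account for its pairs, then forget it
            pairs += w * (w - 1) // 2
            group.clear()
            size = 0
    pairs += size * (size - 1) // 2  # a final partial group, if any
    return collisions / pairs if pairs else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    samples = (rng.randrange(10) for _ in range(200_000))  # uniform on 10 values
    print(estimate_collision_probability(samples, w=50))   # true value is 0.1
```

Under this reading, a larger w stores more samples at once but yields about t·w/2 pairs instead of about t/2. That is one concrete way space can be traded for samples.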

Collision Probability Lower Bound •

Collision Probability Lower Bound • … …

Collision Probability Lower Bound •

Talk Outline
• Sample/Space Tradeoff for Collision Probability Estimation
• Sample/Space Tradeoff for Deciding Connectivity
• Sample/Space Tradeoff for Determining if a Subspace is Full Rank

Graph Connectivity
• Given t independent edges, chosen with replacement from the edges of graph G, decide whether G is connected
• Simulate a random walk starting at vertex 1
• Store the current vertex
• If an edge is not incident to the current vertex, discard it; otherwise, move to its other endpoint
• Remember the first vertex i that the walk has not yet touched; finish when i > n

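Graph Connectivity
• Start at vertex 1
• See i.i.d. stream: {1, 4}, {2, 3}, {1, 4}, {3, 4}, {1, 2}, {2, 3}, {1, 2}, {3, 4}
[Figure: the current vertex and the first untouched vertex are updated as each edge arrives; an edge not incident to the current vertex does nothing; the walk is done once every vertex has been touched]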
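A minimal sketch of the one-pass random walk described above. It rests on two assumptions of ours. First, vertices are numbered 1..n. Second, the "first untouched vertex" is kept as a single pointer that advances by one each time the walk lands on it. The algorithm then stores only two vertex indices, or O(log n) bits. The function name `samples_until_covered` is hypothetical. The walk moves only along edges of G, so it can finish only if every vertex is reachable from vertex 1. A finish therefore certifies that G is connected. Running out of samples is not by itself a proof that G is disconnected.

```python
from typing import Iterable, Optional, Tuple


def samples_until_covered(n: int, stream: Iterable[Tuple[int, int]]) -> Optional[int]:
    """Run the one-pass random walk on vertices 1..n.

    State: the current vertex and a 'first untouched vertex' pointer (assumed
    here to advance by one whenever the walk lands on it). Returns the number
    of samples read when the pointer passes n, or None if the stream ends first.
    """
    current = 1          # the walk starts at vertex 1
    first_untouched = 2  # vertex 1 is touched by starting there
    if first_untouched > n:
        return 0
    for count, (u, v) in enumerate(stream, start=1):
        if u == current:
            current = v
        elif v == current:
            current = u
        else:
            continue  # edge not incident to the current vertex: discard it
        if current == first_untouched:
            first_untouched += 1
            if first_untouched > n:
                return count  # every vertex 1..n has been touched
    return None


if __name__ == "__main__":
    example = [(1, 4), (2, 3), (1, 4), (3, 4), (1, 2), (2, 3), (1, 2), (3, 4)]
    print(samples_until_covered(4, example))  # prints 8
```

On the example stream with n = 4, the walk moves 1 → 4 → 1 → 2 → 3 → 4. It ignores the second, fourth, and seventh edges, which arrive while it sits at a vertex they do not touch. The pointer passes 4 on the eighth sample.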

The Loopy Graph •

Use More Space and Fewer Samples •

Space/Time Tradeoff for Connectivity [Feige]
• Otherwise, suppose we are in phase 2
• We will sample a vertex from each group of k vertices

Implementation in the IID Model •

Talk Outline
• Sample/Space Tradeoff for Collision Probability Estimation
• Sample/Space Tradeoff for Deciding Connectivity
• Sample/Space Tradeoff for Determining if a Subspace is Full Rank

Determining if a Subspace Has Full Rank •

Statistical Query Framework •

Statistical Query Framework •

Conclusions
• Studied space versus sample tradeoffs in the data stream model
• Obtained tradeoffs for statistical, graph, and linear algebra problems
• Open questions: tighten our bounds
• General question: unify the techniques for the different problems