SPACE EFFICIENCY OF SYNOPSIS CONSTRUCTION ALGORITHMS
Sudipto Guha, UPENN

Synopses
Given n input numbers, summarize the input using B numbers, minimizing some error.
• Examples
  ◦ Histograms: piecewise constant representation
  ◦ Wavelets: use the wavelet basis
  ◦ Fourier, Bessel, SVD, what have you…
Space efficiency in synopsis construction algorithms VLDB 2005

Why space efficiency
• “Interestingly, according to modern astronomers, space is finite. This is a very comforting thought – particularly for people who can never remember where they left things.” (Woody Allen)
• From a computational viewpoint, however…

Space is the cruelest resource
• Resources
  ◦ Time: twiddle thumbs
  ◦ Access (stream): make more passes
• Space: the program simply will not run, or, if data is shifted to disk, it will run much slower.
• Further, if we had more space, maybe we could compute a better (more accurate) synopsis.

Examples - I
• Histograms
  ◦ Many error measures
  ◦ V-OPT, Jagadish et al., 1998
  ◦ O(n^2 B) time, O(nB) space
  ◦ Only O(n) space at a time (working space)
• O(n^2 B^2) time and O(n) space
  ◦ Is that the best?
  ◦ Here: O(n^2 B) time, O(n) space.
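One natural way to get the O(n^2 B^2)-time, O(n)-space baseline, sketched here under stated assumptions (squared ℓ2 / V-OPT error, 0-indexed inclusive bucket ranges, B ≥ 1): keep only one DP row alive, find where the last bucket of the optimal solution starts, then re-run the DP on the shorter prefix with one bucket fewer. The function name `vopt_low_space` is hypothetical, and this is an illustration of the trade-off rather than the algorithm from the talk.

```python
from itertools import accumulate
import math


def vopt_low_space(x, B):
    """Optimal histogram with at most B buckets under squared l2 error.

    Only one DP row (O(n) values) is alive at a time. Boundaries are recovered
    by re-running the DP on ever shorter prefixes, once per bucket:
    O(n^2 B) per run, O(n^2 B^2) time in total.
    """
    n = len(x)
    P = [0.0] + list(accumulate(x))                 # prefix sums
    Q = [0.0] + list(accumulate(v * v for v in x))  # prefix sums of squares

    def cost(a, e):
        """Squared error of x[a..e-1] represented by its mean."""
        s = P[e] - P[a]
        return (Q[e] - Q[a]) - s * s / (e - a)

    def last_bucket_start(m, b):
        """Start of the last bucket in an optimal b-bucket histogram of x[0..m-1]."""
        if b == 1:
            return 0
        # row[j] = optimal error of x[0..j-1] with the current bucket count (starts at 1).
        row = [math.inf] + [cost(0, j) for j in range(1, m + 1)]
        for k in range(2, b):                       # build rows for 2 .. b-1 buckets
            row = [math.inf] * k + [
                min(row[j] + cost(j, i) for j in range(k - 1, i))
                for i in range(k, m + 1)
            ]
        # Final layer: only the entry for the whole prefix x[0..m-1] is needed.
        return min(range(b - 1, m), key=lambda j: row[j] + cost(j, m))

    buckets, m = [], n
    for b in range(min(B, n), 0, -1):
        j = last_bucket_start(m, b)
        buckets.append((j, m - 1))                  # bucket x[j..m-1], inclusive
        m = j
    buckets.reverse()
    return sum(cost(a, e + 1) for a, e in buckets), buckets


print(vopt_low_space([1, 2, 3, 10, 11, 12], 2))  # (4.0, [(0, 2), (3, 5)])
```

The recomputation is what costs the extra factor of B: each of the B runs is a full O(n^2 B) DP over a prefix.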

Example - II
• (Haar) Wavelets
  ◦ Orthonormal systems
  ◦ For ℓ2 error: store the largest B coefficients of the input
  ◦ Does not work for non-ℓ2 errors
  ◦ Find the best B coefficients to retain (note: restricted)
• Garofalakis & Kumar, 04
  ◦ O(n^2 B log B) time
  ◦ O(n^2 B) space, but O(nB) needed at a time (for ℓ1)
• Here: O(n) space and O(n^2) time
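For ℓ2 error in an orthonormal basis, keeping the B largest-magnitude coefficients is optimal, and by Parseval's identity the squared error equals the sum of squares of the dropped coefficients. A small sketch of that ℓ2 case, assuming the input length is a power of two; the function names are hypothetical:

```python
import math


def haar(x):
    """Orthonormal Haar transform; len(x) must be a power of two.

    Output order: overall average first, then detail coefficients from the
    coarsest level to the finest.
    """
    a, details = list(x), []
    while len(a) > 1:
        pairs = list(zip(a[0::2], a[1::2]))
        details = [(p - q) / math.sqrt(2) for p, q in pairs] + details
        a = [(p + q) / math.sqrt(2) for p, q in pairs]
    return a + details


def inverse_haar(c):
    """Invert haar(): rebuild the signal one resolution level at a time."""
    a, pos = c[:1], 1
    while pos < len(c):
        d = c[pos:pos + len(a)]
        a = [v for s, t in zip(a, d)
             for v in ((s + t) / math.sqrt(2), (s - t) / math.sqrt(2))]
        pos += len(d)
    return a


def top_b_synopsis(x, B):
    """Keep the B largest-magnitude orthonormal coefficients, zero the rest."""
    c = haar(x)
    keep = set(sorted(range(len(c)), key=lambda i: abs(c[i]), reverse=True)[:B])
    z = [c[i] if i in keep else 0.0 for i in range(len(c))]
    return z, inverse_haar(z)


x = [2, 2, 0, 2, 3, 5, 4, 4]
z, F = top_b_synopsis(x, 3)
sq_err = sum((a - b) ** 2 for a, b in zip(x, F))
dropped = sum(ci * ci for ci, zi in zip(haar(x), z) if zi == 0.0)
print(sq_err, dropped)  # equal up to rounding (Parseval)
```

For ℓ1 or other non-ℓ2 errors this greedy choice is no longer guaranteed to be optimal, which is why choosing the B coefficients to retain becomes an optimization problem.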

Example - III
• Extended Wavelets
  ◦ Multiple measures
  ◦ Optimization is similar to Knapsack with choices.
  ◦ Previous best:
    ▪ Deligiannakis and Roussopoulos, 04: O(Mn(B + log n)) time and O(MnB) space, but needing O(nM + MB) at a time
    ▪ Guha, Kim, Shim, 04: reduced space to O(BM + min{nM, B^2})
  ◦ Here: O(BM) space
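To make "Knapsack with choices" concrete, here is a generic multiple-choice knapsack DP: each group offers several (space, benefit) options, at most one option can be taken per group, and total space is capped. This is an assumption-level illustration, not the algorithm of either cited paper; how extended-wavelet coefficients and subsets of measures map onto groups and options is not shown here, and the function name is hypothetical.

```python
def multiple_choice_knapsack(groups, budget):
    """Choose at most one (cost, value) option from each group so that total
    cost <= budget and total value is maximal. Costs are non-negative integers,
    values non-negative. Only one row of budget + 1 values survives between groups.
    """
    best = [0] * (budget + 1)            # best[s]: max value within space s so far
    for options in groups:
        new = best[:]                    # taking nothing from this group
        for cost, value in options:
            for s in range(cost, budget + 1):
                # best[] still refers to earlier groups, so one option per group
                new[s] = max(new[s], best[s - cost] + value)
        best = new
    return best[budget]


groups = [[(1, 3), (2, 5)], [(1, 4), (3, 8)]]
print(multiple_choice_knapsack(groups, 3))  # 9: (2, 5) from group 1 plus (1, 4) from group 2
```

This returns only the optimal value. Recovering which option each group used, in the usual way, needs one back-pointer row per group, i.e. O(groups × budget) space, which is the same kind of space cost the talk sets out to remove.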

What we will not talk about
• Approximation algorithms for histograms
• Range Query Histograms
• Basically, an improvement by a factor of B in space across the board. B is not always small, especially when n is large.

The main idea
• Can we solve it using a non-DP paradigm?
• Well, divide & conquer…
• Small details: how do we divide?
• Interaction
  ◦ Does a small-interaction partitioning exist?
  ◦ How (and in how much space) do we represent it?
  ◦ How easy is it to find (in the given representation)?

A case study - Histograms
• Formally, given a signal X, find a piecewise constant representation H with at most B pieces minimizing ||X - H||_2.
• Consider one bucket: the mean is the best value.
• A natural DP…
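Why the mean is the best value, and why a bucket's error is cheap to evaluate: for a bucket holding x_a, …, x_e, the squared error Σ(x_i - c)^2 is minimized at c = mean, where it equals Σx_i^2 - (Σx_i)^2 / length. With prefix sums of x and x^2, each bucket's error then takes O(1) time. Minimizing ||X - H||_2 is the same as minimizing its square, which is what this sketch computes; the helper names are hypothetical.

```python
from itertools import accumulate


def prefix_sums(x):
    """P[i] = x[0] + ... + x[i-1] and Q[i] = x[0]^2 + ... + x[i-1]^2."""
    P = [0.0] + list(accumulate(x))
    Q = [0.0] + list(accumulate(v * v for v in x))
    return P, Q


def bucket_error(P, Q, a, e):
    """Squared l2 error of the bucket x[a..e] (inclusive) represented by its mean.

    sum_i (x_i - mean)^2 = sum_i x_i^2 - (sum_i x_i)^2 / length
    """
    s = P[e + 1] - P[a]
    return (Q[e + 1] - Q[a]) - s * s / (e - a + 1)


x = [1, 2, 3, 10, 11, 12]
P, Q = prefix_sums(x)
print(bucket_error(P, Q, 0, 2))  # 2.0: mean 2, deviations -1, 0, 1
```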

The DP for histograms
Err[i, b] = error of approximating x_1, …, x_i using b buckets, with Err[i, 1] = error(1, i); error(j+1, i) is the error of a single bucket over x_{j+1}, …, x_i.
For i = 1 to n do
    For b = 2 to B do
        For j = 1 to i-1 do
            Err[i, b] = min(Err[i, b], Err[j, b-1] + error(j+1, i))
[Figure: the B × n table of Err values]
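The pseudocode above translates to the following Python, assuming squared ℓ2 error, B ≥ 1 and a non-empty input. Indices are 0-based and prefixes are indexed by length, so `err[b][i]` plays the role of the slide's Err[i, b]. It keeps the whole table plus back pointers to recover the buckets, which is exactly the O(nB) space the talk sets out to avoid; the function name is hypothetical.

```python
from itertools import accumulate
import math


def vopt_histogram(x, B):
    """The slide's DP, 0-indexed: O(n^2 B) time and an O(nB) table.

    err[b][i]  = min squared error of the first i values using exactly b buckets
    back[b][i] = start index of the last bucket in that solution
    """
    n, B = len(x), min(B, len(x))
    P = [0.0] + list(accumulate(x))
    Q = [0.0] + list(accumulate(v * v for v in x))

    def cost(a, e):
        """Squared error of x[a..e-1] represented by its mean."""
        s = P[e] - P[a]
        return (Q[e] - Q[a]) - s * s / (e - a)

    err = [[math.inf] * (n + 1) for _ in range(B + 1)]
    back = [[0] * (n + 1) for _ in range(B + 1)]
    for i in range(1, n + 1):
        err[1][i] = cost(0, i)                  # base case: one bucket
        for b in range(2, min(B, i) + 1):
            for j in range(b - 1, i):           # last bucket is x[j..i-1]
                cand = err[b - 1][j] + cost(j, i)
                if cand < err[b][i]:
                    err[b][i], back[b][i] = cand, j
    buckets, i = [], n                          # follow back pointers from (B, n)
    for b in range(B, 0, -1):
        j = back[b][i]
        buckets.append((j, i - 1))
        i = j
    return err[B][n], buckets[::-1]


print(vopt_histogram([1, 2, 3, 10, 11, 12], 2))  # (4.0, [(0, 2), (3, 5)])
```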

What if
• We could figure out what the story was at the midpoint!
• Two questions
  ◦ So what?
  ◦ How? (use a DP)

Wait a minute…
• We just replaced one DP by another and claimed something…!!!
• Exactly. The second DP needs only O(n) space. Since the conquer steps re-use/share the same space, the total space is O(n) too.
• The idea is to use divide and conquer, and to use a (small) DP to find the divide step.
• Is it really that simple?
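A minimal sketch of this divide-and-conquer idea for V-OPT histograms, written as an illustration rather than the author's code: the forward DP keeps two rows plus, for each entry, a tag naming the bucket that covers the midpoint of the current range and how many buckets lie to its left. That bucket splits the range into two independent subproblems of at most half the length, which are then solved recursively in the same space. Squared ℓ2 error, 0-indexed inclusive ranges and B ≥ 1 are assumed; all names are hypothetical.

```python
from itertools import accumulate
import math


def vopt_divide_conquer(x, B):
    """Optimal histogram with at most B buckets (squared l2), O(n) working space."""
    n = len(x)
    P = [0.0] + list(accumulate(x))
    Q = [0.0] + list(accumulate(v * v for v in x))

    def cost(a, e):
        """Squared error of x[a..e] (inclusive) represented by its mean."""
        s = P[e + 1] - P[a]
        return (Q[e + 1] - Q[a]) - s * s / (e - a + 1)

    def middle_bucket(lo, hi, k):
        """(s, e, c): in an optimal k-bucket solution of x[lo..hi], bucket x[s..e]
        covers mid and c buckets lie to its left. Only two rows are kept."""
        mid, size = (lo + hi) // 2, hi - lo + 1
        # One bucket x[lo..lo+t]; it covers mid iff lo + t >= mid.
        err = [cost(lo, lo + t) for t in range(size)]
        tag = [(lo, lo + t, 0) if lo + t >= mid else None for t in range(size)]
        for b in range(2, k + 1):
            new_err, new_tag = [math.inf] * size, [None] * size
            for t in range(b - 1, size):        # prefix x[lo..i], i = lo + t
                i = lo + t
                for u in range(b - 2, t):        # last bucket is x[lo+u+1..i]
                    cand = err[u] + cost(lo + u + 1, i)
                    if cand < new_err[t]:
                        new_err[t] = cand
                        if i >= mid:             # last bucket covers mid, or inherit
                            new_tag[t] = (lo + u + 1, i, b - 1) if lo + u < mid else tag[u]
            err, tag = new_err, new_tag
        return tag[size - 1]

    def solve(lo, hi, k, out):
        if k == 0:                               # empty range, nothing to place
            return
        s, e, c = middle_bucket(lo, hi, k)       # DP rows are freed on return
        solve(lo, s - 1, c, out)                 # left part, c buckets
        out.append((s, e))
        solve(e + 1, hi, k - c - 1, out)         # right part, the remaining buckets

    buckets = []
    solve(0, n - 1, min(B, n), buckets)
    return sum(cost(s, e) for s, e in buckets), buckets


print(vopt_divide_conquer([1, 2, 3, 10, 11, 12], 2))  # (4.0, [(0, 2), (3, 5)])
```

Because each call finishes its DP before recursing, only O(n) DP values are live at any time, plus the output and an O(log n) recursion stack (both parts are at most half the current range).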

The code

The end of working space
• If you can partition a problem using the working space, you can recompute the solutions of the parts at a little extra cost.
• Working space = total space.

How much is little?
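One way to make "a little extra cost" concrete for histograms, as an illustrative accounting rather than the slide's own analysis: assume one DP pass over m values with k buckets costs at most c·m^2·k, and that the divide step yields parts of at most half the length whose bucket counts sum to at most k - 1. The subproblems at one recursion depth are disjoint pieces of a single B-bucket solution, so their bucket counts k_i sum to at most B:

```latex
\begin{aligned}
T(m,k) &\le c\,m^2 k + T(m_1,k_1) + T(m_2,k_2), \qquad m_1, m_2 \le \tfrac{m}{2},\; k_1 + k_2 \le k - 1,\\
\text{cost at depth } \ell &\le \sum_i c\,m_i^2 k_i \le c\Big(\frac{n}{2^\ell}\Big)^2 \sum_i k_i \le \frac{c\,n^2 B}{4^\ell},\\
T(n,B) &\le \sum_{\ell \ge 0} \frac{c\,n^2 B}{4^\ell} = \tfrac{4}{3}\,c\,n^2 B.
\end{aligned}
```

Under these assumptions, all the recomputation together costs at most a constant factor (4/3) times a single O(n^2 B) DP pass.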

Wavelets
• A set of vectors (Haar, unnormalized):
  ◦ {1, -1, 0, 0, …}, {0, 0, 1, -1, 0, 0, …}, …
  ◦ {1, 1, -1, -1, 0, 0, …}, …
  ◦ {1, 1, 1, …}
• A natural multi-resolution

Wavelet Synopsis Construction
• Formally, given a signal X and the Haar basis {ψ_i}, find a representation F = Σ_i z_i ψ_i with at most B non-zero z_i, minimizing some error that is a function of X - F.
• Restriction: z_i is either 0 or ⟨X, ψ_i⟩.
• Debate: unrestricted or restricted. Omitted here.
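To pin down the restricted problem, the following brute-force sketch tries every set of B coefficients, uses each retained coefficient's true value, and keeps the set with the smallest ℓ1 error. It works with the unnormalized Haar vectors listed earlier, so a retained term is (⟨X, ψ_i⟩ / ⟨ψ_i, ψ_i⟩)·ψ_i, which is the same term as ⟨X, ψ_i⟩·ψ_i for a normalized ψ_i. It is exponential in n and only meant for tiny inputs; it is not the Garofalakis & Kumar DP, and all names are hypothetical.

```python
from itertools import combinations


def haar_basis(n):
    """Unnormalized Haar basis of R^n (n a power of two), coarsest vector first."""
    basis, width = [[1] * n], n
    while width >= 2:
        half = width // 2
        for start in range(0, n, width):
            basis.append([0] * start + [1] * half + [-1] * half + [0] * (n - start - width))
        width = half
    return basis


def restricted_synopsis(x, keep, basis):
    """F = sum over kept i of z_i * psi_i, with z_i the true coefficient of x on psi_i."""
    F = [0.0] * len(x)
    for i in keep:
        psi = basis[i]
        z = sum(a * p for a, p in zip(x, psi)) / sum(p * p for p in psi)
        F = [f + z * p for f, p in zip(F, psi)]
    return F


def l1_error(x, F):
    return sum(abs(a - f) for a, f in zip(x, F))


def best_restricted(x, B, error=l1_error):
    """Exhaustive search over all B-subsets of coefficients (tiny n only)."""
    basis = haar_basis(len(x))
    keep = min(combinations(range(len(x)), B),
               key=lambda k: error(x, restricted_synopsis(x, k, basis)))
    return keep, error(x, restricted_synopsis(x, keep, basis))


keep, err = best_restricted([2, 2, 0, 2, 3, 5, 4, 4], 3)
print(keep, err)
```

Comparing `keep` with the B coefficients of largest normalized magnitude is a quick way to check, on a given input, whether the ℓ2 choice is also ℓ1-optimal.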

Wavelets
• ||X - F||_1
• Long history: Matias, Vitter, Wang ’98; Garofalakis, Gibbons ’02; Garofalakis, Kumar ’04
• State of the art
  ◦ O(n^2 B log B) time
  ◦ O(n^2 B) space
  ◦ O(nB) working space
• Here: O(n^2 log B) time, O(n) space
• SEE ALSO NEXT TALK…

What happens to wavelets [GK 04]?

Extensions
• Approximation Algorithms
• Range Query Histograms
• Extended Wavelets

Histograms
• Saves space across all algorithms, except those that extend to general error measures over streams

Range Query
• Same story
• Open question:
  ◦ a faster algorithm obeying the synopsis size

That’s all folks