2015 Data Compression Conference, Salt Lake City, United States, 7-9 April 2015
FASTER COMPRESSED QUADTREES
Travis Gagie, Javier González, Susana Ladra, Gonzalo Navarro, Diego Seco

2 / 34 Outline
1. Introduction
2. Compressed quadtree representation
3. Experimental evaluation
4. Conclusions
5. Future work

4 / 34 Introduction
• Storing and querying two-dimensional point sets is important in:
  • Computational geometry
  • Geographic information systems
  • Graphics
  • Many other fields

5 / 34 Introduction
• Real-world point sets tend to be clustered
• Using a machine word for each point is wasteful
• Compressing the data can make operations faster, because more of it fits in cache
• We are looking for a structure that exploits the clustering property to improve both space and time performance

6 / 34 Introduction
• Example with k = 2
[Figure: binary matrix and the tree that represents it]
T = 101111010100100011001000000101011110
L = 01000011001010101000011000100100

7 / 34 Introduction
• Retrieving neighbor lists:
  • Top-down traversal of the tree
• Example: retrieve the direct neighbors of page 10
[Figure: binary matrix with rows and columns numbered 0 to 15, with positions 0, 2, 4, ... marked along T]
T = 101111010100100011001000000101011110
L = 01000011001010101000011000100100
$\mathrm{children}(2) = \mathrm{rank}_1(T, 2) \cdot k^2 = 2 \cdot 4 = 8$
$\mathrm{children}(9) = \mathrm{rank}_1(T, 9) \cdot k^2 = 7 \cdot 4 = 28$
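The child computation on this slide can be checked with a minimal Python sketch. It assumes 0-based positions and an inclusive rank (rank1(T, i) counts the 1s in T[0..i]), which is the convention that reproduces the slide's numbers. The function names are illustrative, and rank is computed by a naive scan rather than with a rank-supporting bitmap.

```python
K = 2  # each internal node has K * K = 4 children

T = "101111010100100011001000000101011110"  # internal levels, from the slide
L = "01000011001010101000011000100100"      # last level, from the slide


def rank1(bits: str, i: int) -> int:
    """Number of 1s in bits[0..i], inclusive (naive linear scan)."""
    return bits[: i + 1].count("1")


def children(i: int) -> int:
    """Position of the first child of the node at position i of T."""
    if T[i] != "1":
        raise ValueError(f"position {i} is an empty submatrix and has no children")
    return rank1(T, i) * K * K


print(children(2))  # 8
print(children(9))  # 28
```

A returned position p that is at least len(T) refers to the last level, that is, to L[p - len(T)].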

9 / 34 Compressed quadtree representation
• Suppose we have a 4 x 4 matrix where each cell contains 0 or 1
• Only the 1s are relevant
• It is enough to store the cells with value 1:
  • (0, 0)
  • (3, 0)
  • (1, 2)
  • (2, 2)
  • (1, 3)
• How can we store them to take advantage of the clustering property?

10 / 34 Compressed quadtree representation
• Quadcode: binary representation of a point in a quadtree
[Figure panels: path to a leaf, matrix, quadtree]
• Nearby points share a common prefix (they fall in the same quadrants)
• We will use this property for compression

11 / 34 Compressed quadtree representation
• Same as writing the coordinates in binary and interleaving their bits
• Example quadcodes: 1001, 1011
• We can compute each quadcode in O(1) time with precomputed tables
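The bit interleaving can be sketched in a few lines of Python. Two things here are assumptions, not taken from the slides: the bit order (the y bit precedes the x bit at each level) and the use of a plain loop over the levels instead of the precomputed tables mentioned above. With this assumed order, the points (1, 2) and (1, 3) from the 4 x 4 example produce 1001 and 1011, which match the two codes shown on this slide.

```python
def quadcode(x: int, y: int, levels: int) -> int:
    """Interleave the low `levels` bits of y and x, y bit first at each level.

    The y-before-x order is an assumption made for this sketch.
    """
    code = 0
    for level in range(levels - 1, -1, -1):  # most significant bit first
        y_bit = (y >> level) & 1
        x_bit = (x >> level) & 1
        code = (code << 2) | (y_bit << 1) | x_bit
    return code


# The five points of the 4 x 4 example; a 4 x 4 grid needs 2 levels.
points = [(0, 0), (3, 0), (1, 2), (2, 2), (1, 3)]
for x, y in points:
    print((x, y), format(quadcode(x, y, 2), "04b"))
```

Points in the same quadrant agree on the first two bits of the code, which is the shared-prefix property used for compression.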

12 / 34 Compressed quadtree representation • Storing points in a trie

13 / 34 Compressed quadtree representation •

14 / 34 Compressed quadtree representation
• Heavy path decomposition
  • Technique for decomposing a rooted tree into a set of heavy paths
  • Each non-leaf node selects one heavy edge
  • A heavy edge is the edge to the child whose subtree has the most leaves
  • The selected edges form the heavy paths

15 / 34 Compressed quadtree representation
• At each step, one edge is selected and the other one is stored in a queue
• After a heavy path is finished, a node is retrieved from the queue and the process starts again
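The queue-driven construction described on the last two slides can be sketched in Python as follows. This is a simplified illustration under several assumptions: the trie is binary (one bit per level) and stored as nested dictionaries, the queue is FIFO, ties are broken in favor of the 0 edge, and each path is recorded only as the bits below its starting node. All function names are hypothetical, and the way the paths are finally encoded is not covered here.

```python
from collections import deque


def build_trie(codes):
    """Binary trie as nested dicts: each node maps '0'/'1' to a child node."""
    root = {}
    for code in codes:
        node = root
        for bit in code:
            node = node.setdefault(bit, {})
    return root


def count_leaves(node):
    """Number of leaves below node (a node without children is a leaf)."""
    if not node:
        return 1
    return sum(count_leaves(child) for child in node.values())


def heavy_paths(root):
    """Return the heavy paths as bit strings, in the order they are built."""
    paths = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        path = []
        while node:  # an empty dict is a leaf
            # Heavy edge: the child whose subtree has the most leaves
            # (max keeps the first maximum, so ties go to '0').
            heavy = max(sorted(node), key=lambda b: count_leaves(node[b]))
            for bit in sorted(node):
                if bit != heavy:
                    queue.append(node[bit])  # defer the other subtree
            path.append(heavy)
            node = node[heavy]
        paths.append("".join(path))
    return paths


# Five illustrative 4-bit codes (1001 and 1011 appear on an earlier slide).
trie = build_trie(["0000", "0101", "1001", "1100", "1011"])
print(heavy_paths(trie))  # ['1001', '000', '00', '1', '01']
```

The sketch recomputes leaf counts on every visit to keep the code short; a real construction would compute them once per node.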

16 / 34 Compressed quadtree representation

17 / 34 Compressed quadtree representation • Each Next bitmap supports rank operations
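As a reminder of what a rank operation involves, here is a generic Python sketch of a bitmap that answers rank from sampled counters plus a short scan. It does not reproduce the layout of the Next bitmaps, which the surviving slide text does not describe. The class name, the block size, and the string-based storage are all assumptions chosen for readability; practical implementations pack bits into machine words and use popcount.

```python
class RankBitmap:
    """Bitmap answering rank1 from per-block counters plus a short scan."""

    BLOCK = 8  # bits per block; an illustrative choice

    def __init__(self, bits: str):
        self.bits = bits
        # counts[b] = number of 1s in bits[0 : b * BLOCK]
        self.counts = [0]
        for start in range(0, len(bits), self.BLOCK):
            block = bits[start : start + self.BLOCK]
            self.counts.append(self.counts[-1] + block.count("1"))

    def rank1(self, i: int) -> int:
        """Number of 1s in bits[0..i], inclusive."""
        block, offset = divmod(i + 1, self.BLOCK)
        start = block * self.BLOCK
        return self.counts[block] + self.bits[start : start + offset].count("1")


bitmap = RankBitmap("101111010100100011001000000101011110")
print(bitmap.rank1(9))  # 7
```

Each query reads one counter and scans at most BLOCK - 1 bits, so its cost does not grow with the length of the bitmap.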

18 / 34 Compressed quadtree representation •

19 / 34 Compressed quadtree representation • Remaining quadcode: 100 Remaining path: 0100000

20 / 34 Compressed quadtree representation • Remaining quadcode: 00 Remaining path: 00

21 / 34 Compressed quadtree representation •

23 / 34 Experimental evaluation
• We use grid datasets from different domains:
  • Geographic information systems (GIS)
  • Social networks (SN)
  • Web graphs (WEB)
  • RDF (RDF)
• Space is measured in bits per point (bpp)

24 / 34 Experimental evaluation •

25 / 34 Experimental evaluation
• Types of membership queries:
  • Empty cells
  • Filled cells
  • Isolated filled cells
• Measure: average time per query, in nanoseconds

26 / 34 Experimental evaluation

27 / 34 Experimental evaluation
[Figure legend] Left point: dense. Middle point: medium. Right point: sparse.

28 / 34 Experimental evaluation
[Figure legend] Left point: dblp 2011. Right point: enwiki 2013.

29 / 34 Experimental evaluation

31 / 34 Conclusions
• Good theoretical bounds, and practical as well
• Space requirements are similar to those of other space-efficient quadtree representations
• Faster at handling isolated filled cells

33 / 34 Future work
• Generalization to higher dimensions
• Experimental results for range reporting (querying for points in a particular region)

2015 Data Compression Conference, Salt Lake City, United States, 7-9 April 2015
FASTER COMPRESSED QUADTREES
Travis Gagie, Javier González, Susana Ladra, Gonzalo Navarro, Diego Seco
If you have any questions: javigonzalez@udec.cl