Range Searching in 2D


Main goals of the lecture:
• to understand and to be able to analyze the kd-trees and the range trees;
• to see how data structures can be used to trade the space used for the running time of queries.

AALG, lecture 11, © Simonas Šaltenis, 2004


Range queries
• How do you efficiently find points that are inside a rectangle?
    • Orthogonal range query ([x1, x2], [y1, y2]): find all points (x, y) such that x1 < x < x2 and y1 < y < y2
    • Useful also as a multi-attribute database query
[Figure: points in the plane and a query rectangle, with x1, x2 marked on the x-axis and y1, y2 on the y-axis]


Preprocessing
• How much time would such a query take?
• Rules of the game:
    • We preprocess the data into a data structure
    • Then, we perform queries and updates on the data structure
    • Analysis:
        • Preprocessing time
        • Efficiency of queries (and updates)
        • The size of the structure
• Assumption: no two points have the same x-coordinate (the same is true for the y-coordinate).


1D range query
• How do we do a 1D range query [x1, x2]?
    • Balanced BST where all data points are stored in the leaves
        • The size of it?
    • Where do we find the answer to a query?
[Figure: tree T with the search paths for x1 and x2 ending at leaves q1 and q2; the leaves form the total order of data points … b1 q1 a1 … b2 q2 a2 …]


1D range query
• How do we find all these leaf nodes?
    • A possibility: have a linked list of leaves and traverse from q1 to q2
        • but this will not work for more dimensions…
    • Sketch of the algorithm:
        • Find the split node
        • Continue searching for x1, report all right-subtrees
        • Continue searching for x2, report all left-subtrees
        • When leaves q1 and q2 are reached, check if they belong to the range
    • Why is this correct?
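The sketch of the algorithm above can be written out roughly as follows. This is a minimal illustration, not the lecture's own code: the names `Node`, `build`, `report_subtree`, and `range_query` are hypothetical, the query range is treated as closed (x1 ≤ k ≤ x2), and each internal node is assumed to store the largest key of its left subtree.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    key: float                       # leaf: the data point; internal: max key of left subtree
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None


def build(sorted_keys: List[float]) -> Node:
    """Build a balanced BST with all keys in the leaves (input: non-empty, sorted, distinct)."""
    if len(sorted_keys) == 1:
        return Node(sorted_keys[0])
    mid = (len(sorted_keys) + 1) // 2            # left half gets the extra key
    return Node(sorted_keys[mid - 1], build(sorted_keys[:mid]), build(sorted_keys[mid:]))


def report_subtree(v: Node, out: List[float]) -> None:
    """Append every leaf key below v; time is linear in the number of leaves."""
    if v.is_leaf:
        out.append(v.key)
    else:
        report_subtree(v.left, out)
        report_subtree(v.right, out)


def range_query(root: Node, x1: float, x2: float) -> List[float]:
    """Return all keys k with x1 <= k <= x2 (not necessarily in sorted order)."""
    out: List[float] = []
    # Step 1: find the split node, where the search paths for x1 and x2 diverge.
    v = root
    while not v.is_leaf and (x2 <= v.key or x1 > v.key):
        v = v.left if x2 <= v.key else v.right
    if v.is_leaf:
        if x1 <= v.key <= x2:
            out.append(v.key)
        return out
    # Step 2: continue searching for x1, reporting right subtrees on the way.
    w = v.left
    while not w.is_leaf:
        if x1 <= w.key:
            report_subtree(w.right, out)
            w = w.left
        else:
            w = w.right
    if x1 <= w.key <= x2:                        # leaf q1
        out.append(w.key)
    # Step 3: continue searching for x2, reporting left subtrees on the way.
    w = v.right
    while not w.is_leaf:
        if x2 > w.key:
            report_subtree(w.left, out)
            w = w.right
        else:
            w = w.left
    if x1 <= w.key <= x2:                        # leaf q2
        out.append(w.key)
    return out
```

Every reported subtree lies entirely between the two search paths, which is why no per-point comparison is needed inside `report_subtree`.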


Analysis of 1D range query
• What is the worst-case running time of a query?
    • It is output-sensitive: two traversals down the tree plus O(k), where k is the number of reported data points: O(log n + k)
• What is the time of construction?
    • Sort, then construct by dividing into two, creating the root, and conquering the two parts recursively
    • O(n log n)
• Size: O(n)


2D range query
• How can we solve a 2D range query?
    • Observation: a 2D range query is a conjunction of two 1D range queries: x1 < x < x2 and y1 < y < y2
    • Naïve idea:
        • have two BSTs (on x-coordinate and on y-coordinate)
        • ask two 1D range queries
        • return the intersection of their results
    • What is the worst-case running time (and when does it happen)? Is it output-sensitive?
[Figure: query rectangle with x1, x2 marked on the x-axis and y1, y2 on the y-axis]
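One way to see why the naïve idea is not output-sensitive is to run it on a small adversarial input. The sketch below is only an illustration under stated assumptions: sorted lists with binary search stand in for the two BSTs, the range is closed, and `naive_query` is a hypothetical name. It returns the sizes of the two 1D answers alongside the final result.

```python
from bisect import bisect_left, bisect_right


def naive_query(points, x1, x2, y1, y2):
    """Intersect two 1D range queries; also return the sizes of both 1D answers."""
    # Preprocessing (would be done once): one ordering per coordinate.
    by_x = sorted(points)                          # sorted on x
    by_y = sorted(points, key=lambda p: p[1])      # sorted on y
    xs = [p[0] for p in by_x]
    ys = [p[1] for p in by_y]
    # Two independent 1D range queries.
    in_x = by_x[bisect_left(xs, x1):bisect_right(xs, x2)]
    in_y = by_y[bisect_left(ys, y1):bisect_right(ys, y2)]
    # The intersection costs time proportional to both intermediate answers.
    result = set(in_x) & set(in_y)
    return result, len(in_x), len(in_y)


# Ten points in the vertical slab and ten in the horizontal slab, none in both.
points = [(i, 100 + i) for i in range(10)] + [(100 + i, i) for i in range(10)]
result, nx, ny = naive_query(points, 0, 9, 0, 9)
print(len(result), nx, ny)
```

Here every point satisfies exactly one of the two 1D conditions, so the query touches all n points while reporting k = 0 of them.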


Range tree
• Idea: when performing search on x-coordinate, we need to start filtering points on y-coordinate earlier!
    • Canonical subset P(v) of a node v in a BST is the set of points (leaves) stored in the subtree rooted at v
    • Range tree is a multi-level data structure:
        • The main tree is a BST T on the x-coordinate of points
        • Any node v of T stores a pointer to a BST Ta(v) (associated structure of v), which stores canonical subset P(v) organized on the y-coordinate
        • 2D points are stored in all leaves!
[Figure: main tree T (BST on x-coords) with a node v, its canonical subset P(v), and its associated structure Ta(v) (BST on y-coords)]


Querying the range tree
• How do we query such a tree?
    • Use the 1DRangeSearch on T, but replace ReportSubtree(w) with 1DRangeSearch(Ta(w), y1, y2)
• What is the worst-case running time?
    • Worst case: we query the associated structures on all nodes on the path down the tree
    • On level j, the depth of the associated structure is log(n/2^j) = log n − j
    • Total running time: O(log² n + k)
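A compact sketch of this query, under some simplifying assumptions that are not from the slides: the associated structure of each node is a y-sorted list searched with `bisect` instead of a BST (the per-node query cost stays O(log n + k_v)), ranges are closed, the construction simply re-sorts at every node rather than using the efficient build, and all names (`Node`, `build`, `query_assoc`, `range_query`) are hypothetical.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Node:
    x: float                          # leaf: x of its point; internal: max x in left subtree
    ys: List[float]                   # y-coordinates of P(v), sorted
    pts: List[Point]                  # P(v) in the same (y-sorted) order
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(pts_by_x: List[Point]) -> Node:
    """pts_by_x: non-empty, sorted on x, distinct x-coordinates."""
    by_y = sorted(pts_by_x, key=lambda p: p[1])   # simple, but re-sorts at every node
    ys = [p[1] for p in by_y]
    if len(pts_by_x) == 1:
        return Node(pts_by_x[0][0], ys, by_y)
    mid = (len(pts_by_x) + 1) // 2
    return Node(pts_by_x[mid - 1][0], ys, by_y,
                build(pts_by_x[:mid]), build(pts_by_x[mid:]))


def query_assoc(v: Node, y1: float, y2: float, out: List[Point]) -> None:
    """1D range query on the associated structure of v (replaces ReportSubtree)."""
    out.extend(v.pts[bisect_left(v.ys, y1):bisect_right(v.ys, y2)])


def range_query(root: Node, x1: float, x2: float, y1: float, y2: float) -> List[Point]:
    out: List[Point] = []
    v = root
    while v.left is not None and (x2 <= v.x or x1 > v.x):     # find the split node
        v = v.left if x2 <= v.x else v.right
    if v.left is None:
        if x1 <= v.x <= x2:
            query_assoc(v, y1, y2, out)
        return out
    w = v.left                                                # path towards x1
    while w.left is not None:
        if x1 <= w.x:
            query_assoc(w.right, y1, y2, out)
            w = w.left
        else:
            w = w.right
    if x1 <= w.x <= x2:
        query_assoc(w, y1, y2, out)
    w = v.right                                               # path towards x2
    while w.left is not None:
        if x2 > w.x:
            query_assoc(w.left, y1, y2, out)
            w = w.right
        else:
            w = w.left
    if x1 <= w.x <= x2:
        query_assoc(w, y1, y2, out)
    return out
```

The x-search selects O(log n) canonical subsets, and each of them is filtered on y in logarithmic time plus the size of its share of the output, which gives the O(log² n + k) bound.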


Size of the range tree
• What is the size of the range tree?
    • At each level of the main tree, the associated structures store all the data points once (with constant overhead) (Why?): O(n)
    • There are O(log n) levels
    • Thus, the total size is O(n log n)


Building the range tree
• How do we efficiently build the range tree?
    • Sort the points on x and on y (two arrays: X, Y)
    • Take the median v of X and create a root, build its associated structure using Y
    • Split X into sorted XL and XR, split Y into sorted YL and YR (s.t. for any p ∈ XL or p ∈ YL, p.x < v.x, and for any p ∈ XR or p ∈ YR, p.x ≥ v.x)
    • Build recursively the left child from XL and YL and the right child from XR and YR
• What is the running time of this?
    • O(n log n)
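The construction steps above might look like this in code. It is a sketch under assumptions: the y-sorted list `assoc` stands in for the BST Ta(v) (a balanced BST can be built from a sorted list in linear time), x-coordinates are distinct as the lecture assumes, and `Node`, `build`, `build_range_tree`, and `total_assoc_size` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Node:
    x: float                          # splitting x (internal) or the point's x (leaf)
    assoc: List[Point]                # P(v) sorted on y; stands in for the BST Ta(v)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(X: List[Point], Y: List[Point]) -> Node:
    """X and Y hold the same points, sorted on x and on y respectively."""
    if len(X) == 1:
        return Node(X[0][0], Y)
    mid = len(X) // 2
    vx = X[mid][0]                                # x-coordinate of the median v
    XL, XR = X[:mid], X[mid:]                     # p.x < v.x  /  p.x >= v.x
    # One linear scan splits Y; filtering keeps both halves sorted on y.
    YL = [p for p in Y if p[0] < vx]
    YR = [p for p in Y if p[0] >= vx]
    return Node(vx, Y, build(XL, YL), build(XR, YR))


def build_range_tree(points: List[Point]) -> Node:
    X = sorted(points)                            # sort on x (once)
    Y = sorted(points, key=lambda p: p[1])        # sort on y (once)
    return build(X, Y)


def total_assoc_size(v: Optional[Node]) -> int:
    """Total number of points stored over all associated structures."""
    if v is None:
        return 0
    return len(v.assoc) + total_assoc_size(v.left) + total_assoc_size(v.right)


points = [(i, (i * 3) % 8) for i in range(8)]
tree = build_range_tree(points)
print(total_assoc_size(tree))                     # n points per level, log2(n) + 1 levels
```

Sorting happens only once; each level of the recursion then does linear work in total, so the whole build stays within O(n log n).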


Range trees: summary
• Range trees
    • Building (preprocessing time): O(n log n)
    • Size: O(n log n)
    • Range queries: O(log² n + k)
• Running time can be improved to O(log n + k) without sacrificing the preprocessing time or size
    • Layered range trees (use fractional cascading)
    • Priority range trees (use priority search trees as associated structures)


Kd-trees
• What if we want linear space?
    • Idea: partition trees, a generalization of binary search trees
    • Kd-tree: a binary tree
        • Data points are at leaves
        • For each internal node v:
            • x-coords of left subtree ≤ v < x-coords of right subtree, if depth of v is even (split with vertical line)
            • y-coords of left subtree ≤ v < y-coords of right subtree, if depth of v is odd (split with horizontal line)
    • Space: O(n), since points are stored once.


Example kd-tree
[Figure: points a to g plotted on an 8 × 8 grid, partitioned by alternating vertical and horizontal split lines, shown next to the corresponding kd-tree with the points at its leaves]


Draw a kd-tree
• Draw a kd-tree storing the following data points
[Figure: points a to h plotted on an 8 × 8 grid]


Querying the kd-tree
• How do we answer a range query?
    • Observation: each internal node v corresponds to a region(v) (where all its children are included).
    • We can maintain region(v) as we traverse down the tree
[Figure: the example kd-tree and the regions of its nodes in the plane]


Querying the kd-tree
• The range query algorithm (query range R):
    • If region(v) does not intersect R, do not go deeper into the subtree rooted at v
    • If region(v) is fully contained in R, report all points in the subtree rooted at v
    • If region(v) only partially intersects R, go recursively into v's children.
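The three cases above translate fairly directly into a recursive function. The following is a sketch, not the lecture's code: rectangles are closed 4-tuples `(xlo, xhi, ylo, yhi)`, region(v) is passed down and narrowed at each split, the tree is built by simply re-sorting at every level, coordinates are assumed distinct, and all names are hypothetical.

```python
from dataclasses import dataclass
from math import inf
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]          # (xlo, xhi, ylo, yhi), closed


@dataclass
class Node:
    point: Optional[Point] = None                 # set for leaves only
    split: float = 0.0                            # internal: left <= split < right
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points: List[Point], depth: int = 0) -> Node:
    """Simple build: re-sorts on the current axis at each level."""
    if len(points) == 1:
        return Node(point=points[0])
    axis = depth % 2                              # even depth: x, odd depth: y
    pts = sorted(points, key=lambda p: p[axis])
    mid = (len(pts) + 1) // 2
    return Node(split=pts[mid - 1][axis],
                left=build(pts[:mid], depth + 1), right=build(pts[mid:], depth + 1))


def report_subtree(v: Node, out: List[Point]) -> None:
    if v.point is not None:
        out.append(v.point)
    else:
        report_subtree(v.left, out)
        report_subtree(v.right, out)


def intersects(R: Rect, g: Rect) -> bool:
    return R[0] <= g[1] and g[0] <= R[1] and R[2] <= g[3] and g[2] <= R[3]


def contained(g: Rect, R: Rect) -> bool:
    return R[0] <= g[0] and g[1] <= R[1] and R[2] <= g[2] and g[3] <= R[3]


def query(v: Node, R: Rect, region: Rect = (-inf, inf, -inf, inf),
          depth: int = 0, out: Optional[List[Point]] = None) -> List[Point]:
    if out is None:
        out = []
    if v.point is not None:                       # leaf: test the point itself
        x, y = v.point
        if R[0] <= x <= R[1] and R[2] <= y <= R[3]:
            out.append(v.point)
    elif not intersects(R, region):               # case 1: prune
        pass
    elif contained(region, R):                    # case 2: report the whole subtree
        report_subtree(v, out)
    else:                                         # case 3: recurse with narrowed regions
        xlo, xhi, ylo, yhi = region
        if depth % 2 == 0:                        # vertical split line
            lr, rr = (xlo, v.split, ylo, yhi), (v.split, xhi, ylo, yhi)
        else:                                     # horizontal split line
            lr, rr = (xlo, xhi, ylo, v.split), (xlo, xhi, v.split, yhi)
        query(v.left, R, lr, depth + 1, out)
        query(v.right, R, rr, depth + 1, out)
    return out
```

A call such as `query(build(points), (2, 8, 3, 9))` starts with the whole plane as region(root). Regions near the outside stay unbounded, so for those the recursion runs down to the leaves and tests each point directly.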


Analysis of the search alg.
• What is the worst-case running time of the search?
    • Traversal of subtrees v such that region(v) is fully contained in R adds up to O(k).
    • We need to find the number of regions that intersect R without being contained in it: the regions that are crossed by some border of R
        • As an upper bound for that, let's find how many regions are crossed by a vertical (or horizontal) line
        • What recurrence can we write for it? Q(n) = 2 + 2Q(n/4), with Q(1) = O(1)
        • Solution: Q(n) = O(√n)
    • Total time: O(√n + k)
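For reference, the standard way to count the regions crossed by a vertical line goes as follows; the symbol Q(n) is introduced here for illustration. At a node that splits on x, the line lies on one side of the split, so it crosses the region of only one child. That child splits on y, so the line crosses both of its children, each holding about n/4 points. Two nodes are visited per two levels and two subproblems of size n/4 remain. Unrolling the resulting recurrence:

```latex
\begin{align*}
Q(n) &= 2 + 2\,Q(n/4), \qquad Q(1) = O(1) \\
     &= 2 + 4 + 4\,Q(n/16) \\
     &= \sum_{i=1}^{\log_4 n} 2^{i} \;+\; 2^{\log_4 n}\,Q(1) \\
     &= O\!\left(2^{\log_4 n}\right) = O\!\left(n^{\log_4 2}\right) = O(\sqrt{n}).
\end{align*}
```

The query rectangle has four borders, so the number of regions that are crossed but not contained is O(√n), and adding the O(k) reporting cost gives O(√n + k) in total.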


Building the kd-tree
• How do we build the kd-tree?
    • Sort the points on x and on y (two arrays: X, Y)
    • Take the median v of X (if depth is even) or Y (if depth is odd) and create a root
    • Split X into sorted XL and XR, split Y into sorted YL and YR, s.t.
        • for any p ∈ XL or p ∈ YL, p.x < v.x (if depth is even) or p.y < v.y (if depth is odd)
        • for any p ∈ XR or p ∈ YR, p.x ≥ v.x (if depth is even) or p.y ≥ v.y (if depth is odd)
    • Build recursively the left child from XL and YL and the right child from XR and YR
• What is the running time of this?
    • O(n log n)
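A possible rendering of this construction is shown below, assuming distinct x- and y-coordinates as in the lecture. The names `Node`, `build`, `build_kd_tree`, `leaves`, and `height` are hypothetical, and storing the split axis in each node is a convenience of this sketch, not something the slides prescribe.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Node:
    point: Optional[Point] = None     # set for leaves only
    axis: int = 0                     # 0: split on x (even depth), 1: split on y (odd depth)
    split: float = 0.0                # left subtree < split <= right subtree
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(X: List[Point], Y: List[Point], depth: int = 0) -> Node:
    """X and Y hold the same points, sorted on x and on y respectively."""
    if len(X) == 1:
        return Node(point=X[0])
    axis = depth % 2
    S = X if axis == 0 else Y                     # the array sorted on the split axis
    v = S[len(S) // 2][axis]                      # median coordinate
    # Linear scans; filtering keeps all four arrays sorted, so no re-sorting is needed.
    XL = [p for p in X if p[axis] < v]
    XR = [p for p in X if p[axis] >= v]
    YL = [p for p in Y if p[axis] < v]
    YR = [p for p in Y if p[axis] >= v]
    return Node(axis=axis, split=v,
                left=build(XL, YL, depth + 1), right=build(XR, YR, depth + 1))


def build_kd_tree(points: List[Point]) -> Node:
    return build(sorted(points), sorted(points, key=lambda p: p[1]))


def leaves(v: Node) -> List[Point]:
    if v.point is not None:
        return [v.point]
    return leaves(v.left) + leaves(v.right)


def height(v: Node) -> int:
    if v.point is not None:
        return 0
    return 1 + max(height(v.left), height(v.right))


tree = build_kd_tree([(i, (i * 3) % 8) for i in range(8)])
print(height(tree), len(leaves(tree)))
```

Each level of the recursion does linear work in total and the median split halves the point set, so the running time follows the recurrence T(n) = 2T(n/2) + O(n) after the initial sorts.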


Kd-trees: summary
• Kd-tree:
    • Building (preprocessing time): O(n log n)
    • Size: O(n)
    • Range queries: O(√n + k)