Multidimensional Search Structures
CSE 6331 © Leonidas Fegaras
Multidimensional Indexing

Spatial Databases
• Spatial Objects
  – Points: location (x, y)
  – Lines: pairs of points (roads, coastlines)
  – Polygons: lists of points (states, countries)
• Data Types
  – Point: a data type with no extension (area)
  – Region: has a location and a boundary that defines its extension
• Spatial Queries
  – Range queries
    • “Find all Italian restaurants within 20 miles of UTA”
  – Nearest neighbor queries
    • “Find the 10 Italian restaurants that are nearest to UTA”
    • “Find the nearest fire station to Neddermann Hall”
  – Spatial join queries
    • “Find pairs of cities within 20 miles of each other”
    • “Find restaurants that are adjacent to university campuses”
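The three query types can be stated precisely with a brute-force baseline over point data. This is only a sketch of what the queries compute, not how a spatial database evaluates them. The function names and the use of Euclidean distance are assumptions made for illustration; the index structures in the rest of the deck exist to avoid these full scans.

```python
import math

def dist(p, q):
    """Euclidean distance between two (x, y) points (assumed metric)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def range_query(points, center, radius):
    """Range query: all points within `radius` of `center`."""
    return [p for p in points if dist(p, center) <= radius]

def nearest_neighbors(points, center, k):
    """Nearest neighbor query: the k points closest to `center`."""
    return sorted(points, key=lambda p: dist(p, center))[:k]

def distance_join(points_a, points_b, radius):
    """Spatial join: all pairs (a, b) that lie within `radius` of each other."""
    return [(a, b) for a in points_a for b in points_b if dist(a, b) <= radius]
```

Each function scans every point (or every pair, for the join), which is the cost a multidimensional index is meant to reduce.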

Applications
• Geographical Information Systems (GIS)
  – Map systems
  – Resource management systems
• Computer-aided design and manufacturing (CAD/CAM)
  – VLSI design (avoiding overlaps, routing wires)
• Multimedia databases
  – Various dimensions (color, shape, texture)

Representation of Spatial Objects
• It is very expensive to work on real boundary lines
• Approximate each object by its Minimum Bounding Rectangle (MBR)
Testing for intersection:
• Test if the MBRs intersect
• If they do, test if the boundary lines intersect
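The two-step intersection test can be sketched as a cheap filter followed by an exact check. This is a minimal illustration and makes several assumptions: objects are lists of (x, y) vertices, an MBR is a tuple (xmin, ymin, xmax, ymax), and `exact_test` is a hypothetical callback standing in for the expensive boundary test, which is not implemented here.

```python
def mbr(points):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a vertex list."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

def mbrs_intersect(a, b):
    """True if two rectangles overlap or touch on both axes."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def may_intersect(obj_a, obj_b, exact_test):
    """Filter step on MBRs, then refine with the caller-supplied exact test."""
    if not mbrs_intersect(mbr(obj_a), mbr(obj_b)):
        return False  # disjoint MBRs imply disjoint objects
    return exact_test(obj_a, obj_b)  # expensive check, run only when needed
```

Disjoint MBRs prove the objects are disjoint, so the exact test runs only for the candidate pairs that survive the filter.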

Grid-Tree (G-Tree)
• Mainly for point data of any dimension
• Based on hypercubes
[Figure: regions A to F labeled with binary partition strings, before and after inserting a point]

G-Tree Organization
• Alternate between vertical and horizontal splits
• When a region x is split, append 0 or 1 to its binary string (x becomes x0 and x1)
• G-Trees are organized in B+-trees
  – Key is the binary string
[Figure: successive splits of a region and a B+-tree keyed on the resulting binary strings]

Searching G-Trees
• Search: find point P = (x1, x2, …, xn)
  – Let m be the number of bits of the longest bitstring
  – Find the m-bit region that contains P:
    • Start with si = 0 and ti = 1 for all 1 <= i <= n
    • For k = 0 to m-1:
        Slice over dimension j = (k mod n) + 1
        The kth bit of the bitstring is 0 iff xj < (sj+tj)/2; then set tj = (sj+tj)/2
        Otherwise, it is 1; set sj = (sj+tj)/2
    • For n = 2, m = 5, and P = (0.1, 0.7):
        0.1 < ½       bit 0 is 0
        0.7 > ½       bit 1 is 1
        0.1 < ¼       bit 2 is 0
        0.7 < ½+¼     bit 3 is 0
        0.1 < 1/8     bit 4 is 0
        Bitstring is 00010 (written with bit 0 rightmost; in split order, bit 0 first, it reads 01000)
  – Use the bitstring as the key for the G-Tree
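The bit-by-bit slicing loop translates directly into code. The sketch below assumes coordinates normalized to [0, 1) and uses 0-based dimension indices, so j = k mod n rather than (k mod n) + 1. The function name is illustrative. Bits are emitted in the order they are generated (bit 0 first), so for the slide's example it returns "01000", which reads as the slide's 00010 when written right to left.

```python
def region_bitstring(point, m):
    """Return the m-bit string of the region containing `point`.

    Assumes every coordinate lies in [0, 1). Bit k is appended in
    generation order, so bit 0 is the leftmost character.
    """
    n = len(point)
    lo = [0.0] * n   # s_i in the slide
    hi = [1.0] * n   # t_i in the slide
    bits = []
    for k in range(m):
        j = k % n                      # dimension sliced at step k (0-based)
        mid = (lo[j] + hi[j]) / 2
        if point[j] < mid:
            bits.append("0")
            hi[j] = mid                # keep the lower half
        else:
            bits.append("1")
            lo[j] = mid                # keep the upper half
    return "".join(bits)

key = region_bitstring((0.1, 0.7), 5)
```

The resulting string would then serve as the search key in the B+-tree that stores the G-tree partitions.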

Point Quadtrees
• Divide the space into four quadrants
• Represented as an unbalanced tree where each node has 4 children
  – Internal node: point
  – Leaf: empty
• Node: (x, y, NE, NW, SE, SW)
[Figure: points A to H in the plane and the corresponding quadtree with NE, NW, SE, SW children]

Spatial Operations on Quadtrees
• Search for a point P = (x, y)
    if the current node T in the quadtree is a leaf, then not found
    if T = P, then found
    else find the quadrant of T that includes P and search recursively:
      e.g., if x > T.x and y > T.y then search the SE quadrant
      (this assumes y grows downward, as in screen coordinates; with y growing upward it would be the NE quadrant)
• Given a rectangle (x1, y1, x2, y2), find all points in the rectangle
  – Need to keep a set of points
  – The rectangle may intersect more than one quadrant
• Insert a point P = (x, y)
    If the current node is a leaf, then replace the leaf with (x, y, nil, nil, nil, nil)
    Else determine the quadrant and apply the algorithm recursively
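The three operations can be sketched as follows. This is a minimal illustration under stated assumptions: empty leaves are represented by None, north means larger y (the slide's example for x > T.x and y > T.y appears to use the opposite orientation, which only requires flipping the comparison in `quadrant`), and points on a dividing line go to the north or east side. The class and function names are not from the slides.

```python
class Node:
    """Internal node: a point plus four child quadrants (None = empty leaf)."""
    def __init__(self, x, y):
        self.x, self.y = x, y
        self.children = {"NE": None, "NW": None, "SE": None, "SW": None}

def quadrant(node, x, y):
    """Name of the quadrant of `node` that contains (x, y)."""
    return ("N" if y >= node.y else "S") + ("E" if x >= node.x else "W")

def insert(node, x, y):
    """Insert (x, y); returns the (possibly new) subtree root."""
    if node is None:
        return Node(x, y)              # replace the empty leaf
    if (node.x, node.y) != (x, y):     # ignore duplicates
        q = quadrant(node, x, y)
        node.children[q] = insert(node.children[q], x, y)
    return node

def search(node, x, y):
    """True if the point (x, y) is stored in the tree."""
    while node is not None:
        if (node.x, node.y) == (x, y):
            return True
        node = node.children[quadrant(node, x, y)]
    return False

def range_search(node, x1, y1, x2, y2, found=None):
    """Collect all stored points inside the rectangle [x1, x2] x [y1, y2]."""
    if found is None:
        found = []
    if node is None:
        return found
    if x1 <= node.x <= x2 and y1 <= node.y <= y2:
        found.append((node.x, node.y))
    for q, child in node.children.items():
        # Descend only into quadrants the rectangle can overlap.
        ns_ok = y2 >= node.y if q[0] == "N" else y1 < node.y
        ew_ok = x2 >= node.x if q[1] == "E" else x1 < node.x
        if ns_ok and ew_ok:
            range_search(child, x1, y1, x2, y2, found)
    return found
```

The range search may follow several children of the same node, which is why it accumulates results in a list instead of following a single path as the point search does.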

R*-Trees
[Figure: rectangles A to F and the corresponding R*-tree]
• Like a B+-tree whose keys are MBRs, but
  – There is no order among the rectangles in each node
  – Sibling rectangles may overlap
• Restrictions on R*-tree nodes:
  – The root has at least two children (unless it is the only node)
  – Every non-root node has m children, where M/4 <= m <= M
  – The tree is balanced

Spatial Operations on R*-Trees
[Figure: an R*-tree over numbered rectangles, illustrating two operations: “Insert rectangle 9” and “Find all objects that intersect rectangle 9”]
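The intersection query can be sketched as a recursive descent that follows every entry whose bounding rectangle intersects the query rectangle. The node layout here is an assumption made for illustration: each node holds a list of (rectangle, item) entries, where item is a child node in internal nodes and an object identifier in leaves, and rectangles are (xmin, ymin, xmax, ymax) tuples.

```python
class RNode:
    """R-tree node: `entries` is a list of (rect, child_or_object_id)."""
    def __init__(self, entries, leaf):
        self.entries = entries
        self.leaf = leaf

def intersects(a, b):
    """True if rectangles a and b overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def search(node, query, out=None):
    """Collect ids of all leaf objects whose rectangle intersects `query`."""
    if out is None:
        out = []
    for rect, item in node.entries:
        if intersects(rect, query):
            if node.leaf:
                out.append(item)          # object rectangle hit
            else:
                search(item, query, out)  # descend into this subtree
    return out

# Small hand-built tree with two overlapping sibling rectangles.
leaf1 = RNode([((0, 0, 1, 1), "a"), ((2, 2, 3, 3), "b")], leaf=True)
leaf2 = RNode([((5, 5, 6, 6), "c"), ((2.5, 2.5, 5.5, 5.5), "d")], leaf=True)
root = RNode([((0, 0, 3, 3), leaf1), ((2.5, 2.5, 6, 6), leaf2)], leaf=False)
hits = search(root, (2.8, 2.8, 4, 4))
```

Because sibling rectangles may overlap, the search can descend into more than one subtree, unlike a B+-tree lookup that follows a single path.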

Insert a New Object Rectangle S
• At each node (starting from the root), choose exactly one bounding rectangle R to receive S: the one whose enlargement to include S causes the least overlap enlargement. (The overlap of a rectangle is the total area it shares with the other rectangles in the same node; overlap enlargement is the increase in that overlap after R is enlarged to include S.)
• If several rectangles have the same overlap enlargement, choose the one with the smallest area
• On overflow, split the node's entries into two groups along a split axis so that neither group underflows and the perimeters of the two bounding boxes are minimized
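The choice of bounding rectangle at one node can be sketched as below. This follows the rule as stated on the slide (least overlap enlargement, ties broken by smallest area) and assumes overlap is measured against the sibling rectangles in the same node. Rectangles are (xmin, ymin, xmax, ymax) tuples, and all function names are illustrative. The split step is not covered.

```python
def area(r):
    """Area of rectangle r."""
    return (r[2] - r[0]) * (r[3] - r[1])

def union(a, b):
    """Smallest rectangle enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def overlap_area(a, b):
    """Area shared by rectangles a and b (0 if they do not overlap)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0

def total_overlap(r, others):
    """Total area r shares with the rectangles in `others`."""
    return sum(overlap_area(r, o) for o in others)

def choose_rectangle(rects, s):
    """Index of the rectangle in `rects` that should receive s."""
    def cost(i):
        others = rects[:i] + rects[i + 1:]
        grown = union(rects[i], s)
        enlargement = total_overlap(grown, others) - total_overlap(rects[i], others)
        return (enlargement, area(rects[i]))  # tie-break on smallest area
    return min(range(len(rects)), key=cost)
```

Returning a tuple from `cost` lets `min` apply the tie-break automatically: the area is compared only when the overlap enlargements are equal.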