Carnegie Mellon Univ. Dept. of Computer Science
15-415 - Database Applications
C. Faloutsos

Spatial Access Methods - z-ordering

General Overview
• Relational model – SQL; db design
• Indexing; Q-opt; Transaction processing
• Advanced topics
  – Distributed Databases
  – RAID
  – Authorization / Stat. DB
  – Spatial Access Methods (SAMs)
  – Multimedia Indexing

SAMs - Detailed outline
• spatial access methods
  – problem definition
  – z-ordering
  – R-trees

Spatial Access Methods - problem
• Given a collection of geometric objects (points, lines, polygons, ...)
• organize them on disk, to answer spatial queries:
  – point queries
  – range queries
  – k-nn queries
  – spatial joins (‘all pairs’ within ε)
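To make the four query types concrete, here is a brute-force baseline in Python. It is only a sketch: the function names and the representation of objects as coordinate tuples are assumptions for illustration, not part of the slides. A spatial access method should return the same answers while touching far fewer disk pages.

```python
import math
from itertools import combinations

def point_query(points, q):
    """All stored points located exactly at q."""
    return [p for p in points if p == q]

def range_query(points, lo, hi):
    """All points inside the axis-aligned box [lo, hi] (inclusive)."""
    return [p for p in points
            if all(l <= c <= h for c, l, h in zip(p, lo, hi))]

def knn_query(points, q, k):
    """The k points closest to q (Euclidean distance)."""
    return sorted(points, key=lambda p: math.dist(p, q))[:k]

def all_pairs(points, eps):
    """Spatial self-join: all pairs of points within distance eps."""
    return [(a, b) for a, b in combinations(points, 2)
            if math.dist(a, b) <= eps]
```

Every function scans the whole collection (the join is quadratic), which is the cost an index is meant to avoid.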

SAMs - motivation
• Q: applications?
• traditional DB (records as points, e.g., axes ‘age’ and ‘salary’); GIS
• CAD/CAM: find elements too close to each other
• [Figure: sequences S1 ... Sn over days 1-365, each mapped to a point F(S1) ... F(Sn); axes e.g. avg and std]

SAMs - Detailed outline
• spatial access methods
  – problem definition
  – z-ordering
  – R-trees

SAMs: solutions
• z-ordering
• R-trees
• (grid files)
Q: how would you organize, e.g., n-dim points, on disk? (C points per disk page)

z-ordering
Q: how would you organize, e.g., n-dim points, on disk? (C points per disk page)
Hint: reduce the problem to 1-d points (!!)
Q1: why? A: B-trees!
Q2: how?

z-ordering
Q2: how?
A: assume finite granularity (e.g., 2^32 x 2^32; 4 x 4 here); z-ordering = bit-shuffling = N-trees = Morton keys = geocoding = ...
Q2.1: how to map n-d cells to 1-d cells?

z-ordering
Q2.1: how to map n-d cells to 1-d cells?
A: row-wise
Q: is it good?
A: great for ‘x’ axis; bad for ‘y’ axis

z-ordering
Q: How about the ‘snake’ curve?
A: still problems: on a grid 2^32 cells wide, cells adjacent along ‘y’ can still end up on the order of 2^32 positions apart
Q: Why are those curves ‘bad’?
A: no distance preservation (~ clustering)
Q: solution?

z-ordering
Q: solution? (w/ good clustering, and easy to compute, for 2-d and n-d?)
A: z-ordering/bit-shuffling/linear-quadtrees
‘looks’ better:
• few long jumps;
• scoops out the whole quadrant before leaving it
• a.k.a. space filling curves

z-ordering/bit-shuffling/linear-quadtrees
Q: How to generate this curve (z = f(x, y))?
A: 3 (equivalent) answers!
A1: ‘z’ (or ‘N’) shapes, RECURSIVELY
[Figure: order-1, order-2, ..., order-(n+1) curves]
Notice:
• self similar (we’ll see about fractals, soon)
• method is hard to use: z =? f(x, y)
Method #2?

z-ordering: bit-shuffling
[Figure: 4 x 4 grid; x and y axes each labeled 00, 01, 10, 11]
Example: x = 00, y = 11 -> interleave the bits, x bit first: z = (0101)_2 = 5
How about the reverse: (x, y) = g(z)?
How about n-d spaces?
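Bit-shuffling and its reverse can be sketched in a few lines of Python. The names `z_encode` and `z_decode` are assumptions for illustration. The bit order follows the example on the slide (the first coordinate supplies the leading bit of each group), and the same loop works for any number of dimensions.

```python
def z_encode(coords, bits):
    """Interleave the bits of all coordinates, most significant bit first.

    coords[0] (x) supplies the leading bit of each group, matching the
    slide's example: x = 00, y = 11 gives (0101)_2 = 5.
    """
    z = 0
    for level in range(bits - 1, -1, -1):
        for c in coords:
            z = (z << 1) | ((c >> level) & 1)
    return z

def z_decode(z, dims, bits):
    """Reverse mapping g(z): pull the interleaved bits apart again."""
    coords = [0] * dims
    for level in range(bits - 1, -1, -1):
        for d in range(dims):
            shift = level * dims + (dims - 1 - d)
            coords[d] = (coords[d] << 1) | ((z >> shift) & 1)
    return tuple(coords)

print(z_encode((0, 3), bits=2))        # x = 00, y = 11
print(z_decode(5, dims=2, bits=2))
print(z_encode((1, 0, 1), bits=1))     # a 3-d cell
```

Both directions are linear in the total number of bits.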

z-ordering/bit-shuffling/linear-quadtrees
Q: How to generate this curve (z = f(x, y))?
A: 3 (equivalent) answers!
Method #3? linear-quadtrees: assign N->1, S->0, etc. (in the figure: W->0, E->1)
[Figure: quadrant labels WN = 01..., EN = 11..., WS = 00..., ES = 10...]
... and repeat recursively. E.g.: z_blue-cell = WN; WN = (0101)_2 = 5

z-ordering
Drill: z-value of magenta cell, with the three methods?
method #1: 14
method #2: shuffle(11; 10) = (1110)_2 = 14
method #3: EN; ES = ... = 14
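The drill can be checked mechanically. The Python sketch below implements all three methods for the 2-d case. The function names and the quadrant-label table are illustrative assumptions, and the magenta cell is taken to be x = 11, y = 10, as the shuffle on the slide implies.

```python
def z_curve(order):
    """Method #1: build the curve recursively; a cell's position is its z."""
    if order == 0:
        return [(0, 0)]
    half = 2 ** (order - 1)
    sub = z_curve(order - 1)
    cells = []
    # Quadrants are visited in the order WS, WN, ES, EN (an 'N' shape).
    for dx, dy in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        cells += [(x + dx * half, y + dy * half) for x, y in sub]
    return cells

def shuffle(x, y, bits):
    """Method #2: interleave the bits of x and y, x bit first."""
    z = 0
    for level in range(bits - 1, -1, -1):
        z = (z << 2) | (((x >> level) & 1) << 1) | ((y >> level) & 1)
    return z

# Method #3: W->0, E->1 for the first bit; S->0, N->1 for the second.
QUADRANT_BITS = {"WS": "00", "WN": "01", "ES": "10", "EN": "11"}

def z_from_path(path):
    """Concatenate the quadrant labels along the quadtree path."""
    return int("".join(QUADRANT_BITS[q] for q in path), 2)

print(z_curve(2).index((3, 2)), shuffle(3, 2, 2), z_from_path(["EN", "ES"]))
```

Method #1 materializes the whole curve, which illustrates why the slides call it hard to use: its cost grows with the number of cells, whereas the other two depend only on the number of bits.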

z-ordering - Detailed outline
• spatial access methods
  – z-ordering
    • main idea - 3 methods
    • use w/ B-trees; algorithms (range, knn queries ...)
    • non-point (e.g., region) data
    • analysis; variations
  – R-trees

z-ordering - usage & algo’s
Q1: How to store on disk?
A: treat z-value as primary key; feed to B-tree
Q2: How to answer range queries etc.
[Figure: grid with two cities, PGH and SF]

MAJOR ADVANTAGES w/ B-tree:
• already inside commercial systems (no coding/debugging!)
• concurrency & recovery is ready

z-ordering - usage & algo’s
Q2: queries? (e.g.: find city at (0, 3))?
A: find z-value; search B-tree
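A minimal Python sketch of this idea follows, with a sorted list and `bisect` standing in for the B-tree. The class name `ZIndex` and the city coordinates are hypothetical (the slides do not give the positions of PGH and SF).

```python
import bisect

def z_encode(x, y, bits):
    """Interleave the bits of x and y, x bit first."""
    z = 0
    for level in range(bits - 1, -1, -1):
        z = (z << 2) | (((x >> level) & 1) << 1) | ((y >> level) & 1)
    return z

class ZIndex:
    """Sorted list keyed on z-value; a stand-in for a real B-tree."""

    def __init__(self, bits):
        self.bits = bits
        self.keys = []     # z-values, kept sorted
        self.values = []   # payloads, parallel to self.keys

    def insert(self, x, y, value):
        z = z_encode(x, y, self.bits)
        i = bisect.bisect_left(self.keys, z)
        self.keys.insert(i, z)
        self.values.insert(i, value)

    def point_query(self, x, y):
        """Compute the z-value, then do an ordinary key lookup."""
        z = z_encode(x, y, self.bits)
        i = bisect.bisect_left(self.keys, z)
        if i < len(self.keys) and self.keys[i] == z:
            return self.values[i]
        return None

index = ZIndex(bits=2)
index.insert(0, 3, "SF")    # hypothetical position, z = 5
index.insert(3, 2, "PGH")   # hypothetical position, z = 14
print(index.point_query(0, 3))
```

In a real system the sorted list would be replaced by the DBMS's own B-tree on the z-value column, which is the advantage the slides emphasize.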

z-ordering - usage & algo’s
Q2: range queries?
A: compute ranges of z-values; use B-tree (in the figure: 9, 11-15)
Q2’: range queries - how to reduce # of qualifying ranges?
A: Augment the query! (9, 11-15 -> 8-15)
Q2’’: range queries - how to break a query into ranges?
A: recursively, quadtree-style; decompose only non-full quadrants (in the figure: 12-15; 9, 11)
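The quadtree-style decomposition can be sketched in Python as below. The function name `z_ranges` and the inclusive `(lo, hi)` tuples are assumptions. The example query, x in [2, 3] and y in [1, 3] on the 4 x 4 grid, is inferred from the z-values 9, 11-15 shown on the slides.

```python
def z_ranges(qx, qy, bits):
    """Break a rectangle query into maximal runs of consecutive z-values.

    qx = (xlo, xhi), qy = (ylo, yhi), both inclusive.
    Only quadrants that are partly covered are decomposed further.
    """
    out = []

    def visit(x0, y0, size, zlo):
        x1, y1 = x0 + size - 1, y0 + size - 1
        if x1 < qx[0] or x0 > qx[1] or y1 < qy[0] or y0 > qy[1]:
            return                      # quadrant misses the query
        if qx[0] <= x0 and x1 <= qx[1] and qy[0] <= y0 and y1 <= qy[1]:
            zhi = zlo + size * size - 1  # full quadrant: one z-range
            if out and out[-1][1] + 1 == zlo:
                out[-1] = (out[-1][0], zhi)   # glue adjacent ranges
            else:
                out.append((zlo, zhi))
            return
        half = size // 2
        step = half * half
        # Children in z order: WS, WN, ES, EN.
        visit(x0, y0, half, zlo)
        visit(x0, y0 + half, half, zlo + step)
        visit(x0 + half, y0, half, zlo + 2 * step)
        visit(x0 + half, y0 + half, half, zlo + 3 * step)

    visit(0, 0, 1 << bits, 0)
    return out

print(z_ranges((2, 3), (1, 3), bits=2))
```

Each returned range becomes one B-tree range scan. Augmenting the query, as on the earlier slide, trades a few false hits for fewer scans.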

z-ordering - Detailed outline
• spatial access methods
  – z-ordering
    • main idea - 3 methods
    • use w/ B-trees; algorithms (range, knn queries ...)
    • non-point (e.g., region) data
    • analysis; variations
  – R-trees

z-ordering - usage & algo’s
Q3: k-nn queries? (say, 1-nn)?
A: traverse B-tree; find nn wrt z-values and ... ask a range query.
[Figure: PGH, SF; nn wrt z-value; z-values 3, 5, 12]
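One way to read this two-step recipe is sketched below in Python, under stated assumptions: a sorted list stands in for the B-tree, distances are Euclidean, and the range query is a square centered on the query point whose half-side is the distance to the z-order candidate. The names `nearest`, `z_encode`, and `z_ranges` are illustrative.

```python
import bisect
import math

def z_encode(x, y, bits):
    z = 0
    for level in range(bits - 1, -1, -1):
        z = (z << 2) | (((x >> level) & 1) << 1) | ((y >> level) & 1)
    return z

def z_ranges(qx, qy, bits):
    """Quadtree-style decomposition of a rectangle into z-ranges."""
    out = []
    def visit(x0, y0, size, zlo):
        x1, y1 = x0 + size - 1, y0 + size - 1
        if x1 < qx[0] or x0 > qx[1] or y1 < qy[0] or y0 > qy[1]:
            return
        if qx[0] <= x0 and x1 <= qx[1] and qy[0] <= y0 and y1 <= qy[1]:
            out.append((zlo, zlo + size * size - 1))
            return
        half, step = size // 2, (size // 2) ** 2
        visit(x0, y0, half, zlo)
        visit(x0, y0 + half, half, zlo + step)
        visit(x0 + half, y0, half, zlo + 2 * step)
        visit(x0 + half, y0 + half, half, zlo + 3 * step)
    visit(0, 0, 1 << bits, 0)
    return out

def nearest(points, q, bits):
    """1-nn: candidate from z-order neighbours, then a verifying range query."""
    index = sorted((z_encode(x, y, bits), (x, y)) for x, y in points)
    keys = [z for z, _ in index]
    # Step 1: the neighbours of q's z-value in the sorted order.
    i = bisect.bisect_left(keys, z_encode(q[0], q[1], bits))
    cands = [index[j][1] for j in (i - 1, i) if 0 <= j < len(index)]
    best = min(cands, key=lambda p: math.dist(p, q))
    # Step 2: range query on the square that must contain the true nn.
    r = math.ceil(math.dist(best, q))
    top = (1 << bits) - 1
    qx = (max(0, q[0] - r), min(top, q[0] + r))
    qy = (max(0, q[1] - r), min(top, q[1] + r))
    for zlo, zhi in z_ranges(qx, qy, bits):
        lo = bisect.bisect_left(keys, zlo)
        hi = bisect.bisect_right(keys, zhi)
        for _, p in index[lo:hi]:
            if math.dist(p, q) < math.dist(best, q):
                best = p
    return best

print(nearest([(0, 7), (7, 3), (4, 4)], q=(3, 4), bits=3))
```

In this example the z-order neighbours of the query are (0, 7) and (7, 3), while the true nearest point (4, 4) lies across a quadrant boundary. This is why the second step is needed.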

z-ordering - usage & algo’s
Q4: all-pairs queries? (all pairs of cities within 10 miles from each other?)
(we’ll see ‘spatial joins’ later: find all PA counties that intersect a lake)

z-ordering - Detailed outline
• spatial access methods
  – z-ordering
    • main idea - 3 methods
    • use w/ B-trees; algorithms (range, knn queries ...)
    • non-point (e.g., region) data
    • analysis; variations
  – R-trees
  – ...

z-ordering - regions
Q: z-value for a region? (z_B = ??, z_C = ??)
A: 1 or more z-values; by quadtree decomposition
[Figure: regions A, B, C on the 4 x 4 grid; W->0, E->1, S->0, N->1, i.e., WN = 01..., EN = 11..., WS = 00..., ES = 10...]
z_B = 11** (‘*’ = “don’t care”)
z_C = {0010; 1000}

z-ordering - regions
Q: How to store in B-tree?
A: sort (* < 0 < 1)
Q: How to search (range etc. queries) - e.g., ‘red’ range query
A: break query in z-values; check B-tree

z-ordering - regions
Almost identical to range queries for point data, except for the “don’t cares” - i.e., z1 = 1100 ?? 11** = z2
Specifically: does z1 contain/avoid/intersect z2?
Q: what is the criterion to decide?
A: Prefix property: let r1, r2 be the corresponding regions, and let r1 be the smaller one (=> z1 has fewer ‘*’s). Then:
• r2 will either contain completely, or avoid completely r1.
• it will contain r1, if z2 is the prefix of z1
E.g., z1 = 1100, z2 = 11**: region of z1 is completely contained in region of z2

z-ordering - regions
Drill (True/False). Given:
• z1 = 011001**
• z2 = 01******
• z3 = 0100****
T/F r2 contains r1 - TRUE (prefix property)
T/F r3 contains r1 - FALSE (disjoint)
T/F r3 contains r2 - FALSE (r2 contains r3)
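The prefix test is short enough to write out. In the Python sketch below, z-values are strings over 0, 1 and ‘*’, with the ‘*’s trailing. The function name `contains` is an assumption for illustration.

```python
def contains(z_outer, z_inner):
    """True if the region of z_outer contains the region of z_inner.

    By the prefix property this holds exactly when z_outer, with its
    trailing '*'s removed, is a prefix of z_inner.
    """
    return z_inner.rstrip("*").startswith(z_outer.rstrip("*"))

z1, z2, z3 = "011001**", "01******", "0100****"
print(contains(z2, z1))   # r2 contains r1?
print(contains(z3, z1))   # r3 contains r1?
print(contains(z3, z2))   # r3 contains r2?
```

If neither z-value is a prefix of the other, the two regions are disjoint. Quadtree blocks never partially overlap.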

z-ordering - regions
Spatial joins: find (quickly) all counties intersecting lakes
Naive algorithm: O(N * M)
Something faster?
Solution: merge the lists of (sorted) z-values, looking for the prefix property
footnote #1: ‘*’ needs careful treatment
footnote #2: need dup. elimination
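One possible shape of this merge is sketched below in Python. It is not necessarily the algorithm the lecture has in mind. It assumes all z-values are strings of the same length with trailing ‘*’s, turns each into an interval of full-resolution z-values (one way of treating ‘*’ carefully), and sweeps the sorted intervals with a stack of currently open blocks. The names `z_interval` and `spatial_join`, and the sample counties and lakes, are hypothetical.

```python
def z_interval(z):
    """Map a z-value with trailing '*'s to its (lo, hi) range of cells."""
    prefix = z.rstrip("*")
    pad = len(z) - len(prefix)
    return int(prefix + "0" * pad, 2), int(prefix + "1" * pad, 2)

def spatial_join(counties, lakes):
    """counties, lakes: dict name -> list of z-values. Returns a set of pairs."""
    items = []
    for side, objs in (("C", counties), ("L", lakes)):
        for name, zs in objs.items():
            for z in zs:
                lo, hi = z_interval(z)
                items.append((lo, -hi, side, name))
    items.sort()                 # by lo; larger (enclosing) blocks first
    stack, result = [], set()    # the set eliminates duplicate pairs
    for lo, neg_hi, side, name in items:
        while stack and stack[-1][0] < lo:
            stack.pop()          # that block ended before this one starts
        for _, other_side, other in stack:
            if other_side != side:   # an enclosing block from the other list
                result.add((name, other) if side == "C" else (other, name))
        stack.append((-neg_hi, side, name))
    return result

counties = {"A": ["11**"], "B": ["0010", "1000"]}
lakes = {"L1": ["1101"], "L2": ["1000"], "L3": ["01**"]}
print(sorted(spatial_join(counties, lakes)))
```

Because quadtree blocks are either nested or disjoint, every block still on the stack encloses the current one. After sorting, a single pass therefore finds all intersecting pairs.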

z-ordering - Detailed outline
• spatial access methods
  – z-ordering
    • main idea - 3 methods
    • use w/ B-trees; algorithms (range, knn queries ...)
    • non-point (e.g., region) data
    • analysis; variations
  – R-trees

z-ordering - variations
Q: is z-ordering the best we can do?
A: probably not - occasional long ‘jumps’
Q: then?
A1: Gray codes
A2: Hilbert curve! (a.k.a. Hilbert-Peano curve)
‘Looks’ better (never long jumps). How to derive it?
[Figure: order-1, order-2, ..., order-(n+1) Hilbert curves]

z-ordering - variations
Q: function for the Hilbert curve (h = f(x, y))?
A: bit-shuffling, followed by post-processing, to account for rotations. Linear on # bits.
See textbook, for pointers to code/algorithms (e.g., [Jagadish, 90])
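For reference, here is a Python sketch of a widely used iterative formulation of h = f(x, y) for the 2-d Hilbert curve. It is not necessarily the algorithm of the cited papers, and the curve's orientation (which corner it starts from) is a convention that may differ from the figures. The name `hilbert_index` is an assumption.

```python
def hilbert_index(x, y, order):
    """Position of cell (x, y) along a 2-d Hilbert curve of the given order.

    One loop iteration per bit: pick the quadrant, then rotate/flip the
    coordinates so the sub-curve has the standard orientation.
    """
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                   # rotate the quadrant
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d

# Order-1 curve visits (0,0), (0,1), (1,1), (1,0): a 'U' shape.
print([hilbert_index(x, y, 1) for x, y in [(0, 0), (0, 1), (1, 1), (1, 0)]])
```

The cost is linear in the number of bits, as the slide states. Unlike z-ordering, consecutive positions are always neighboring cells.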

z-ordering - variations
Q: how about Hilbert curve in 3-d? n-d?
A: Exists (and is not unique!). E.g., 3-d, order-1 Hilbert curves (Hamiltonian paths on cube)
[Figure: two such paths, #1 and #2]

z-ordering - Detailed outline
• spatial access methods
  – z-ordering
    • main idea - 3 methods
    • use w/ B-trees; algorithms (range, knn queries ...)
    • non-point (e.g., region) data
    • analysis; variations
  – R-trees
  – ...

z-ordering - analysis
Q: How many pieces (‘quad-tree blocks’) per region?
A: proportional to perimeter (surface etc.)
(How long is the coastline, say, of England? Paradox: The answer changes with the yardstick -> fractals ...)
Q: Should we decompose a region to full detail (and store in B-tree)?
A: NO! approximation with 1-3 pieces/z-values is best [Orenstein 90]

z-ordering - analysis
Q: how to measure the ‘goodness’ of a curve?
A: e.g., avg. # of runs, for range queries (#runs ~ #disk accesses on B-tree)
[Figure: the same range query under two curves: 4 runs vs. 3 runs]
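Counting runs is easy to try out. The Python sketch below counts the runs of consecutive curve positions covered by one rectangle query, for z-ordering and for a Hilbert curve. The function names are assumptions, the Hilbert routine is a common iterative formulation rather than the cited authors' code, and one sample query says nothing about averages.

```python
def z_encode(x, y, bits):
    z = 0
    for level in range(bits - 1, -1, -1):
        z = (z << 2) | (((x >> level) & 1) << 1) | ((y >> level) & 1)
    return z

def hilbert_index(x, y, bits):
    n = 1 << bits
    d, s = 0, n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d

def count_runs(curve, qx, qy, bits):
    """Number of maximal runs of consecutive curve positions in the query.

    qx = (xlo, xhi), qy = (ylo, yhi), inclusive and non-empty.
    """
    keys = sorted(curve(x, y, bits)
                  for x in range(qx[0], qx[1] + 1)
                  for y in range(qy[0], qy[1] + 1))
    return 1 + sum(1 for a, b in zip(keys, keys[1:]) if b != a + 1)

query = ((2, 3), (1, 3))          # x in [2, 3], y in [1, 3] on a 4 x 4 grid
print(count_runs(z_encode, *query, bits=2))
print(count_runs(hilbert_index, *query, bits=2))
```

Averaging `count_runs` over many queries of a given shape is one way to compare curves empirically.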

z-ordering - analysis
Q: So, is Hilbert really better?
A: 27% fewer runs, for 2-d (similar for 3-d)
Q: are there formulas for #runs, # of quadtree blocks etc.?
A: Yes ([Jagadish; Moon+ etc.] see textbook)

z-ordering - fun observations
Hilbert and z-ordering curves: “space filling curves”: eventually, they visit every point in n-d space - therefore ...
... they show that the plane has as many points as a line (-> headaches for 1900’s mathematics/topology). (fractals, again!)
[Figure: order-1, order-2, ..., order-(n+1) curves]

z-ordering - fun observations
Observation #2: Hilbert(-like) curve for video encoding [Y. Matias+, CRYPTO ‘87]:
Given a frame, visit its pixels in randomized Hilbert order; compress; and transmit
In general, Hilbert curve is great for preserving distances, clustering, vector quantization etc.

SAMs - Detailed outline
• spatial access methods
  – problem definition
  – z-ordering
  – R-trees

Conclusions
• z-ordering is a great idea (n-d points -> 1-d points; feed to B-trees)
• used by TIGER system and (most probably) by other GIS products
• works great with low-dim points