Indexing Techniques Advanced Databases Indexing Techniques The Problem
Indexing Techniques Advanced Databases Indexing Techniques
The Problem • What can we introduce to make search more efficient? – Indices! • What is an index? … Advanced Databases … Indexing Techniques 2
Definitions • Index: an auxiliary data structure to speed up record retrieval • Search key: the field/s of a table which is/are indexed • Storage: index files that contain index records – Each entry storing • Actual data record • or, search key value k and record ID <k, rid> • or, search key value k and list of records IDs <k, rid list> • Types: ordered and unordered (hash) indices Anna Paul Tim Advanced Databases Indexing Techniques Page i+1 3
Types of Ordered Indices (1/3) • Assuming ordered data files • Depending on which field is indexed – Primary index: search key is ordering key field • Pointer for each page – Secondary index: search key is non ordering field 00112233 00112235 00112236 00112238 primary Advanced Databases secondary 00112233 Paul 00112234 Anna 00112235 Matt 00112236 Tim 00112237 Carol Indexing Techniques. Rob 00112238 Anna Carol Paul Tim 4
Types of Ordered Indices (2/3) • Depending on the density of index records – Dense index: an index record for each distinct search key value, ie every record – Sparse index: index records for only some search key values • search key value for first record in page 00112233 • pointer to page 00112234 dense 00112235 00112233 00112236 00112235 00112237 00112236 00112233 Paul 00112238 00112234 Anna sparse Advanced Databases 00112235 Matt 00112236 00112237 Tim Carol 00112238 Rob Indexing Techniques 5
Types of Ordered Indices (3/3) • Ordering field is nonkey (may have duplicates) – Clustered index – Unclustered index 00112233 00112234 00112235 00112236 00112237 00112238 01112233 01112236 02112236 unclustered Advanced Databases clustered 00112234 00112237 Anna Carol 00112235 Matt 00112233 01112233 00112238 Paul Rob 00112236 01112236 02112236 Tim Tim Indexing Techniques Anna Carol Matt Paul Rob Tim 6
Indices Exercise • • 215 records 128 bytes/record 210 bytes/page ordered file equality search on ordering field, unspanned organization – without an index – with a primary index • on field of size 12 bytes • assume pointer 4 bytes long Advanced Databases Indexing Techniques 7
Multi-level Indices (1/2) • If access using first-level index is still expensive • Build a sparse index on the first-level index – Multi-level Index • Fan-out: index blocking factor 00112233 first-level index 00112234 00112235 00112233 00112234 Paul Anna 00112235 Matt 00112236 00112237 Tim Carol Advanced Databases Rob 00112238 00112236 00112237 00112238 Indexing Techniques 00112233 00112235 00112236 second-level index 8
Multi-level Indices (2/2) • 26 index records/page (fan-out) • 215 index records • 1 st-level – 29 pages • 2 nd-level – 29 index records – 23 pages • 3 rd-level – 23 index records – 1 page • 1 <= 215 / (26)t • t = ceil(log 26 215 ) = 3 • t = ceil(logfo#index-records) Advanced Databases Indexing Techniques 9
Dynamic multi-level indices • So far assumed indices are physically ordered files – expensive insertions and deletions • Dynamic multi-level indices – B trees – B+ trees Advanced Databases Indexing Techniques 10
Tree-structured Indices • For each node: K 1 < K 2 < … Kq-1 • For each value X in subtree pointed to by Pi – Ki-1< X < Ki, 1<i<q – X < Ki, i=1 – Ki-1< X, i=q P 1 K 1 X Advanced Databases … Ki-1 Pi Ki X Indexing Techniques … Kq-1 Pq X 11
B tree • Problems: empty nodes, unbalanced trees – solution: B trees … … … … Advanced Databases Indexing Techniques … 12
B tree: Definition • • Each node: <P 1, <K 1, Pr 1>, P 2, …, <Kq-1, Prq-1>, Pq> Pi tree pointer, Ki search value, Pri data pointer For each node: K 1 < K 2 < … Kq-1 For each value X in subtree pointed to by Pi – Ki-1< X < Ki, 1<i<q – X < Ki, i=1 – Ki-1< X, i=q • Each node at most q pointers – B tree is order q • Each node at least ceil(q/2) tree pointers – except from root • Internal node with p pointers has p-1 values • All leaves at the same level – balanced tree Advanced Databases Indexing Techniques 13
B tree: Example ø 1 ø 3 ø 5 8 ø 6 ø 7 ø ø 9 ø 12 tree pointer data pointer ø null pointer Advanced Databases Indexing Techniques 14 ø
B+ tree • Most implementations of B tree are B+ tree • Data pointers only in leaves – – more entries in internal nodes than regular B trees less internal nodes less levels faster access Advanced Databases Indexing Techniques 15
B+ tree: Definition • • Internal nodes: <P 1, K 1, P 2, …, Pq-1, Kq-1, Pq> Leaf nodes: <<K 1, Pr 1>, <K 2, Pr 2>, …, <Kp-1, Prp-1>, Pnext> Pri points a data records or block of pointers of such records leaf order 120 101 110 150 180 120 130 156 179 Advanced Databases Indexing Techniques 180 200 16
B+ tree: Search • At each level, find smallest Ki larger than search key • Follow associated pointer Pi 100 30 3 5 11 120 30 180 35 100 101 110 Advanced Databases 150 120 130 Indexing Techniques 150 156 179 180 200 17
B+ tree: Insert • Nodes may overflow or underflow • Ignoring overflow or underflow • Inserting data record with search key value k – find leaf node – if k found • add record to file, create indirect block if there isn’t one • add record pointer to indirect block – if k not found • add data record to file • insert record pointer in leaf node (all search keys in order) Advanced Databases Indexing Techniques 18
B+ tree: Delete • • Ignoring overflow or underflow Find leaf node with search key value k Find data record pointer, delete record delete index record – and indirect block, if any, if empty Advanced Databases Indexing Techniques 19
B+ tree: Simple Insert • Insert 42 100 k < 100 30 3 5 11 120 30 35 180 42 100 101 110 Advanced Databases 150 120 130 Indexing Techniques 150 156 179 180 20
B+ tree: Leaf Overflow (1/2) • Insert 9 100 k < 100 30 3 5 11 120 30 35 180 42 100 101 110 Advanced Databases 150 120 130 Indexing Techniques 150 156 179 180 200 21
B+ tree: Leaf Overflow (2/2) • first ceil(n/2) in existing node, rest in new leaf node • n=3+1=4 100 k < 100 9 3 5 30 9 120 11 30 101 110 Advanced Databases 35 150 180 42 120 130 Indexing Techniques 150 156 179 180 200 22
B+ tree: Internal Node Overflow (1/3) • Insert 210, insert 205 100 k < 100 9 3 5 30 9 120 11 30 101 110 Advanced Databases 35 150 180 42 120 130 Indexing Techniques 150 156 179 180 200 210 23
B+ tree: Internal Node Overflow (2/3) • Leaf Split 100 k < 100 9 3 5 30 9 120 11 30 101 110 35 150 180 42 120 130 156 179 180 205 210 Advanced Databases Indexing Techniques 24
B+ tree: Internal Node Overflow (3/3) 100 k < 100 9 150 30 120 180 3 5 9 11 30 101 110 35 205 42 120 130 156 179 180 205 210 Advanced Databases Indexing Techniques 25
B+ tree: New Root (1/2) • Insert 210, insert 205 120 101 110 120 130 150 180 156 179 180 205 210 Advanced Databases Indexing Techniques 26
B+ tree: New Root (2/2) 150 120 101 110 120 130 180 156 179 205 180 205 210 Advanced Databases Indexing Techniques 27
Index Insert Exercise • Insert 8, 7, 41 9 3 Advanced Databases 5 30 9 11 Indexing Techniques 30 35 42 28
B+ tree: Delete • Simple delete case • Underflow case: – redistribute records – coalesce with siblings – update parents Advanced Databases Indexing Techniques 29
B+ tree: Simple Delete (1/2) • Delete 110 150 120 101 110 120 130 180 156 179 205 180 205 210 215 Advanced Databases Indexing Techniques 30
B+ tree: Simple Delete (2/2) • Leaf Updated 150 120 101 120 130 180 156 179 205 180 205 210 215 Advanced Databases Indexing Techniques 31
B+ tree: Delete Redistribution (1/2) • Delete 180 150 120 101 120 130 180 156 179 205 180 205 210 215 Advanced Databases Indexing Techniques 32
B+ tree: Delete Redistribution (2/2) • Redistribute entries – left or right sibling 150 120 101 120 130 179 150 156 205 179 200 205 210 Advanced Databases Indexing Techniques 33
B+ tree: Delete Coalesce (1/4) • Delete 101 150 120 101 120 130 179 150 156 205 179 200 205 210 215 Advanced Databases Indexing Techniques 34
B+ tree: Delete Coalesce (2/4) • Leaf updated • No redistribution 150 – sibling coalesce 120 100 120 130 179 150 156 205 179 200 205 210 215 Advanced Databases Indexing Techniques 35
B+ tree: Delete Coalesce (3/4) • Leaf updated • No redistribution 150 – sibling coalesce 179 100 120 130 156 205 179 200 205 210 215 Advanced Databases Indexing Techniques 36
B+ tree: Delete Coalesce (4/4) • Redistribution 179 150 100 120 130 205 150 156 179 200 205 210 215 Advanced Databases Indexing Techniques 37
Hashing Techniques Advanced Databases Indexing Techniques
Static Hashing (1/2) • Store records in buckets with overflow chains • Allocate a fixed number of buckets M • Problems: – small M • long overflow chains, slow search-delete-insert null h Advanced Databases Indexing Techniques 39
Static Hashing (2/2) • Problems: – large M • wasted space, slow scan null h null Advanced Databases Indexing Techniques 40
Dynamic Hashing • Splitting and coalescing buckets as the database grows-shrinks • One scheme: Extendible Hashing • Hash function generates large values, eg 32 bits – use i bits, change i as database size changes • If overflow, double the number of buckets – use i+1 bits of the hash function – but, expensive: read all pages M and distribute records in 2*M pages • solution: use a directory and double the size of the directory – only split bucket that overflowed Advanced Databases Indexing Techniques 41
Extendible Hashing (1/4) 2 2 00 01 h(18) = 10010 10 11 16 20 A 2 1 B 2 18 C 2 Directory 3 7 D Buckets Advanced Databases Indexing Techniques 42
Extendible Hashing (2/4) 2 2 00 01 h(4) = 00100 10 11 16 20 A 2 1 B 2 18 2 3 Advanced Databases Indexing Techniques C 7 D 43
Extendible Hashing (3/4) 3 2 00 01 10 16 A 2 1 B 2 11 18 Advanced Databases Indexing Techniques 3 20 2 3 4 C 7 A 1 D 44
Extendible Hashing (4/4) • Global Depth • Local Depth • If bucket full: – split bucket – increment LD • If GD=LD – increment GD – double directory 3 3 000 001 A 2 1 B 010 011 2 100 18 101 2 3 110 111 Advanced Databases 16 Indexing Techniques 3 20 4 C 7 A 1 D 45
Extendible Hashing: Delete • If deletion make bucket empty – merge with split image • If directory pointers point to same bucket as split image – directory halved Advanced Databases Indexing Techniques 46
Extendible Hashing: Summary • • Avoids overflow pages Directory can get large Key search requires just 2 page reads Space utilization fluctuates – 59 -90% for uniformly distributed records Advanced Databases Indexing Techniques 47
Extendible Hashing: Exercise • • • Initially GD = LD = 1 M = 2 buckets Hash function: h(k) = k mod 2 i inserts: 14, 18, 22, 3, 9 deletes 9, 22, 3 1 00 01 Advanced Databases Indexing Techniques 1 12 8 1 5 48
- Slides: 48