Query Processing Part 2 Dense and Sparse Indexes
- Slides: 55
Query Processing Part 2: Dense and Sparse Indexes 1
“Field” means “attribute” Terminology • A sorted (or sequential ) file is stored (on a disk) sequentially in sorted order – We sort a file on a sequence (or sort ) field – Can also sort lexicographically on several fields • A heap is a file that is stored on a disk in no particular order • A search key is a field(s) on which we can search efficiently for records with a given key value – Search keys are implemented by dense & sparse indexes, B+trees, hash tables 2
Two Types of Search Keys • A file is organized according to a primary search key that – Determines the location of a record in the file – Is used for insertions, deletions and updates – Is usually called primary key, although it does not necessarily define a record uniquely • A secondary search key is used only for searching (usually called secondary key) In the context of query processing, “key” usually means “search key, ” and it is not a key in the 3 sense of FDs, namely, duplicate values are possible
Dense and Sparse Indexes for Primary Keys 4
Sequential File 10 20 30 40 50 60 70 80 90 100 5
Dense Index An index entry for each record of the file 10 20 30 40 50 60 70 80 90 100 110 120 Sequential File 10 20 30 40 50 60 70 80 90 100 6
Sparse Index Only one index entry for each block (for the block’s first value) Given V, follow the pointer for the largest K, s. t. K V 10 30 50 70 90 110 130 150 170 190 210 230 Sequential File 10 20 30 40 50 60 70 80 90 100 7
Sequential File Sparse 2 nd level 10 90 170 250 330 410 490 570 10 30 50 70 90 110 130 150 170 190 210 230 10 20 30 40 50 60 70 80 90 100 8
Comments & Questions • The index blocks are not necessarily contiguous, but they are chained in both directions • Same for the blocks of a file • Can a heap have a sparse index? • Should we sometimes use a dense index in the second (or higher) level? • If the file is contiguous, we can compute the pointers, instead of storing them in the index 9
R 1 K 1 R 2 K 3 R 3 K 4 A block has 1024 Bytes R 4 Works for Dense? Sparse? Find the K 3 block by computing its offset: 10 (3 -1)1024 = 2048 bytes
Duplicate Keys in the File 10 10 10 20 20 30 30 30 40 45 11
Dense Index for the File 10 10 10 20 20 30 30 30 Do we need repeated keys in the index? 10 10 10 20 20 30 30 30 40 45 12
Keys Repeated in File But not in Dense Index Now it is more like a sparse index 10 20 30 40 10 10 10 20 20 30 Recall that the file’s blocks are chained (same for the index) 30 30 40 45 13
Sparse Index for a File with Duplicate Keys 10 10 20 30 10 10 10 20 20 30 30 30 40 45 14
How to Search? Still Valid? 10 10 20 30 Given V, follow the pointer for the largest K, s. t. K V 10 10 10 20 20 30 30 30 40 45 15
Fixing the Problem place first new key from block 10 20 30 30 10 10 10 20 20 30 30 30 40 45 16
Need Repeated Keys in Index? place first new key from block should this be 40? 10 20 30 30 10 10 10 20 20 30 Recall that the file’s blocks are chained (same for the index) 30 30 40 45 17
To Sum Up • Sparse index points to a block only if it has a new value not seen before • The smallest such value is associated with the pointer to the block During search, File when do we a have to Index continue to the next block of the file? a ü Given V, follow the pointer for the largest K, s. t. K V a. . b 18
Deletion from Sparse Index delete record 40 10 30 50 70 90 110 130 150 10 20 30 40 50 60 70 80 19
Deletion from Sparse Index delete record 40 10 30 50 70 90 110 130 150 10 20 30 40 50 60 70 80 20
Deletion from Sparse Index delete record 30 10 30 50 70 90 110 130 150 10 20 30 40 50 60 70 80 21
Deletion from Sparse Index delete record 30 40 10 30 50 70 90 110 130 150 10 20 30 40 40 50 60 70 80 22
Deletion from Sparse Index delete record 30 & 40 10 30 50 70 90 110 130 150 10 20 30 40 50 60 70 80 23
Deletion from Sparse Index delete record 30 & 40 10 30 50 70 90 110 130 150 10 20 30 40 50 60 70 80 24
Deletion from Sparse Index delete record 30 & 40 10 50 30 70 50 70 90 110 130 150 10 20 30 40 50 60 70 80 25
Deletion from Dense Index delete record 30 10 20 30 40 50 60 70 80 26
Deletion from Dense Index delete record 30 10 20 30 40 50 60 70 80 10 20 30 40 40 50 60 70 80 27
Deletion from Dense Index delete record 30 40 10 20 30 40 50 60 70 80 10 20 30 40 40 50 60 70 80 Unlike sparse index, always have to update the index 28
Insertion into Sparse Index insert record 34 10 30 40 60 10 20 30 40 50 60 29
Insertion into Sparse Index insert record 34 10 30 40 60 Our lucky day! We have free space where we need it! 10 20 30 34 40 50 60 30
Insertion into Sparse Index insert record 15 10 30 40 60 10 20 30 40 50 60 31
Insertion into Sparse Index insert record 15 20 10 30 40 60 10 20 15 30 20 30 40 50 60 • Immediate reorganization of both the file and the index 32
What if we now have to insert 13? 20 10 30 40 60 10 20 15 30 20 30 40 50 60 • Can add a new block to the file’s chain between the first and second blocks ØAlso need to add new entry to the index • But blocks will no longer be contiguous 33
Alternative: Use Overflow Blocks insert record 25 10 30 40 60 10 20 30 40 50 60 34
Alternative: Use Overflow Blocks insert record 25 10 30 40 60 10 20 30 25 overflow blocks (reorganize later. . . ) 40 50 60 No need to update the index 35
Insertion into Dense Index • Similar but often more expensive, because we have to update the index after every insertion 36
Dense Indexes for Secondary Keys An index for a secondary key is sometimes called a secondary index 37
Secondary Indexes • Only the primary (i. e. , organizing) index can determine the physical order of the records on the disk • Secondary index is on an unsorted field Sequence field 30 50 20 70 80 40 10 90 60 38
Sparse Secondary Indexes Sequence field 30 20 80 100 90. . . 30 50 20 70 80 40 10 90 60 39
Sparse Secondary Indexes Sequence field 30 20 80 100 30 50 20 70 90. . . 80 40 does not make sense! 100 10 90 60 40
Secondary Index Must be Dense Sequence field 30 50 20 70 80 40 10 90 60 41
Dense Secondary Index Sequence field 10 20 30 40 30 50 50 60 70. . . 80 40 20 70 10 90 60 42
But Higher Levels are Sparse Sequence field 10 50 90. . . sparse high level 10 20 30 40 30 50 50 60 70. . . 80 40 20 70 10 90 60 43
Duplicate Values & Secondary Indexes 20 10 20 40 10 40 30 40 44
Duplicate Values & Secondary Indexes one option. . . 10 10 10 20 20 30 40 40. . . 20 10 20 40 10 40 30 40 45
Duplicate Values & Secondary Indexes one option. . . Problem: excess overhead! • disk space • search time 10 10 10 20 20 30 40 40. . . 20 10 20 40 10 40 30 40 46
Duplicate Values & Secondary Indexes another option. . . 10 20 20 40 30 40 10 40 30 40 47
Duplicate Values & Secondary Indexes another option. . . Problem: Variable-size records in index! 10 20 20 40 30 40 10 40 30 40 48
An Intermediate Level of Buckets 20 10 10 20 30 40 20 40 10 40 50 60. . . 10 40 30 40 buckets 49
Why “Bucket” Idea is Useful Indexes Name: primary Dept: secondary Floor: secondary Records EMP (name, dept, floor, . . . ) 50
Query: Get employees in (Toy Dept) ^ (2 nd floor) Dept. index Toy EMP Floor index 2 nd 51
Query: Get employees in (Toy Dept) ^ (2 nd floor) Dept. index Toy EMP Floor index 2 nd Intersect toy bucket and 2 nd Floor bucket to get the set of matching EMP’s 52
Summary of Dense & Sparse Indexes • Both are simple (as long as there is just one level) • Sparse is more efficient, because the index is smaller and more of it can be kept in memory • Insertions are expensive when performed (if immediate reorganization is done), or over time (since performance deteriorates due to overflow) – More of a problem in a dense index, because every insertion also changes the index • Secondary indexes must be dense • Sometimes dense indexes improve efficiency by intersecting sets of pointers before accessing file 53
Something to Think About • To shorten an index, can we use pointers just to blocks (instead of to records)? In which cases? • In a relational system, can we organize a relation according to a field which is not a key in the FD sense? How? – If so, what is the advantage of doing that? – If so, how would we enforce the constraint that there should not be two records with the same key? 54
Note • If a file is stored in sorted order on some field, then that field must be the primary search key • If the file is stored as a heap (i. e. , not sorted on any field), then the index for the primary search key must be dense • If the primary search key is based on a lexicographic order of several fields and the file is sorted accordingly, then any prefix of those fields is a secondary search key 55
- Primary index is dense or sparse
- Define dense index
- Layers of earth from most dense to least dense
- Earth's layers most dense to least dense
- Earth's layers most dense to least dense
- Query tree and query graph
- Query tree and query graph
- Dns recursive iterative
- Typologies are typically nominal composite measures.
- Steps of query processing
- Algorithms for query processing and optimization
- The logical view of data is
- Interplanar distance formula
- Istoxx factor indexes futures
- What is security market index
- Clusttered
- Laspeyres price index
- Productivity index
- Irr
- Sys index
- Differentiating factors in constructing market indexes
- Steps in query processing
- Query optimization steps
- Steps of query processing
- Measures of query cost in dbms
- Objectives of query processing
- Steps in query processing
- Distributed query processing
- Distributed query processing
- Sketch techniques for approximate query processing
- What is the role of eddy in adaptive query processing
- Distributed query processing
- Distributed query processing
- Distributed query processing
- Basic steps in query processing
- Sql server intelligent query processing
- Neighborhood averaging in image processing
- Secondary processing of wheat
- What is interactive processing
- Very dense, warm and wet forests.
- Very dense warm and wet forest
- Connective tissue
- Adipose connective tissue function
- Sparse conditional constant propagation
- Suatu array dikatakan lower triangular jika
- Sparse matrix operator kernel emissions
- Sparse matrix matlab
- Example of sparse matrix
- Sparse conditional constant propagation
- On single image scale-up using sparse-representations
- Sparse matrix in data structure
- Sparse pca
- Incidence matrix
- Vanitas vanitatum petrarca
- Efficient sparse voxel octrees
- Arid region with sparse to almost noneexistent vegetation