Clustered vs Unclustered Index CLUSTERED Index entries direct

  • Slides: 9
Download presentation
Clustered vs. Unclustered Index CLUSTERED Index entries direct search for data entries Data entries

Clustered vs. Unclustered Index CLUSTERED Index entries direct search for data entries Data entries UNCLUSTERED Data entries (Index File) (Data file) Data Records 1

B+ Tree Indexes Non-leaf Pages Leaf Pages (Sorted by search key) v Index leaf

B+ Tree Indexes Non-leaf Pages Leaf Pages (Sorted by search key) v Index leaf pages contain data entries, and are chained (prev & next) v Index non-leaf pages have index entries; only used to direct searches: index entry P 0 K 1 P 1 K 2 P 2 K m Pm 2

Example B+ Tree Note how data entries in leaf level are sorted Root 17

Example B+ Tree Note how data entries in leaf level are sorted Root 17 Entries < 17 5 2* 3* v v Entries >= 17 27 13 5* 7* 8* 14* 16* 22* 24* 30 27* 29* 33* 34* 38* 39* Find: 29*? 28*? All > 15* and < 30* Insert/delete: Find data entry in leaf, then change it. 3

Cost Model for Our Analysis v Notes: § § We ignore CPU costs, for

Cost Model for Our Analysis v Notes: § § We ignore CPU costs, for simplicity. Measuring number of page I/Os ignores gains of prefetching a sequence of pages Thus even I/O cost is only approximated. Average-case analysis; based on simplistic assumptions. * Good enough to show overall trends! 4

Cost Model for Our Analysis Variables : § § B: The number of data

Cost Model for Our Analysis Variables : § § B: The number of data pages R: Number of records per page 5

Comparing File Organizations v Heap files (random order; insert at eof) v Sorted files,

Comparing File Organizations v Heap files (random order; insert at eof) v Sorted files, sorted on <age, sal> v Clustered B+ tree file, search key <age, sal> v Heap file with unclustered B + tree index on search key <age, sal> v Heap file with unclustered hash index on search key <age, sal> 6

Operations to Compare v v v Scan: Fetch all records from disk Equality search

Operations to Compare v v v Scan: Fetch all records from disk Equality search Range selection Insert a record Delete a record 7

Assumptions in Our Analysis v Heap Files: § v Sorted Files: § v Equality

Assumptions in Our Analysis v Heap Files: § v Sorted Files: § v Equality selection on key; exactly one match. Files compacted after deletions. Indexes: § data entry size/pointers = 10% size of data record § Hash: No overflow buckets. • § Tree: 67% occupancy (this is typical). • v 80% page occupancy => “File size = 1. 25 data size” “Implies file size = 1. 5 data size” Scans: § Leaf levels of a tree-index are chained. § Index data-entries plus actual file scanned for unclustered indexes. v Range searches: § We use tree indexes to restrict set of data records fetched, but ignore hash indexes. 8

Cost of Operations * Several assumptions underlie these (rough) estimates! 9

Cost of Operations * Several assumptions underlie these (rough) estimates! 9