Clustered vs Unclustered Index CLUSTERED Index entries direct









- Slides: 9

Clustered vs. Unclustered Index CLUSTERED Index entries direct search for data entries Data entries UNCLUSTERED Data entries (Index File) (Data file) Data Records 1

B+ Tree Indexes Non-leaf Pages Leaf Pages (Sorted by search key) v Index leaf pages contain data entries, and are chained (prev & next) v Index non-leaf pages have index entries; only used to direct searches: index entry P 0 K 1 P 1 K 2 P 2 K m Pm 2

Example B+ Tree Note how data entries in leaf level are sorted Root 17 Entries < 17 5 2* 3* v v Entries >= 17 27 13 5* 7* 8* 14* 16* 22* 24* 30 27* 29* 33* 34* 38* 39* Find: 29*? 28*? All > 15* and < 30* Insert/delete: Find data entry in leaf, then change it. 3

Cost Model for Our Analysis v Notes: § § We ignore CPU costs, for simplicity. Measuring number of page I/Os ignores gains of prefetching a sequence of pages Thus even I/O cost is only approximated. Average-case analysis; based on simplistic assumptions. * Good enough to show overall trends! 4

Cost Model for Our Analysis Variables : § § B: The number of data pages R: Number of records per page 5

Comparing File Organizations v Heap files (random order; insert at eof) v Sorted files, sorted on <age, sal> v Clustered B+ tree file, search key <age, sal> v Heap file with unclustered B + tree index on search key <age, sal> v Heap file with unclustered hash index on search key <age, sal> 6

Operations to Compare v v v Scan: Fetch all records from disk Equality search Range selection Insert a record Delete a record 7

Assumptions in Our Analysis v Heap Files: § v Sorted Files: § v Equality selection on key; exactly one match. Files compacted after deletions. Indexes: § data entry size/pointers = 10% size of data record § Hash: No overflow buckets. • § Tree: 67% occupancy (this is typical). • v 80% page occupancy => “File size = 1. 25 data size” “Implies file size = 1. 5 data size” Scans: § Leaf levels of a tree-index are chained. § Index data-entries plus actual file scanned for unclustered indexes. v Range searches: § We use tree indexes to restrict set of data records fetched, but ignore hash indexes. 8

Cost of Operations * Several assumptions underlie these (rough) estimates! 9