Database System Architecture and Implementation Execution Costs Slides

  • Slides: 20
Download presentation
Database System Architecture and Implementation Execution Costs Slides Credit: Michael Grossniklaus – Uni-Konstanz 1

Database System Architecture and Implementation Execution Costs Slides Credit: Michael Grossniklaus – Uni-Konstanz 1

Orientation Web Forms Applications SQL Interface SQL Commands DBMS Executor Parser Operator Evaluator Optimizer

Orientation Web Forms Applications SQL Interface SQL Commands DBMS Executor Parser Operator Evaluator Optimizer Transaction ! ere h Manager e r a We Lock Manager Files and Index Structures Buffer Manager Recovery Manager Disk Space Manager Index and Data Files Catalog Database 2 Figure Credit: Raghu Ramakrishnan and Johannes Gehrke: “Database Management Systems”, Mc. Graw-Hill, 2003.

Recall Heap Files • Heap files provide just enough structure to maintain a collection

Recall Heap Files • Heap files provide just enough structure to maintain a collection of records (of a table) • The heap file supports sequential (open. Scan(∙)) over the collection SQL query leading to a sequential scan SELECT A, B FROM R • No other operations get specific support from heap files Slides Credit: Michael Grossniklaus – Uni-Konstanz 3

Systematic File Organization SQL queries calling for systematic file organization SELECT A, B FROM

Systematic File Organization SQL queries calling for systematic file organization SELECT A, B FROM R WHERE C > 45 SELECT A, B FROM R ORDER BY C ASC • For the above queries, it would definitely be helpful if the SQL query processor could rely on a particular file organization of the records in the file for table R ! Exercise Which organization of records in the file for table R could speed up the evaluation of both queries above? Slides Credit: Michael Grossniklaus – Uni-Konstanz 4

Systematic File Organization SQL queries calling for systematic file organization SELECT A, B FROM

Systematic File Organization SQL queries calling for systematic file organization SELECT A, B FROM R WHERE C > 45 SELECT A, B FROM R ORDER BY C ASC • For the above queries, it would definitely be helpful if the SQL query processor could rely on a particular file organization of the records in the file for table R ! Exercise Which organization of records in the file for table R could speed up the evaluation of both queries above? • Allocate records of table R in ascending order of attribute C values • Place records in neighboring pages • (Only include columns A, B, and C in the records) Slides Credit: Michael Grossniklaus – Uni-Konstanz 5

Module Overview • Three different file organizations 1. files containing randomly ordered records (heap

Module Overview • Three different file organizations 1. files containing randomly ordered records (heap files) 2. files sorted on one or more record fields 3. files hashed on one or more record fields • Comparison of file organizations – simple cost model – application of cost model to file operations • Introduction to index concept – clustered vs. unclustered indexes – dense vs. sparse indexes Slides Credit: Michael Grossniklaus – Uni-Konstanz 6

Comparison of File Organizations • Competition of three file organizations in five disciplines 1.

Comparison of File Organizations • Competition of three file organizations in five disciplines 1. scan: read all records in a give file 2. search with equality test 3. search with range selection (upper or lower bound may be unspecified) 4. insert a given record in the file, respecting the file’s organization 5. delete a record (identified by its rid), maintain the file’s organization SQL queries calling for equality test and range selection support SELECT * FROM R WHERE C = 45 SELECT * FROM R WHERE A > 0 AND A < 100 Slides Credit: Michael Grossniklaus – Uni-Konstanz 7

Simple Cost Model • A cost model is used to analyze the execution time

Simple Cost Model • A cost model is used to analyze the execution time of a given database operations – block I/O operations are typically a major cost factor – CPU time to account for searching inside a page, comparing a record field to selection constant, etc. • To estimate the execution time of the five database operation, we introduce a coarse cost model – – omits cost of network access does not consider cache effects neglects burst I/O … • Cost models play an important role in query optimization Slides Credit: Michael Grossniklaus – Uni-Konstanz 8

Simple Cost Model Simple cost model parameters Parameter b r D C H Description

Simple Cost Model Simple cost model parameters Parameter b r D C H Description number of pages in the file number of records on a page time to read/write a disk page CPU time needed to process a record (e. g. , compare a field value) CPU time take to apply a function to a record (e. g. , a comparison or hash function) • Some typical values – D ≈ 15 ms – C ≈ H ≈ 0. 1 μs Slides Credit: Michael Grossniklaus – Uni-Konstanz 9

Back to the Future A simple hash function A hashed file uses a hash

Back to the Future A simple hash function A hashed file uses a hash function h to map a given record onto a specific page of the file. Example: h uses the lower 3 bits of the first field (of type INTEGER) of the record to compute the corresponding page number. h(� 42, true, ‘dog’�) → 2 (42 = 1010102) h(� 14, true, ‘cat’�) → 6 (14 = 11102) h(� 26, false, ‘mouse’�) → 2 (26 = 110102) • The hash function determines the page number only, record placement inside a page is not prescribed • If a page p is filled to capacity, a chain of overflow pages is maintained to store additional records with h(� …� ) = p • To avoid immediate overflowing when a new record is inserted, pages are typically filled to 80% only when a heap file is initially (re)organized into a hash file Slides Credit: Michael Grossniklaus – Uni-Konstanz 10

Cost of Scan • Slides Credit: Michael Grossniklaus – Uni-Konstanz 11

Cost of Scan • Slides Credit: Michael Grossniklaus – Uni-Konstanz 11

Hashed File ! Scanning a hashed file In which order does a scan of

Hashed File ! Scanning a hashed file In which order does a scan of a hashed file retrieve its records? Slides Credit: Michael Grossniklaus – Uni-Konstanz 12

Cost of Search with Equality Test • ! Nevertheless, no DBMS will implement binary

Cost of Search with Equality Test • ! Nevertheless, no DBMS will implement binary search for value lookup Why? Slides Credit: Michael Grossniklaus – Uni-Konstanz 13

Cost of Search with Equality Test • Slides Credit: Michael Grossniklaus – Uni-Konstanz 14

Cost of Search with Equality Test • Slides Credit: Michael Grossniklaus – Uni-Konstanz 14

Cost of Search with Range Selection • Slides Credit: Michael Grossniklaus – Uni-Konstanz 15

Cost of Search with Range Selection • Slides Credit: Michael Grossniklaus – Uni-Konstanz 15

Cost of Insert • Slides Credit: Michael Grossniklaus – Uni-Konstanz 16

Cost of Insert • Slides Credit: Michael Grossniklaus – Uni-Konstanz 16

Cost of Delete • Slides Credit: Michael Grossniklaus – Uni-Konstanz 17

Cost of Delete • Slides Credit: Michael Grossniklaus – Uni-Konstanz 17

Performance Comparison • Performance of range selections for files of increasing size (D =

Performance Comparison • Performance of range selections for files of increasing size (D = 15 ms, C = 0. 1 μs, r = 100, n = 10) Performance graph 18 Figure Credit: Marc H. Scholl, University of Konstanz, Germany

Performance Comparison • Performance of deletions for files of increasing size (D = 15

Performance Comparison • Performance of deletions for files of increasing size (D = 15 ms, C = 0. 1 μs, r = 100, n = 1) Performance graph 19 Figure Credit: Marc H. Scholl, University of Konstanz, Germany

And the Winner Is… • There is no single file organization that responds equally

And the Winner Is… • There is no single file organization that responds equally fast to all five operations • This is a dilemma because more advanced file organizations can make a real difference in speed (see previous slides) • There exist index structures which offer all advantages of a sorted file and support insertions/deletions efficiently (at the cost of a modest space overhead): B+ trees • Before discussing B+ trees in detail, the following introduces the index concept in general Slides Credit: Michael Grossniklaus – Uni-Konstanz 20