Database Management Systems Chapter 11 Physical Design Jerry
Database Management Systems Chapter 11 Physical Design Jerry Post Copyright © 2003 1
D A T A B A S E Performance Issues ² All access to the data is routed through the DBMS. ª Additional layer could slow retrieval and storage. ª Programs cannot directly control access to the data All Data ² Any access optimization must reside within the DBMS. ² Most common method: Index. DBMS Program 1 Queries Program 2 Reports 2
D A T A B A S E Physical Data Storage ² Some database systems let the designer choose how to store data. ª Rows for each table. ª Columns within a table. ª The choice influences performance and storage requirements. ª The choice depends on the characteristics of the data being stored. ² Index ª Most database systems use an index to improve performance. © Several methods can be used to store an index. ª An index can speed data retrieval. ª Maintaining many indexes on a table can significantly slow down data updates and additions. ª Choose indexes carefully to speed up certain large jobs. 3
D A T A B A S E Table Operations ² Retrieve data ª Read entire table. ª Read next row/sequential. ª Read arbitrary/random row. ² Store data ª Insert a row. ª Delete a row. ª Modify a row. ² Reorganize/pack database ª Remove deleted rows. ª Recover unused space. Last. Name Adams Adkins Allbright Anderson Baez Bailey Bell Carter Cartwright Carver Craig First. Name Kimberly Inga Searoba Charlotte Bessie Lou Ann Gayle Luther Phillip Glen Bernice Melinda Phone (406) 987 -9338 (706) 977 -4337 (619) 281 -2485 (701) 384 -5623 (606) 661 -2765 (502) 029 -3909 (360) 649 -9754 (717) 244 -3484 (219) 263 -2040 (502) 595 -1052 (804) 020 -5842 (502) 691 -7565 4
D A T A B A S E Deleting Data ² Deletes are flagged. ² Space is reused if possible when new row is added. ² If not exactly the same size, some blank holes develop. ² Packing removes all deleted data and removes blanks. Last. Name Adams Adkins Allbright Anderson Baez XBaez Bailey Bell Carter Cartwright Carver Craig First. Name Kimberly Inga Searoba Charlotte Bessie Lou Ann Gayle Luther Phillip Glen Bernice Melinda Phone (406) 987 -9338 (706) 977 -4337 (619) 281 -2485 (701) 384 -5623 (606) 661 -2765 (502) 029 -3909 (360) 649 -9754 (717) 244 -3484 (219) 263 -2040 (502) 595 -1052 (804) 020 -5842 (502) 691 -7565 5
D A T A B A S E Data Storage Methods ² Sequential ª Fast for reading entire table. ª Slow for random search. ² Indexed Sequential (ISAM) ª Better for searches. ª Slow to build indexes. ² B+-Tree ª Similar to ISAM. ª Efficient at building indexes. ² Direct / Hashed ª Extremely fast searches. ª Slow sequential lists. 6
D A T A B A S E Sequential Storage ² Common uses ª When large portions of the data are always used at one time. e. g. , 25% ª When table is huge and space is expensive. ª When transporting / converting data to a different system. 7
D A T A B A S E Operations on Sequential Tables ² Read entire table ª Easy and fast ² Sequential retrieval ª Easy and fast for one order. ² Random Read/Sequential ª ª Very weak Probability of any row = 1/N Sequential retrieval 1, 000 rows means 500, 000 retrievals per lookup! Row A B C D E … Prob. 1/N 1/N 1/N # Reads 1 2 3 4 5 i ² Delete ª Easy ² Insert/Modify ª Very weak 8
D A T A B A S E Insert into Sequential Table ² Insert Inez: ª ª Find insert location. Copy top to new file. At insert location, add row. Copy rest of file. 9
D A T A B A S E Pointers ² When data is stored on drive (or RAM). ª Operating System allocates space with a function call. ª Provides location/address. © Physical address © Virtual address (VSAM) Ÿ Imaginary drive values mapped to physical locations. © Relative address Ÿ Distance from start of file. Ÿ Other reference point. Volume Track Cylinder/Sector Byte Offset Drive Head Address Data Key value Address / pointer 10
D A T A B A S E Indexed Sequential Storage ² Common uses Address ª Large tables. ª Need many sequential lists. ª Some random search--with one or two key columns. ª Mostly replaced by B+-Tree. ID 1 2 3 4 5 6 7 8 9 10 Pointer A 11 A 22 A 32 A 47 A 58 A 63 A 67 A 78 A 83 Last. Name Pointer Carpenter A 67 Eaton A 58 Farris A 63 Gibson A 22 Hopkins A 42 James A 47 O'Connor A 78 Reasoner A 32 Reeves A 11 Shields A 83 A 11 A 22 A 32 A 47 A 58 A 63 A 67 A 78 A 83 ID 1 2 3 4 5 6 7 8 9 10 Last. Name Reeves Gibson Reasoner Hopkins James Eaton Farris Carpenter O'Connor Shields First. Name Keith Bill Katy Alan Leisha Anissa Dustin Carlos Jessica Howard Date. Hired 1/29/2001 3/31/2001 2/17/2001 2/8/ 2001 1/6/ 2001 8/23/ 2001 3/28/ 2001 12/29/ 2001 7/23/ 2001 7/13/ 2001 Indexed for ID and Last. Name 11
D A T A B A S E Binary Search ² Given a sorted list of names. ² How do you find Jones. ² Sequential search ª Jones = 10 lookups ª Average = 15/2 = 7. 5 lookups ª Min = 1, Max = 14 ² Binary search ª ª ª Find midpoint (14 / 2) = 7 Jones > Goetz Jones < Kalida Jones > Inez Jones = Jones (4 lookups) ² Max = log 2 (N) Adams Brown Cadiz Dorfmann Eaton Farris 1 Goetz Hanson 3 Inez 4 Jones 2 Kalida Lomax Miranda Norman 14 entries ª N = 1000 Max = 10 ª N = 1, 000 Max = 20 12
D A T A B A S E Linked List ² ² Separate each element/key. Pointers to next element. A 67 Pointers to data. Starting point. 8 Carpenter A 22 2 Gibson A 58 6 Eaton B 87 Eaton 3/31/2001 Dustin 3/28/2001 B 29 A 67 B 38 B 29 Bill Anissa 8/23/2001 A 63 7 Farris Carpenter Carlos 12/29/2001 Gibson 00 A 22 B 71 A 58 B 71 Farris B 38 A 63 13
D A T A B A S E Insert into a Linked List B 29 ² Get space/location with address. ª Data: Save row (A 97). ª Key: Save key and pointer to data (B 14). ² Find insert location. ª Eccles would be after Eaton and before Farris. ª From prior key (Eaton), put next address (B 71) into new key, next pointer. ª Put new address (B 14) in prior key, next pointer. Eaton B 14 B 71 Farris B 14 B 71 A 58 Eccles B 71 A 97 B 38 A 63 New. Data = new (. . . ) New. Key->Key = “Eccles” New. Key->Data = New. Data Find. Insert. Point(List, Prior. Key, New. Key) New. Key->Next = Prior. Key->Next = New. Key 14
D A T A B A S E B-Tree ² Store key values ² Utilize binary search (or better). ² Trees ª ª ª Nodes Root Leaf (node with no children) Levels / depth Degree (maximum number of children per node) < Key Brown A Cadiz B C >= Hanson Dorfmann Adams Data Kalida Farriis Inez Miranda Eaton Goetz Inez Jones Lomax D E F G H I J K L Norman M N 15
D A T A B A S E B+-Tree ² Special characteristics ª Set the degree (m) © m >= 3 © Usually an odd number. ² Example data ª 156, 231, 287, 315 ª 347, 458, 692, 792 ª Every node (except the root) must have between m/2 and m children. ª All leaves are at the same level/depth. ª All key values are displayed on the bottom leaves. ª A nonleaf node with n children will contain n-1 key values. ª Leaves are connected by pointers (sequential access). 16
D A T A B A S E B+-Tree Example ² Degree 3 ª At least m/2 = 1. 5 (=2) children. ª No more than 3 children. ² Search keys (e. g. , find 692) ª Less than ª Between ª Greater than ² Sequential links. < 315 <= < 231 <= < 287 <= < 156 <= < 231 <= < 287 <= < 458 <= < 792 <= < 315<= <347<= < 458<= <692 <= < 792 <= data 17
D A T A B A S E B+-Tree Insert ² Insert 257 ª Find location. ª Easy with extra space. ª Just add element. < 315 <= < 231 <= < 287 <= < 458 <= < 792 <= < 156 <= < 231<= <257<= < 287 <= < 315<= <347<= < 458<= <692 <= < 792 <= 18
D A T A B A S E ² Insert 532 ª ª ª B+-Tree Find location. Cell is full. Move up a level, cell is full. Move up to top and split. Eventually, add a level. Insert < 315 <= < 231 <= < 287 <= < 458 <= < 792 <= < 156 <= < 231<= <257<= < 287 <= < 315<= <347<= < 458<= <692 <= < 792 <= < 315 <= < 692 <= < 231 <= < 287 <= < 347 <= < 458 <= < 692 <= < 792 <= <156<= < 231 <= < 287 <= < 315 <= < 347 <= < 458 <= < 532 <= < 692 <= < 792 <= 19
D A T A B A S E B+-Tree Strengths ² Designed to give good performance for any type of data and usage. ª Lookup speed is based on degree/depth. Maximum is logm n. ª Sequential usage is fast. ª Insert, delete, modify are reasonable. © Many changes are easy. © Occasionally have to reorganize large sections. 20
D A T A B A S E Direct Access / Hashed ² Convert key value directly to location (relative or absolute). ª Use prime modulus © Choose prime number greater than expected database size (n). © Divide and use remainder. ² Example ª Prime = 101 ª Key = 528 ª Modulus = 23 ª Set aside spaces (fixedlength) to hold each row. ª Collision/overflow space for duplicates. ² Extremely fast retrieval. ² Very poor sequential access. ² Reorganize if out of space! Overflow/collisions 21
D A T A B A S E Comparison of Access Methods ² Choice depends on data usage. ª How often do data change? ª What percent of the data is used at one time? ª How big are the tables? ª How many joins are there? ª How many transactions are processed per second? ² Rules ª B+-Tree is best all-around. ª B+-Tree is better than ISAM ª Hashed is good for high-speed with random access. ª Sequential is good if often use entire table. 22
D A T A B A S E Storing Data Columns ² Different methods of storing data within each row. ª Positional/Fixed © Simple/common. ª Fixed with overflow © Memo/highly variable text. A 101: -Extra Large A 321: an-Premium A 532: r-Cat 23
D A T A B A S E Storing Data Columns ² Different methods of storing data within each row. ª Indexed © Fast access to columns. ª Delimited © File transfer. 24
D A T A B A S E Data Clustering and Partitioning ² Clustering ² Grouping related data together to improve performance. ª Close to each other on one disk. ª Preferably within the same disk page or cylinder. ª Minimize disk reads and seeks. ª e. g. cluster each invoice with the matching order. ² Partitioning ² Splitting tables so that infrequently used data can be placed on slower drives. ª Vertical partition © Move some columns. © e. g. , move description and comments to optical drive. ª Horizontal partition © Move some rows. © e. g. , move orders beyond 2 years old to optical drive. 25
D A T A B A S E Data Clustering ² Keeping data on same drive ² Keeping data close together ª Same cylinder ª Same I/O page ª Consecutive sectors Order #1123 Odate C# 8876 Order #1124 Odate C# 4293 Order# 1123 Item #078 Quantity 3 Order# 1123 Item #987 Quantity 1 Order# 1123 Item #240 Quantity 2 26
D A T A B A S E Data Partitioning ² Split table ª Horizontally ª Vertically ² Characteristics ª Infrequent access ª Large size ª Move to slower / cheaper storage Customer# 2234 5532 0087 0109 Name Inouye Jones Hardaway Pippen High speed hard disk Active customers Address 9978 Kahlea Dr. 887 Elm St. 112 West 2000 873 Lake Shore Low cost optical disk Phone 555 -2222 666 -777 -3333 888 -222 -1111 333 -111 -2235 27
D A T A B A S E Vertical Partition ² In one table, some columns are large and do not need to be accessed as often. ª Store primary data on high speed disk. ª Store other data on optical disk. ª DBMS retrieves both automatically as needed. ² Products table example. ª Basic inventory data. ª Detailed technical specifications and images. Item# 875 937 Name QOH Bolt 268 Injector 104 Description 1/4” x 10 Fuel injector High speed hard disk Low cost optical disk Technical. Specifications Hardened, meets standards. . . Designed 1995, specs. . . 28
D A T A B A S E Disk Striping and RAID ² Redundant Array of Independent Drives (RAID) ª Instead of one massive drive, use many smaller drives. ª Split table to store parts on different drives (striping). ª Duplicate pieces on different drive for backup. ª Drives can simultaneously retrieve portions of the data. Cust. ID 115 225 333 938 Name Jones Inez Shigeta Smith Phone 555 -1111 666 -2222 777 -1357 888 -2225 29
- Slides: 29