How Messy How Messy is Your Database Page

How. Messy How Messy is Your Database Page n How messy is your database? 2 n Hashing algorithm 5 n Interpreting master dataset lines 12 n Master dataset solutions 15 n How. Messy sample report (detail dataset) 17 n Repacking a detail dataset 22 n Detail dataset solutions 26 n Estimating response time 29 n Automating How. Messy analysis 30 n Summary 33 1

How messy is your database? n A database is messy if it takes more I/O than it should n Unnecessary I/O is still a major limiting factor even on MPE/i. X machines n Databases are messy by nature n Run How. Messy or DBLOADNG against your database n How. Messy is a bonus program for Robelle customers n DBLOADNG is a contributed library program 2

Blocks n Turbo. IMAGE does all I/O operations in blocks n A block may contain many user records n More entries per block means fewer I/Os n Fewer I/Os means better performance Block 1 Block 2 Block 12501 Capacity: 100001 1 2 3 4 5 6 7 8 User Data Blocking factor = 8 3

Record location in masters n Search item values must be unique n Location of entries is determined by a hashing algorithm or a primary address calculation n Calculation is done on search item value to transform it into a record number between one and the capacity n Different calculation depending on the search item type n X, U, Z, and P give random results n I, J, K, R, and E give predictable results 4

Hashing algorithm n Customer number AA 1000 is transformed into a record number Customer number AA 1000 Block 1 Block 3162 Record number 25299 Blocking factor = 8 Block 3162 Block 12501 Capacity: 100001 5

Hashing algorithm (no collision) n Customer number BD 2134 gives a different record number in a different block Block 1 Block 7759 AA 1000 Customer number Record number 62075 BD 2134 Blocking factor = 8 Block 7759 Block 12501 Capacity: 100001 6

Hashing algorithm (collision - same block) n Customer number CL 1717 hashes to the same record number as AA 1000 location n Turbo. IMAGE tries to find an empty location in the same block. If it finds one, no additional I/O is required. n CL 1717 becomes a secondary entry. Primary and secondary entries are linked using pointers that form a chain. Block 3162 Customer number CL 1717 25299 AA 1000 25302 7

Hashing algorithm (collision - different block) n Customer number MD 4884 collides with AA 1000 n No more room in this block. Turbo. IMAGE reads the following blocks until it finds a free record location. n In this case, MD 4884 is placed two blocks away, which requires two additional I/Os. Block 3164 Block 3162 Customer number MD 4884 25299 AA 1000 25302 CL 1717 Block 3163 is full 25315 8

An example Turbo. IMAGE database M-CUSTOMER-NO D-ORDERS A-ORDER-NO D-ORD-ITEMS 9

How. Messy sample report How. Messy/XL (Version 2. 2. 1) Turbo. IMAGE/3000 databases Data Base: STORE. DATA. INVENT By Robelle Consulting Ltd. Type Data Set M-Customer A-Order-No D-Orders D-Ord-Items Man Ato Det Capacity 248113 1266783 1000000 4000000 Search Field Customer-No Order-No !Order-No S Customer-No S !Order-No Entries 178018 768556 768558 3458511 Max Ave Chain 32 1. 92 10 1. 35 1 1. 00 80 14. 34 1604 8. 06 Load Factor 71. 7% 60. 7% 76. 9% 86. 5% Run on: MON, JAN 9, 1995, 11: 48 AM Page: 1 Secon- Max daries Blks (Highwater) 30. 5% 1496 25. 7% 1 ( 851445) ( 3470097) Blk Fact 11 70 32 23 Std Expd Avg Ineff Dev Blocks Ptrs 0. 32 1. 00 1. 90 90. 5% 0. 62 1. 00 0. 0% 0 1. 00 0. 0% 17. 76 1. 75 9. 20 57. 2% 35. 75 1. 36 11. 32 72. 5% Elongation 1. 90 1. 00 5. 25 8. 34 10

How. Messy sample report (master dataset) How. Messy/XL (Version 2. 2. 1) Turbo. IMAGE/3000 databases Data Base: STORE. DATA. INVENT By Robelle Consulting Ltd Type Data Set M-Customer A-Order-No D-Orders D-Ord-Items Man Ato Det Capacity 248113 1266783 1000000 4000000 Search Field Customer-No Order-No !Order-No S Customer-No S !Order-No Entries 178018 768556 768558 3458511 Max Ave Chain 32 1. 92 10 1. 35 1 1. 00 80 14. 34 1604 8. 06 Load Factor 71. 7% 60. 7% 76. 9% 86. 5% Run on: MON, JAN 9, 1995, 11: 48 AM Page: 1 Secon- Max daries Blks (Highwater) 30. 5% 1496 25. 7% 1 ( 851445) ( 3470097) Blk Fact 11 70 32 23 Std Expd Avg Ineff Dev Blocks Ptrs 0. 32 1. 00 1. 90 90. 5% 0. 62 1. 00 0. 0% 0 1. 00 0. 0% 17. 76 1. 75 9. 20 57. 2% 35. 75 1. 36 11. 32 72. 5% Elongation 1. 90 1. 00 5. 25 8. 34 11

Interpreting master datasets lines n Pay attention to the following statistics: n High percentage of Secondaries (inefficient hashing) n High Maximum Blocks (clustering) n High Maximum and Average Chains (inefficient hashing) n High Inefficient Pointers (when secondaries exist) n High Elongation (when secondaries exist) 12

Report on m-customer n The number of Secondaries is not unusually high n However, there may be problems n Records are clustering (high Max Blks) n Long synonym chain n High percentage of Inefficient Pointers Type Data Set M-CUSTOMER Man Capacity Entries Load Factor 248113 178018 71. 7% Search Field CUSTOMER-NO Max Chain Ave Chain 22 1. 92 Secon- Max daries Blks (Highwater) Blk Fact 30. 5% 1496 11 Std Expd Avg Dev Blocks 0. 32 1. 00 Ineff Ptrs Elongation 1. 90 90. 5% 1. 90 13

Report on a-order-no n Very tidy dataset n n Number of Secondaries is acceptable Max Blks, Ineff Ptrs and Elongation are at the minimum values, even if the maximum chain length is a bit high Secon- Max Load daries Blk Capacity Entries Factor (Highwater) Type Data Set A-ORDER-NO Ato 1266783 768556 60. 7% 25. 7% Max Ave Search Field Chain Std Expd Avg Dev Blocks ORDER-NO 0. 62 10 1. 35 1. 00 Fact 1 70 Ineff Elong. Ptrs ation 1. 00 0. 0% 1. 00 14

Master dataset solutions n Increase capacity to a higher odd number n Increase the Blocking Factor n Increase block size n Reduce record size n Change binary keys to type X, U, Z, or P n Check your database early in the design n Use How. Messy on test databases 15

How. Messy Exercise 1 Type Data Set A-MASTER Capacity Ato Search Field MASTER-KEY Load Entries Factor 14505679 9709758 Max Chain Ave Chain 37 1. 58 66. 9% Secon-Max daries Blks (Highwater) Blk Fact 36. 8% 2395 29 Std Expd Avg Dev Blocks 1. 26 1. 00 Ineff Ptrs Elongation 1. 88 48. 5% 1. 88 16

How. Messy sample report (detail dataset) How. Messy/XL (Version 2. 2. 1) Data Base: STORE. DATA. INVENT for Turbo. IMAGE/3000 databases By Robelle Consulting Ltd. Type Data Set M-CUSTOMER A-ORDER-NO D-ORDERS D-ORD-ITEMS Man Ato Det Search Field Customer-No Order-No !Order-No S Customer-No S !Order-No Capacity 248113 126673 1000000 4000000 Max Chain 22 10 1 80 1604 Run on: MON, JAN 9, 1995, 11: 48 AM Page: 1 Secon- Max Load daries Blk Entries Factor (Highwater) 178018 71. 7% 30. 5% 1496 768556 60. 7% 25. 7% 1 768556 76. 9% ( 851445) 3458511 86. 5% ( 3470097) Ave Std Expd Chain Dev Blocks 1. 92 0. 32 1. 00 1. 35 0. 62 1. 00 0 1. 00 14. 34 17. 76 1. 75 8. 06 35. 75 1. 36 Avg Blocks 1. 90 1. 00 9. 20 11. 32 Fact 1 70 12 23 Ineff Ptrs 90. 5% 0. 0% 57. 2% 72. 5% Elongation 1. 90 1. 00 5. 25 8. 34 17

Empty detail dataset n Records are stored in the order they are created starting from record 1 n Records for the same customer are linked together using pointers to form a chain n Chains are linked to the corresponding master entry D-ORD-HEADER Customer Block 12500 Capacity: 100000 1 2 3 4 5 6 7 8 Order AA 1000 O 000001 MD 4884 O 000002 BD 2134 O 000003 MD 4884 O 000004 CL 1717 O 000005 AA 1000 O 000006 Blocking factor = 8 18

Detail chains get scattered n Over time, records for the same customer are scattered over multiple blocks Block 1 1 AA 1000 Block 23 Block 10 O 000001 74 AA 1000 O 000221 180 6 AA 1000 O 000476 O 000006 80 AA 1000 O 000252 19

Delete chain n Deleted records are linked together n Turbo. IMAGE reuses the records in the Delete chain, if there any Block 34 Block 16 128 265 Deleted 268 Deleted 20

Highwater mark n Indicates highest record location used so far n Serial reads scan the dataset up to the highwater mark D-ORD-HEADER Block 1 Used blocks : some empty, some partially used, some full Block 8000 highwater mark Block 12500 21

Repacking a detail dataset n Groups records along primary path n Removes Delete chain (no holes) n Block 1 Resets highwater mark 1 2 3 4 5 6 7 8 Block 1 Block 4500 AA 1000 O 000001 AA 1000 O 000006 AA 1000 O 000221 AA 1000 O 000252 AA 1000 O 000476 BD 2137 O 000003 CL 1717 O 000005 MD 4884 O 000004 highwater mark Block 12500 22

Interpreting detail dataset lines n Pay attention to the following statistics: n Load Factor approaching 100% (dataset full) n Primary path (large Average Chain and often accessed) n n n High Average Chain and low Standard deviation, especially with a sorted path (Is path really needed? ) High Inefficient Pointers (entries in chain not consecutive) High Elongation (entries in chain not consecutive) 23

Report on d-orders n Primary path should be on customer-no, not on order-no n Highwater mark is high n Repack along new primary path regularly Capacity Load Entries Factor Secon-Max daries Blks (Highwater) 1000000 768556 ( Type Data Set D-ORDERS Det Search Field Max Chain !ORDER-NO S CUSTOMER-NO 1 80 Ave Chain 76. 9% 851445) Std Expd Avg Dev Blocks 1. 00 0 14. 34 17. 76 1. 00 1. 75 Blk Fact 12 Ineff Ptrs Elongation 1. 00 0. 0% 9. 20 57. 2% 1. 00 5. 25 24

Report on d-ord-items n Inefficient Pointers and Elongation are high n Highwater mark is fairly high n Repack the dataset regularly n Is the sorted path really needed? Type Data Set D-ORD-ITEMS Det Search Field S !ORDER-NO Capacity Entries 4000000 3458511 Max Ave Chain 1604 8. 06 Secon- Max Load daries Blks Factor (Highwater) 86. 5% ( 3470097) Std Expd Avg Dev Blocks 35. 75 1. 36 11. 32 Blk Fact 23 Ineff Ptrs Elongation 72. 5 8. 34 25

Detail dataset solutions n Assign the primary path correctly; search item with Average Chain length > 1 that is accessed most often n Repack datasets along the primary path regularly n Increase the Blocking Factor n Increase block size n Reduce record size n Understand sorted paths n Check your databases early in the design; use How. Messy on test databases 26

How. Messy Exercise 2 Type Data Set D-ITEMS Capacity Det 620571 Load Entries Factor Secon-Max daries Blks (Highwater) 119213 ( Max Search Field Chain Ave Chain 19. 2% Std Dev 242025) Expd Blocks Blk Fact 7 Avg Blocks Ineff Elong. Ptrs ation S ! ITEM-NO 3 1. 00 0. 02 1. 00 0. 0% S SUPPLIER-NO 23 8. 07 3. 25 1. 77 3. 30 28. 4% LOCATION 5938 11. 62 63. 64 2. 24 2. 53 13. 2% BO-STATUS 99999. 99 0. 00 17031. 00 17047. 00 14. 3% DISCOUNT 99999 120. 18 1337. 15 3. 73 39. 37 31. 9% 1. 00 1. 86 1. 13 1. 00 10. 55 27

Minimum number of disc I/Os Intrinsic Disc I/Os DBGET DBFIND DBBEGIN DBEND DBUPDATE DBPUT DBDELETE 1 1 1 (non-critical item) 13 (critical item) 3 [+ (4 x #paths, if detail)] 2 [+ (4 x #paths, if detail)] Serial reads: Master Detail Capacity / Blocking factor # entries / Blocking factor 28

Estimating response time n n n Deleting 100, 000 records from a detail dataset with two paths would take: n 2 + (4 x 2 paths) = 10 I/Os per record n 100, 000 records x 10 I/Os per record = 1, 000 I/Os Classic: around 25 I/Os per second n 1, 000 I/Os / 25 = 40, 000 seconds n 40, 000 seconds / 3600 = 11. 1 hours i. X: around 40 I/Os per second n 1, 000 I/Os / 40 = 25, 000 seconds n 25, 000 seconds / 3600 = 6. 9 hours 29

Automating How. Messy analysis n Recent version of How. Messy creates a self-describing file with these statistics n Process the file with generic tools (Suprtool, Ask. Plus) or custom programs (COBOL, 4 GL), and produce custom reports n Send messages to database administrators n Write “smart” job to fix databases without user intervention 30

Processing Loadfile with Suprtool n Datasets more than 80% full >input loadfile >if loadfactor > 80 >ext database, datasettype, loadfactor >list standard n Only one address per customer >input loadfile >if dataset = "D-ADDRESSES" and & maxchain > 1 31

References n The Turbo. IMAGE/3000 Handbook (Chapter 23) n Available for $ 49. 95 from: WORDWARE P. O. Box 14300 Seattle, WA 98114 32

Summary n Turbo. IMAGE databases become messy over time, especially if they are active n How. Messy and DBLOADNG let you analyze the database’s efficiency n You should have some knowledge of the internal workings of Turbo. IMAGE n Monitor your databases regularly 33

34