Common Database Problems Common Database Solutions Mike Furgal
Common Database Problems Common Database Solutions Mike Furgal PROGRESS Bravepoint – Database Services
Introduction Bravepoint Mike Furgal § Progress Employee since 1989 § Developer of the Open. Edge database § Joined Bravepoint in 2012 § Heads up Database Services • Including Managed Database Services § Largest Progress/Open. Edge consulting firm § § § Founded in 1987 Purchased by Progress in April 2014 Specializes in all things Open. Edge • • • § Programming QAD Pro 2 SQL • 2 Database Services Real-time Replication to SQL Target
A series of case studies of issue that the PROGRESS Brave. Point Managed Database Services Team has encountered over the years. 3
The case of the Missing Files 4
A large distribution center had a power failure. When the power came back on the machine booted but the database did not start : (43) ** Cannot find or open file /agility/prod_db/platte_11. d 5, errno = 2. : (451) prostrct list session begin for root on /dev/pts/0. : (12475) Unable to get file status for extent /agility/prod_db/platte_11. d 5 : (334) prostrct list session end. 5
Specifics § § 6 Database was 80 GB Last good backup was 1 week old Not running After Imaging Platform was Linux
WHAT WOULD YOU DO? 7
Approach § Made a copy of the existing database incase we made a mistake § Used PROSTRCT LIST to determine which files were missing • We were lucky that the missing file was part of a storage area that only held indexes § Tools Available • PROSTRCT UNLOCK • PROSTRCT BUILDDB 8
Solution § Restored the missing extent from the week old backup and ran PROSTRCT UNLOCK § Rebuilt the indexes • # proutil db –C idxbuild all BUT……. § Index rebuild failed due to finding back blocks in the storage area where the records were stored 9
NOW WHAT? 10
Back to the Beginning § Copied the backed up database to start over. • Since Index Rebuild failed, we needed to start over • Good thing we had copied all the files in the first place § Add the missing extent § Truncate the BI and do a DBRPR scan • Fix bad blocks • Fix bad records 11
Dump and Load § After all the corruption was removed it was time to dump and load § Need to do an ASCII Dump to dump around some bad records 12
Lessons Learned § This Database was important to this customer, hence they wanted it back when it got corrupted. § They need to treat the Database better • Daily Backups • After Imaging § A good DR plan saves a lot of heartache 13
Next Steps § Implement a good Disaster Recover plan which includes • Frequent backups • After Imaging implemented § Test the Disaster Recover Plan • Annually § Disaster Recover Plan needs to be on Paper • Can’t be just on the computer • Need a backup plan incase the DR plan fails 14
The case of the Micro Manager 15
A brand name US bank had SAN corruption. This prevented Crash Recovery from completing. They had a Hot Standby machine and database using OE Replication. 16
Specifics § Had a local backup and local AI files, but the backup would not restore § Previous backup was not available § Replica Database was up to date § Platform was Windows § Database size 200 GB § Open. Edge 10. 1 C 04 17
What’s the Problem § Customer refused to fail-over • They never tested running on the fail-over machine. Had little confidence that the application would run in the fail-over environment. • Customer worried about the time it takes to fail-back once failed over. 18
Making Matters Worse § Copying the DR database to the production machine is measured in days § Options presented to Management included FORCED ACCESS to the Database 19
20
What Next § Forced into the database – This skips Crash Recovery § Index Rebuild DOES NOT fix the database § Dump and Load DOES NOT fix the database 21
Lesson Learned § Have confidence in your Disaster Recovery Plan • There is no sense of having one if you are never going to use it § Be Careful of the “QUICK FIX” • Non-technical people will ALWAYS choose the fastest approach to the solution without understanding the consequences 22
Next Steps § Worked with the customer to do a fail-over test. § Made the fail-over testing an annual event 23
Schools Out For Summer 24
A Large school district needs to get their reports cards out to 30, 000+ students. They discovered they had corruption in the database because backups stopped working for about a week 25
Specifics § 10. 2 B 05 Windows 64 bit Open. Edge § Last good backup is 1 week old § All report card data for 30, 000+ students entered since that last good backup § After Imaging is turned on, but AI file retention was less than 1 week § Database is about 300 GB § They have the 1 week old backup restored to a different location 26
WHAT WOULD YOU DO? 27
Approach § We had 2 plans § Plan A – Get the corruption out of the live database • Use any and all tools to remove the corruption § Plan B – Revert back to the week old database • See if we can take all the report card data from the live database and import it into the week old database. 28
Plan A § The database. lg file showed the extents where the corruption was located. § Each storage area was a single variable length extent § Corruption was in an 80 GB extent (Ugh!) • Used DBRPR to scan and fix bad blocks • This took hours to run on this large extent • In the end this failed 29
Plan B § Worked with the vendor to find all the tables that made up the report card processing • This was about 12 tables § Dumped these tables from the live database • There was no corruption in these tables § Had to figure out how to get the table data into the week old database 30
HMMMMM…. . 31
Plan B § Dumped the schema for the 12 tables § Went into the dictionary and renamed the tables • Added _old to the end of the table name § Loaded the schema for the 12 tables § Loaded the data for the 12 tables § This is a very useful trick • Didn’t need to recompile – the application worked 32
Plan A (revisited) § Dumped and Loaded the plan A database § There were 5 tables where the dump and load failed. § Did a 4 GL dump • FOR EACH … BY field. EXPORT… • FOR EACH … BY field DESCENDING. EXPORT … § Didn’t trust the data, so we use the same table rename technique to get these tables from the week old backup. 33
34
But Wait – There’s More § A week later they found they also had corruption in a different database • That was solved by restore and roll forward • Needed to upgrade to 10. 2 B 08 for Roll Forward to work properly – Windows 64 bit 10. 2 B 06 has a roll forward bug that prevented it from working. 35
Next Steps § Implement a DR solution • Open. Edge Replication • Rolling Forward AI § Restore the backup and roll forward on the same machine • This verifies the backup is functional • DB block corruption does not get replicated from roll forward 36
A case of the spins 37
A large medical center patched their software over the weekend. On Monday the performance of the system was unacceptable. The vendor says the patch was minor and could not be the cause of the issue. The customer says nothing else changed. 38
Specifics § § 39 Open. Edge 11. 2. 0 32 bit Windows 2012 64 bit Database is 321 GB Number of users is 3, 000
Some Metrics – Month View Date CPs Users 05/16/15 (Sat) 86 05/15/15 (Fri) 261 05/14/15 (Thu) DB Requests DB Reads 580 7, 948, 358, 747 149, 508, 775 323 Ratio Rec Rds Rec Up Rec Del DB Writes BI Writes AI Writes Latch TO 175, 980 172, 163 116, 490 14, 798 2, 792 12, 987, 557, 626 114, 936, 649 113 2, 465, 462, 910 1, 520, 296 3, 227, 568 1, 626, 692 1, 886, 768 449, 449 295, 333 115, 003 263 3, 011 11, 000, 344, 090 56, 940, 273 193 2, 090, 165, 002 1, 639, 564 3, 475, 092 454, 097 298, 088 73, 017 05/13/15 (Wed) 323 3, 126 10, 371, 051, 213 55, 142, 493 871, 808 2, 023, 720 1, 901, 654, 803 188 1, 879, 551, 662 2, 250, 168 3, 423, 760 1, 099, 930 2, 378, 306 525, 070 374, 006 885, 294 05/12/15 (Tue) 279 3, 089 10, 567, 333, 668 140, 530, 655 75 1, 901, 654, 803 1, 797, 520 3, 397, 238 1, 043, 849 2, 068, 487 496, 165 328, 450 943, 510 279 05/11/15 (Mon) Restart 53 1, 072, 918, 966 Rec Cr Restart 425, 542 1, 922, 412 374, 270 1, 879, 551, 662 Restart Restart 05/10/15 (Sun) 73 385 10, 806, 473, 996 206, 617, 341 52 2, 307, 235, 660 307, 377 1, 804, 694 368, 589 100, 764 150, 257 102, 579 244, 087 05/09/15 (Sat) 88 504 5, 704, 394, 389 82, 411, 186 69 693, 023, 191 483, 479 1, 423, 644 516, 069 171, 617 165, 926 115, 092 186, 064 05/08/15 (Fri) Restart 293 Restart 05/07/15 (Thu) 271 2, 940 10, 046, 740, 997 145, 481, 723 05/06/15 (Wed) 338 2, 989 9, 830, 327, 570 153, 056, 406 05/05/15 (Tue) 293 05/04/15 (Mon) Restart Restart 1, 593, 242, 356 Restart 306, 003 128, 058 64 1, 442, 212, 561 2, 247, 914 3, 525, 546 1, 225, 826 2, 453, 942 557, 639 374, 309 129, 160 2, 967 10, 392, 149, 949 154, 806, 221 67 1, 593, 242, 356 2, 000, 392 3, 366, 955 1, 126, 177 2, 324, 533 488, 949 329, 102 171, 067 488 2, 971 10, 483, 718, 093 162, 479, 487 65 1, 547, 975, 267 2, 311, 179 3, 733, 307 1, 363, 409 2, 678, 057 712, 951 528, 518 212, 059 05/03/15 (Sun) 125 484 11, 161, 696, 099 217, 504, 812 51 1, 884, 717, 953 331, 006 1, 783, 868 1, 243, 395 222, 981 156, 128 23, 917 05/02/15 (Sat) 1, 433 540 8, 114, 391, 833 164, 345, 414 49 735, 325, 461 444, 568 1, 853, 376 24, 483, 171 3, 078, 655 1, 889, 151 1, 496, 027 132, 360 05/01/15 (Fri) 374 2, 735 11, 611, 724, 202 126, 877, 164 40 924, 082 2, 153, 671 Restart 455, 228 488 69 1, 596, 756, 358 1, 705, 661 3, 503, 669 Restart 1, 547, 975, 267 270, 902 92 1, 815, 943, 455 2, 450, 987 3, 063, 577 1, 458, 166 2, 046, 184 590, 195 411, 705 3, 268, 221
Bad Day – 15 minute samples Sample 41 10: 05: 01 42 10: 20: 01 43 10: 35: 02 44 10: 50: 01 45 11: 05: 01 46 11: 20: 01 47 11: 35: 01 48 11: 50: 01 CPs Users DB Requests DB Reads 2 2 2 3 7 5 4 5 3, 033 106, 609, 219 36, 595, 922 3, 052 102, 685, 946 27, 744, 828 3, 082 97, 655, 645 1, 303, 656 3, 089 3, 086 3, 039 3, 040 2, 888 81, 674, 392 214, 447, 121 155, 915, 767 156, 151, 501 146, 245, 414 1, 293, 722 1, 716, 183 1, 492, 150 1, 434, 288 1, 666, 633 Ratio Rec Rds Rec Cr Rec Up Rec Del DB Writes 3 11, 840, 580 9, 664 31, 257 2, 338 4 12, 858, 412 10, 923 33, 024 2, 407 75 17, 250, 593 13, 814 38, 947 3, 318 63 14, 030, 509 22, 289 36, 321 125 61, 973, 396 40, 122 58, 595 104 34, 202, 285 25, 758 57, 494 109 34, 103, 824 27, 304 60, 045 88 33, 019, 801 33, 605 60, 463 5, 409 29, 276 14, 197 7, 791 11, 379 BI Writes AI Writes Latch TO 19, 189 4, 858 4, 499 88, 929 21, 026 5, 858 4, 660 99, 365 27, 611 7, 075 5, 046 107, 799 25, 428 59, 987 48, 054 48, 323 52, 606 7, 604 13, 919 8, 778 8, 285 8, 566 5, 516 8, 509 4, 993 4, 571 5, 226 100, 920 30, 332 4, 498 3, 952 2, 711 Good Day – 15 minute samples Sample 41 10: 05: 01 42 10: 20: 01 43 10: 35: 01 44 10: 50: 01 45 11: 05: 02 46 11: 20: 01 47 11: 35: 01 48 11: 50: 01 41 CPs Users DB Requests DB Reads 4 5 4 7 4 Ratio Rec Rds Rec Cr Rec Up Rec Del DB Writes 2, 848 153, 746, 951 2, 343, 049 66 28, 340, 783 30, 489 60, 805 2, 812 145, 441, 871 1, 755, 123 83 27, 387, 070 26, 490 59, 676 2, 877 151, 783, 516 1, 876, 013 81 27, 653, 297 30, 446 61, 754 2, 894 143, 780, 080 1, 877, 651 77 29, 215, 543 46, 234 63, 980 1, 808, 835 2, 259, 787 2, 195, 738 1, 774, 289 88 30, 191, 428 34, 806 63, 070 69 27, 841, 346 34, 727 60, 416 69 25, 976, 744 70, 239 83, 419 78 23, 570, 981 31, 236 59, 286 2, 912 6 2, 897 8 2, 938 4 2, 914 158, 495, 087 155, 845, 110 150, 662, 822 138, 147, 804 7, 666 13, 279 11, 262 19, 392 12, 040 12, 770 20, 193 12, 957 BI Writes AI Writes Latch TO 55, 818 7, 757 4, 774 5, 928 53, 195 6, 642 4, 962 3, 637 54, 906 7, 899 5, 192 6, 777 66, 429 11, 820 6, 774 7, 185 59, 041 59, 998 82, 777 57, 756 9, 215 8, 929 12, 552 7, 696 5, 284 5, 497 8, 542 4, 971 10, 165 7, 237 6, 035 2, 731
Bad Day – 15 minute samples Sample 41 10: 05: 01 42 10: 20: 01 43 10: 35: 02 44 10: 50: 01 45 11: 05: 01 46 11: 20: 01 47 11: 35: 01 48 11: 50: 01 CPs Users DB Requests DB Reads 2 2 2 3 7 5 4 5 3, 033 106, 609, 219 36, 595, 922 3, 052 102, 685, 946 27, 744, 828 3, 082 97, 655, 645 1, 303, 656 3, 089 3, 086 3, 039 3, 040 2, 888 81, 674, 392 214, 447, 121 155, 915, 767 156, 151, 501 146, 245, 414 1, 293, 722 1, 716, 183 1, 492, 150 1, 434, 288 1, 666, 633 Ratio Rec Rds Rec Cr Rec Up Rec Del DB Writes 3 11, 840, 580 9, 664 31, 257 2, 338 4 12, 858, 412 10, 923 33, 024 2, 407 75 17, 250, 593 13, 814 38, 947 3, 318 63 14, 030, 509 22, 289 36, 321 125 61, 973, 396 40, 122 58, 595 104 34, 202, 285 25, 758 57, 494 109 34, 103, 824 27, 304 60, 045 88 33, 019, 801 33, 605 60, 463 5, 409 29, 276 14, 197 7, 791 11, 379 BI Writes AI Writes 19, 189 4, 858 4, 499 21, 026 5, 858 4, 660 27, 611 7, 075 5, 046 25, 428 59, 987 48, 054 48, 323 52, 606 7, 604 13, 919 8, 778 8, 285 8, 566 5, 516 8, 509 4, 993 4, 571 5, 226 Latch TO 88, 929 99, 365 107, 799 100, 920 30, 332 4, 498 3, 952 2, 711 Good Day – 15 minute samples Sample 41 10: 05: 01 42 10: 20: 01 43 10: 35: 01 44 10: 50: 01 45 11: 05: 02 46 11: 20: 01 47 11: 35: 01 48 11: 50: 01 42 CPs Users DB Requests DB Reads 4 5 4 7 4 Ratio Rec Rds Rec Cr Rec Up Rec Del DB Writes 2, 848 153, 746, 951 2, 343, 049 66 28, 340, 783 30, 489 60, 805 2, 812 145, 441, 871 1, 755, 123 83 27, 387, 070 26, 490 59, 676 2, 877 151, 783, 516 1, 876, 013 81 27, 653, 297 30, 446 61, 754 2, 894 143, 780, 080 1, 877, 651 77 29, 215, 543 46, 234 63, 980 1, 808, 835 2, 259, 787 2, 195, 738 1, 774, 289 88 30, 191, 428 34, 806 63, 070 69 27, 841, 346 34, 727 60, 416 69 25, 976, 744 70, 239 83, 419 78 23, 570, 981 31, 236 59, 286 2, 912 6 2, 897 8 2, 938 4 2, 914 158, 495, 087 155, 845, 110 150, 662, 822 138, 147, 804 7, 666 13, 279 11, 262 19, 392 12, 040 12, 770 20, 193 12, 957 BI Writes AI Writes 55, 818 7, 757 4, 774 53, 195 6, 642 4, 962 54, 906 7, 899 5, 192 66, 429 11, 820 6, 774 59, 041 59, 998 82, 777 57, 756 9, 215 8, 929 12, 552 7, 696 5, 284 5, 497 8, 542 4, 971 Latch TO 5, 928 3, 637 6, 777 7, 185 10, 165 7, 237 6, 035 2, 731
Some Metrics Date CPs Users DB Requests DB Reads Ratio Rec Rds Rec Up 05/15/15 (Fri) 261 2, 792 12, 987, 557, 626 114, 936, 649 113 2, 465, 462, 910 1, 520, 296 3, 227, 568 1, 626, 692 1, 886, 768 05/14/15 (Thu) 263 3, 011 11, 000, 344, 090 56, 940, 273 193 2, 090, 165, 002 1, 639, 564 3, 475, 092 05/13/15 (Wed) 323 3, 126 10, 371, 051, 213 55, 142, 493 188 1, 879, 551, 662 2, 250, 168 3, 423, 760 1, 099, 930 2, 378, 306 525, 070 374, 006 885, 294 05/12/15 (Tue) 279 3, 089 10, 567, 333, 668 140, 530, 655 75 1, 901, 654, 803 1, 797, 520 3, 397, 238 1, 043, 849 2, 068, 487 496, 165 328, 450 943, 510 Restart 175, 980 172, 163 116, 490 885, 294 580 7, 948, 358, 747 149, 508, 775 Restart 374, 270 DB Writes BI Writes AI Writes Latch TO 86 Restart 425, 542 1, 922, 412 Rec Del 05/16/15 (Sat) 05/11/15 (Mon) Restart 53 1, 072, 918, 966 Rec Cr 871, 808 2, 023, 720 Restart 73 385 10, 806, 473, 996 206, 617, 341 52 2, 307, 235, 660 307, 377 1, 804, 694 368, 589 100, 764 05/09/15 (Sat) 88 504 5, 704, 394, 389 82, 411, 186 69 693, 023, 191 483, 479 1, 423, 644 516, 069 171, 617 Restart Restart Restart 295, 333 454, 097 298, 088 943, 510 Restart 05/10/15 (Sun) 05/08/15 (Fri) 449, 449 150, 257 Restart 14, 798 115, 003 73, 017 Restart 102, 579 244, 087 165, 926 115, 092 128, 058 186, 064 Restart 05/07/15 (Thu) 271 2, 940 10, 046, 740, 997 145, 481, 723 69 1, 596, 756, 358 1, 705, 661 3, 503, 669 05/06/15 (Wed) 338 2, 989 9, 830, 327, 570 153, 056, 406 64 1, 442, 212, 561 2, 247, 914 3, 525, 546 1, 225, 826 2, 453, 942 557, 639 374, 309 129, 160 05/05/15 (Tue) 293 2, 967 10, 392, 149, 949 154, 806, 221 67 1, 593, 242, 356 2, 000, 392 3, 366, 955 1, 126, 177 2, 324, 533 488, 949 329, 102 171, 067 05/04/15 (Mon) 488 2, 971 10, 483, 718, 093 162, 479, 487 65 1, 547, 975, 267 2, 311, 179 3, 733, 307 1, 363, 409 2, 678, 057 712, 951 528, 518 212, 059 05/03/15 (Sun) 125 484 11, 161, 696, 099 217, 504, 812 51 1, 884, 717, 953 331, 006 1, 783, 868 1, 243, 395 23, 917 05/02/15 (Sat) 1, 433 540 8, 114, 391, 833 164, 345, 414 49 735, 325, 461 222, 981 156, 128 3, 268, 211 444, 568 1, 853, 376 24, 483, 171 3, 078, 655 1, 889, 151 1, 496, 027 05/01/15 (Fri) 374 2, 735 11, 611, 724, 202 126, 877, 164 43 924, 082 2, 153, 671 455, 228 306, 003 129, 160 Restart 270, 902 92 1, 815, 943, 455 2, 450, 987 3, 063, 577 1, 458, 166 2, 046, 184 590, 195 411, 705 128, 058 132, 360 3, 268, 221
Latch Timeouts increased. CRUD Operations Decreased. Why? Nothing had changed 44
§ Further investigation revealed that the –spin setting was changed from 96, 000 to 20, 000. • This change was a move to best practices where industry experts have been saying to not have –spin higher than 20, 000 45
§ The change was made months back to the conmgr. properties file and was long forgotten. § When the patch was applied, the database was bounced and the change finally took affect § While no one remembers a configuration change, the change was there § Setting –spin back up to 96, 000 got them the performance back 46
Bad Day – 15 minute samples Sample 41 10: 05: 01 42 10: 20: 01 43 10: 35: 02 44 10: 50: 01 45 11: 05: 01 46 11: 20: 01 47 11: 35: 01 48 11: 50: 01 CPs Users DB Requests DB Reads 2 2 2 3 7 5 4 5 3, 033 106, 609, 219 36, 595, 922 3, 052 102, 685, 946 27, 744, 828 3, 082 97, 655, 645 1, 303, 656 3, 089 3, 086 3, 039 3, 040 2, 888 81, 674, 392 214, 447, 121 155, 915, 767 156, 151, 501 146, 245, 414 1, 293, 722 1, 716, 183 1, 492, 150 1, 434, 288 1, 666, 633 Ratio Rec Rds Rec Cr Rec Up Rec Del DB Writes 3 11, 840, 580 9, 664 31, 257 2, 338 4 12, 858, 412 10, 923 33, 024 2, 407 75 17, 250, 593 13, 814 38, 947 3, 318 63 125 104 109 88 14, 030, 509 61, 973, 396 34, 202, 285 34, 103, 824 33, 019, 801 22, 289 40, 122 25, 758 27, 304 33, 605 36, 321 5, 409 58, 595 29, 276 57, 494 14, 197 60, 045 7, 791 60, 463 11, 379 BI Writes AI Writes 19, 189 4, 858 4, 499 21, 026 5, 858 4, 660 27, 611 7, 075 5, 046 25, 428 59, 987 48, 054 48, 323 52, 606 7, 604 13, 919 8, 778 8, 285 8, 566 5, 516 8, 509 4, 993 4, 571 5, 226 Latch TO 88, 929 99, 365 107, 799 100, 920 30, 332 4, 498 3, 952 2, 711 Good Day – 15 minute samples Sample 41 10: 05: 01 42 10: 20: 01 43 10: 35: 01 44 10: 50: 01 45 11: 05: 02 46 11: 20: 01 47 11: 35: 01 48 11: 50: 01 47 CPs Users DB Requests DB Reads 4 5 4 7 4 Ratio Rec Rds Rec Cr Rec Up Rec Del DB Writes 2, 848 153, 746, 951 2, 343, 049 66 28, 340, 783 30, 489 60, 805 2, 812 145, 441, 871 1, 755, 123 83 27, 387, 070 26, 490 59, 676 2, 877 151, 783, 516 1, 876, 013 81 27, 653, 297 30, 446 61, 754 2, 894 143, 780, 080 1, 877, 651 77 29, 215, 543 46, 234 63, 980 1, 808, 835 2, 259, 787 2, 195, 738 1, 774, 289 88 30, 191, 428 34, 806 63, 070 69 27, 841, 346 34, 727 60, 416 69 25, 976, 744 70, 239 83, 419 78 23, 570, 981 31, 236 59, 286 2, 912 6 2, 897 8 2, 938 4 2, 914 158, 495, 087 155, 845, 110 150, 662, 822 138, 147, 804 7, 666 13, 279 11, 262 19, 392 12, 040 12, 770 20, 193 12, 957 BI Writes AI Writes 55, 818 7, 757 4, 774 53, 195 6, 642 4, 962 54, 906 7, 899 5, 192 66, 429 11, 820 6, 774 59, 041 59, 998 82, 777 57, 756 9, 215 8, 929 12, 552 7, 696 5, 284 5, 497 8, 542 4, 971 Latch TO 5, 928 3, 637 6, 777 7, 185 10, 165 7, 237 6, 035 2, 731
But WAIT! There’s more 48
§ A different customer added a few CPUs to their environment. § When the users login, the CPUs peg to 100% utilized § Performance suffers § Web. Speed launches additional Agents • Due to all agents are busy Specifics § Customer database is > 1 TB § 430 Webspeed agents § AIX § 10. 1 C 64 bit 49
Sample CPs Users DB Requests DB Reads Ratio Rec Rds Rec Cr Rec Up Rec Del DB Writes BI Writes AI Writes 171 14: 10: 00 0 433 226, 177, 323 38, 138 5, 930 92, 799, 768 30, 190 6, 316 0 4, 999 3, 311 1, 859 172 14: 15: 00 1 434 230, 752, 781 45, 463 5, 076 95, 312, 150 15, 075 5, 079 0 5, 253 1, 218 1, 022 173 14: 20: 00 1 432 227, 272, 169 47, 269 4, 808 90, 336, 327 21, 173 5, 439 81 4, 904 2, 366 1, 409 174 14: 25: 00 0 433 174, 847, 554 50, 871 3, 437 68, 054, 671 14, 028 5, 468 0 5, 425 1, 797 976 175 14: 30: 00 0 433 201, 167, 309 53, 661 3, 749 72, 196, 316 15, 100 5, 109 0 6, 304 1, 808 1, 032 176 14: 35: 00 1 433 272, 198, 783 69, 170 3, 935 104, 501, 989 25, 086 7, 389 4 5, 431 2, 913 1, 597 177 14: 40: 00 1 434 261, 870, 509 100, 191 2, 614 97, 340, 387 23, 871 5, 737 42 7, 930 1, 784 1, 504 178 14: 45: 01 0 434 264, 460, 116 391, 712 675 103, 827, 717 19, 668 5, 931 46 8, 671 1, 694 1, 284 179 14: 50: 00 1 434 249, 536, 969 779, 568 320 88, 726, 157 23, 008 5, 444 0 7, 872 2, 543 1, 453 180 14: 55: 00 1 435 279, 690, 881 155, 333 1, 801 108, 846, 566 22, 640 7, 233 24 6, 050 2, 668 1, 470 181 15: 00 0 433 268, 670, 791 539, 202 498 104, 316, 852 24, 230 7, 554 0 6, 147 2, 677 1, 557 182 15: 00 1 435 259, 585, 414 161, 012 1, 612 107, 194, 651 21, 828 6, 552 63 7, 095 1, 739 1, 375 183 15: 10: 00 1 435 245, 056, 072 316, 736 5, 862 0 5, 973 2, 401 1, 522 50 774 101, 424, 285 25, 343 Latch TO 16, 058 15, 475 21, 450 8, 608 20, 395 48, 935 58, 078 93, 499 123, 551 72, 701 64, 974 38, 665 28, 853
Unlike the previous example, we had no historical performance metrics to compare to when thing were good. Could only rely on instincts and experience. 51
A Different View In a 5 minute sample, the highest latch timeout should be no more than 3, 000 52
Changed –spin from 60, 000 to 20, 00 and the problem went away 53
Lesson Learned § There is no one setting that will work for every situation • Changing –spin from 20, 000 to 96, 000 helped one customer • Changing –spin from 60, 000 to 20, 000 helped another one § Having historical data is key § Don’t assume nothing has changed just because they said so • Configuration changes usually only take affect at next startup 54
Summary § These are examples of some real world Database Problems § Don’t assume things can’t go wrong § Having a plan is not going enough • Testing the plan and having confidence is required § If all else fails, seek professional help 55
Gus B Mike F Dan F Chris R Roadies: Paul Coveney, Darren Rhoads, Tom Cattigan, Joe Rozenberg Jeff Keller, Marek Bujnarowski, Ajit Deodhar Groupies: Dave Eddy, Humphrey Koraag, Diego Canziani, Kim Davies 56
- Slides: 57