Database DumpLoad Ben Holmes Eaton Corp 2002 Eaton
Database Dump/Load Ben Holmes Eaton Corp © 2002 Eaton Corporation. All rights reserved.
About Ben q Currently with Eaton Corp FPG division as a Sr. Progress DBA for the past 11 years q Started Programming with Progress in 1989, version 4. 2 q Progress Consultant/DBA for over 12 years. q Worked at QAD as Technical Service Manager. q Currently administrate over 80 production environments (4 -5 db’s each, over 320) Sizes range from 20 GB to 480 GB q 2
Who are You? 3 q Progress Version q Largest Database q Database Operating System
Reasons NOT to Dump & Load q Because it is X months since the last time. q Because the vendor said we should. q Because we must in order to upgrade Progress. q Because we are told it will improve performance. q When’s the next 3 day weekend? q When is the best time based upon various statistics § Scatter Factor § Fragmentation Factor § Application and/or Utility Performance 4
Why I would Dump/Load q Improve Performance by reducing Table Scatter Factor & Record Fragmentation q Reclaim Space in the Database q q Migrate between different Progress Versions (e. g. from V 9 to V 10 with new storage area) Migrate between different Platforms (e. g. Windows to HP/UX) q When is the best time based upon various statistics § Scatter Factor § Fragmentation Factor § Application and/or Utility Performance 5
When you need to D&L? 6 q Change the Database Blocksize (V 8. 2+) q Change the Records per Block (RPB) q Convert to Multiple Storage Areas (V 9+) q Convert to OE 10 Type 2 Storage Areas q Recover Data from a Corrupted Database
Physical Scatter Factor 7 q How close are the Records in a Table to each other physically q Obtained from proutil dbanalys/tabanalys q Progress recommendations are way low for most production databases; you will spend more weekends doing D&Ls then seeing your family
Scatter Factor q For Large, Frequently Accessed Tables: §. 9 Data overlap problem (bug) § 1. 0 Green Flag - Perfect § 1. 1 -2. 0 Probably OK § 2. 1 -3. 0 Yellow Flag - Deteriorating § 3. 1 -4. 0 Performance problems are probably § 4. 1+ 8 being experienced Red Flag - take action ASAP
Example RECORD BLOCK SUMMARY FOR AREA "GLTRHIST" : 35 ----------------------------Record Size (B)- ---Fragments--- Scatter Table Records Size Min Max Mean Count Factor PUB. gltr_hist 118323698 24. 4 G 127 241 221 198899694 1. 0 PUB. absc_det 477 19. 1 K 41 41 41 477 1. 0 PUB. absr_det 89324 10. 1 M 97 195 118 89324 1. 0 PUB. acd_det 133684 9. 2 M 48 91 72 133684 1. 0 PUB. anl_det 1664 99. 5 K 35 79 61 1664 1. 0 PUB. ans_det 2224 127. 3 K 44 76 58 2224 1. 0 PUB. an_mstr 1991 126. 9 K 42 82 65 1991 1. 0 9 Factor 3. 9 6. 2 4. 0 3. 8 5. 7 5. 5 5. 6
Fragmentation Factor q q 10 A Fragment is a Record A Record can consist of one or more Fragments The more Fragments per Record the more I/O is required to read and update a Record VSTs (_Act. Record) can tell you how much Excess Fragment I/O is actually occurring but not by table; also the statistics are somewhat suspicious
General Setup q Archive & Purge as much data as possible § The D&L will take less time § The Scattering will be eliminated by the D&L q Perform an Index Rebuild (if possible) § Might improve performance enough to avoid a D&L but unlikely in my experience § Might reveal hidden DB corruption that needs to be dealt with first (i. e. 1124 errors) § Should make the dump process go faster 11
Dump/Load Setup q Run & Time some Heavy Duty Reports (multi-table, many records, reproducible) q Run & Time proutil dbanalys § Use Record Counts before the Dump and after the Load to Validate that the Load was successful § Might Reveal Corruption (i. e. 1124 errors) 12
Disk Preparation q Consider Disabling Disk Mirroring § Extra Dump Space § Availability of 2 nd Disk Controller § Potential (usually minor) Performance Gain § But don’t forget there may be overhead of resyncing the mirrors q Dump to non-database disks if possible § AI/BI/Temp File Disks are good choices 13
Dump/Load Options q “Classic” Dictionary Dump & Load q Bulk Loader q Binary Dump & Load q Parallel – Multi § Many Benefits § Difficult to Effectively Balance 14 q Automating the process q Buffer COPY
Classic / 4 GL Dump q Progress Dictionary/Custom 4 GL Code § Dictionary has a Simple Interface § Usually the Slowest Method § Dictionary Dump/Load Programs can be run noninteractively § Custom Coding requires 4 GL experience § Dump files are subject to 2 GB limit but can code around it 15
Binary Dump q Binary Dump (V 8. 2 and later) § Documented & Supported § proutil command can only Dump/Load one table at a time (one proutil per Table) § Multi-Threading - parallel dump are possible § Dump files are portable § No 2 GB limit (. b 1 b. 2…. ) § Larger than 2 GB single dump file on V 9. 1 B and later § Possibility exists of “carrying” Database corruption to the new Database 16
Binary Dump Tips 17 q Use the Read Only (-RO) q Benchmark q Don’t specify a large -B with -RO q Use bin. p (or bin-nt. p) on the Brave. Point web site to generate D&L scripts (UNSUPPORTED) or dumpload. p in DBA Resource Kit q OE v 10: proutil dbname -C dump table -index 0 q Thread by storage area.
What Dump to Use q Do Parallel Dumps if… § You want to go home earlier § You have multiple CPUs § You have multiple (non-DB) disks with free space 18 q 1 -4 dump threads per CPU is usually safe q Once a table is finished dumping, you can start loading into a database on the other system or set of disks (I use the TEST DB)
What to Consider q q 19 Forman stats “If a table is large (10+ million records), the Binary dump might not be the fastest method because it is single threaded” Consider multiple, parallel, 4 GL dumps Bulk Load the Data (proutil bulkload) Alternative in V 9. 1 D+; you can use proutil dumpspecified instead of dump but major limitations – more in a later slide
Deep Thoughts q Sometimes using a non-primary index is faster particularly if the secondary index is ‘smaller’ q Don’t forget to dump Sequences and the User table (can’t Binary dump) q Don’t forget SQL 92 Privileges and Database _user table. 20
proutil dumpspecified 21 q Can’t have multiple streams dumping a table to the directory at the same time (table. bd) q -index does not work q Primary Index needs to be a single Field q The Index must have the same name as the Field
Load - Dictionary q Dictionary Load/Custom 4 GL Code § Same Advantages & Disadvantages as the Dictionary/custom Dump § Slowest Option (except in Parallel maybe. . . ) 22
Load - Bulk. Load q Bulk Load § Option on proutil § Can load Dictionary or 4 GL dump (. d) files § Very Fast but not quite as fast as Binary § Single Threaded only § Index Rebuild is required 23
LOAD - Binary q Single or Parallel loads are possible q Start the Database Broker to: § Observe the speed (Records Created) § Avoid BI Recovery for every LOAD 24 q Performance use the No Integrity (-i) Option q A number of problems(bugs) in the older versions (below 8. 3 C)
Load - Parallel q V 8 - NO, Increases the Scatter Factor; slow q V 9/OE 10 - One load thread per Area q Don’t Forget § § 25 No Integrity (-i) APWs, BIW -bibufs -directio (maybe) Big Cluster Size (16 mb+) -spin No AI
Index Rebuild q q q 26 Backup the Database BEFORE you start the Index Rebuild If the Index Rebuild fails you might not be able to restart it Disk Sort Method § Fastest (but single threaded) § Builds a more compact index § A Sort file is created on disk § Sort File on Non-DB Disks = 20% Faster
Disk Sort q Disk Sort Method § Sort File Size Estimate: 1 X-2 X Data Size to be absolutely safe § The Sort file is subject to the 2 GB limit until V 9. 1 D SP 07 or 08 § V 8. * allows Multi-Volume Sort files § V 8: Don't put TABs in the. srt file 27
Memory Sort q q q 28 Much Slower than Disk Sort Less Compact Index Use V 9+ idxcompact to compact the Index § A compaction percentage can be specified § Can be run online or offline § Only in V 9. 1 E and above No Disk Space Required Use a larger -B (but not too large)
Index Rebuild Options 29 q -TB 31 Disk Sort q -TM 32 Disk Sort q db. srt Multi-volume Sort file q -t Disk Sort; Unix q -B Useful for Memory Sort only q -SS V 9. 1 B ‘build indexes’ option (shortly) q -SG Sort Group; SP 07; default 48; use 64
Belt & Suspenders § Compare Record Counts § Check Logs for Known & Unknown Errors • grep for fail, error, (1124) § Check for Success Binary • grep “ 0 errors” /tmp/dl/*. log | wc –l • grep “ dumped” /tmp/dl/*. log • grep “ loaded” /tmp/dl/*. log • ls –l /tmp/dl/*. d 30
Buffer-Copy & Raw-Transfer q q q Very Fast Eliminates The Middle Man (temp file IO operations) Provides fine-grained, ABL level of control § Allows on the fly data manipulation § Useful when merging databases q q q Can use with remote connections to bridge version numbers. In OE 10. 1 performance is essentially equal Cannot use RAW-TRANSFER with -RO
Benchmarks (by others) 32 q Dict Load/Idx Inactive/idxbuild: 27: 07 q Dict Load/Idx Active (3 threads): 23: 57 q Bulk Load/idxbuild: 16: 10 q Serial Binary/idxbuild: 15: 15 q Parallel Binary/idxbuild: 15: 29 q Serial Binary with -SS: 19: 56 q Parallel Binary with -SS: 38: 04
Eaton Benchmark 33 q 80 GB MFG/PRO Database q Progress 8. 3 E q Single thread Binary dump time: 6 hrs q Multi-thread Binary dump time: 4. 5 hrs q Single thread Binary LOAD: 10 hrs q Multi-thread Binary Load: 4. 5 hrs q Index Rebuild: 12 hrs
Eaton Benchmark q Progress 10. 1 C DB: 2. 3 GB Tab: 3 mins q Data Dict Dump: 1: 15 hrs § Longet table: 47 min = multi-thread 47 mins q Single thread Binary dump time: 00: 17 hrs q Multi-thread Binary dump time: 00: 09 hrs q ON NAS § § § q All Var Ext Single thread Binary LOAD: 9 min idx 12 min All Var Ext Multi-thread Binary Load: 10 min idx 12 min Fixed Ext Multi-thread Binary Load: 13 mins indx 10 mins On EMC disk § Multi-thread load 4 mins idx 5 mins 34
Dictionary Dump Compare dbname=phroaux tfile=/cit/log/qad/$dbname. tabanalys. 130916 lst=`ls -1 /prog_bkup_new/phr/ascii/*. d` for i in $lst do xdump=0 xtable=`grep filename= $i|cut -f 2 -d"=" ` xdump=`grep records= $i|cut -f 2 -d"="` xtabdump=$(grep -i "PUB. $xtable " $tfile|tr -s " " "" |cut -f 2 -d" ") xdump=$(expr $xdump * 1) xtabdump=$(expr $xtabdump * 1) [[ $xdump -eq $xtabdump ]] && echo "Matched " $xtable : $xtabdump: $xdump: [[ ! $xdump -eq $xtabdump ]] && echo $i $xtable NO Match : ${xtabdump}"-"${xdump}: [[ ! $xdump -eq $xtabdump ]] && read a done 35
Binary Dump Multi-Thread DEF VAR X AS INT. DEF VAR xx AS INT. DEF VAR xcnt as INT. def var w-str as char. def var w-dbname as char format "x(50)". def var w-dumpdir as char format "x(78)". def var b as int. clear all. update w-dbname label "DB Name" help "Enter Full path and DB name ie: /mfgpro/phr/db 1/phrpmfg " skip w-dumpdir label "Dump dir" def var bb as int. help "Enter Full path of the Dump directory" DEF STREAM out 1. with frame a side-labels scrollable. DEF STREAM out 2. DEF STREAM out 3. Assign X = 1 xx = 0. DEF STREAM out 4. DEF STREAM out 5. OUTPUT STREAM out 1 TO "bdump 1. sh". OUTPUT STREAM out 2 TO "bdump 2. sh". OUTPUT STREAM out 3 TO "bdump 3. sh". OUTPUT STREAM out 4 TO "bdump 4. sh". OUTPUT STREAM out 5 TO "bdump 5. sh". /* find the total number of tables first */ for each _file no-lock where ( _file-number > 0 and _file-number < 30000): xcnt = xcnt + 1. end. 36
Binary 2 /* create dump scripts */ b = 2. IF X = 2 THEN PUT STREAM out 2 UNFORMATTED "$DLC/bin/proutil " + w-dbname + " -C dump " + _file-name + " " + w-dumpdir + w-str skip. FOR EACH _file NO-LOCK WHERE (_file-number > 0 and _file-number < 30000) break by _file-name: xx = xx + 1. w-str = " > " + _file-name + ". log ". if xx mod b = 0 then assign b = b + 2 w-str = w-str + " &". bb = xx mod b. IF xx MOD (xcnt / 5) = 0 THEN X = X + 1. /* display xx xcnt x b bb. pause. */ IF X = 1 THEN PUT STREAM out 1 UNFORMATTED "$DLC/bin/proutil " + w-dbname + " -C dump “ + _file-name + " " + w-dumpdir + w-str skip. IF X = 3 THEN PUT STREAM out 3 UNFORMATTED "$DLC/bin/proutil " + w-dbname + " -C dump " + _file-name + " " + w-dumpdir + w-str skip. IF X = 4 THEN PUT STREAM out 4 UNFORMATTED "$DLC/bin/proutil " + w-dbname + " -C dump " + _file-name + " " + w-dumpdir + w-str skip. IF X = 5 THEN PUT STREAM out 5 UNFORMATTED "$DLC/bin/proutil " + w-dbname + " -C dump " + _file-name + " " + w-dumpdir + w-str skip. END. OUTPUT STREAM out 1 CLOSE. OUTPUT STREAM out 2 CLOSE. OUTPUT STREAM out 3 CLOSE. OUTPUT STREAM out 4 CLOSE. OUTPUT STREAM out 5 CLOSE. 37
Additional Resources n http: //www. greenfieldtech. com/download s. shtml n q Buffer-Copy Pro D&L : Brave. Point § minimize D&L down time 38
Questions? 39
Thank you for your time 40
- Slides: 40