Copyright IBM Corporation 2012 Optim Performance Relational Database
© Copyright IBM Corporation 2012
Optim Performance Relational Database Table Processing © Copyright IBM Corporation 2012
Optim Performance (General) Optim server type matters. Windows servers perform faster than Unix variant machines. However, vertical scaling of Windows servers is very poor compared to Linux. © Copyright IBM Corporation 2012
Optim Performance (Archive/Extract) The number of parallel processes required is determined by the amount of data that must be moved and the client established SLA for moving that data. Simply put, “What is the frequency of Archive/Extract and how much time is available to do it? ” In the case of TDM this also includes frequency and volume of database refreshes. © Copyright IBM Corporation 2012
Optim Performance (Archive/Extract) © Copyright IBM Corporation 2012
Optim Performance Measurement (Archive/Extract) © Copyright IBM Corporation 2012
Optim Performance Measurement (Archive/Extract) (Ref Table) Table Name: CRWAUDRPTHST. PSPADM. PS_AFS_INS DBMS: DB 2 LUW Version: 09. 07. 0004 Columns: 92 Cycle: No Lobs: No Est. Rows: N/A Row Length: 794 DB Connections: 1 Select With UR: No Fetch Buffer Size per Connection: 512 K PK W/Index: N/A PK WO/Index: N/A FK WO/Index: N/A Parent Strategy: N/A Dependent Strategy: N/A DBMS Access: Access Type: Cursor Scan Keys Per Cursor: 0 Open Cursor: 1 Rows Fetched: 42849 Rows Written: 42849 Time in DBMS: 33% Time in DBMS Archive Actions: 0% Process Time: 0: 05 Rows Per Sec: 8569 794 * 42, 849 = 34, 022, 106 / 5 = 6, 804, 421. 2 * 1024 -2 = 6. 49 MB/sec © Copyright IBM Corporation 2012
Optim Performance Measurement (Archive/Extract) (Ref Table) Table Name: CRWAUDRPTHST. PSPADM. PS_COLLATRL_BL DBMS: DB 2 LUW Version: 09. 07. 0004 Columns: 20 Cycle: No Lobs: No Est. Rows: N/A Row Length: 250 DB Connections: 1 Select With UR: No Fetch Buffer Size per Connection: 512 K PK W/Index: N/A PK WO/Index: N/A FK WO/Index: N/A Parent Strategy: N/A Dependent Strategy: N/A DBMS Access: Access Type: Cursor Scan Keys Per Cursor: 0 Open Cursor: 1 Rows Fetched: 470194384 Rows Written: 470194384 Time in DBMS: 29% Time in DBMS Archive Actions: 0% Process Time: 4: 59: 19 Rows Per Sec: 26181 250 * 470, 194, 384 = 1. 17548596 E 11 / 17, 959 = 6, 545, 386. 5 * 1024 -2 = 6. 24 MB/sec © Copyright IBM Corporation 2012
Optim Performance Tuning (Archive/Extract) © Copyright IBM Corporation 2012
Optim Performance Tuning (Archive/Extract) © Copyright IBM Corporation 2012
Optim Performance Tuning (Settings) 10. 2. 13 Optim performance parameter settings Optim has numerous settings that have a direct impact on performance. The settings that are described in this section are located in separate areas within Optim and can vary between OS platforms. Several of the settings are Optim software configuration settings, several of the settings are part of Optim requests, and other settings belong to Optim objects that are used by the requests. You can see the locations of the settings in the IBM Optim Installation and Configuration Guide and the IBM Optim Common Elements Manual. © Copyright IBM Corporation 2012
Optim Performance (Specific Settings) The following list shows the Optim performance parameter settings: Archival of database objects: – By default, Optim archives database objects by capturing the Data Definition Language (DDL), or the actual object is placed in the archive files. These objects include primary keys, relationships, indexes, partition functions, procedures, and so on. – Several of these objects can affect performance significantly. We suggest that you do not select these objects when archiving data. If the objects are needed to create tables in the future, run a special archive that only archives the objects and not the data. Compression of archive files The archive files can be compressed or left uncompressed: – Compress the archive files to save disk space, which increases the run time. – Leave them uncompressed, which uses more disk space, but shortens run time. – Compress selected tables in an archive file, which shortens run time and saves disk space when compression is only applied to tables that are highly compressible. © Copyright IBM Corporation 2012
Optim Performance (Specific Settings) – Compress archive files after the archive process is complete, which shortens the archive run time, but uses more disk space temporarily and uses more resources overall. Maximum database connections: – This setting determines the number of parallel database connections. The number of connections that are actually used can differ from the setting. For more details, see the IBM Optim Common Elements Manual, which is delivered with the product. – Start with two database connections and test by increasing in multiples of two until performance improvements diminish. – This setting affects each parallel process that is actively running. – Monitor the source DBMS server for over-utilization. Fetch buffer size: – This setting determines the size of the buffer that is returned to Optim from a DBMS fetch. – The default of 512 K is usually the best setting. The initial setting is multiplied by the maximum database connections. If the setting is 512 K and four database connections are specified, the fetch buffer size is automatically set to 2048 K (512 K assigned to each connection) © Copyright IBM Corporation 2012
Optim Performance (Specific Settings) Delete buffer size: – This setting determines the delete buffer size and only applies to array deletes. The setting is also affected by commit frequency. For more details, see the IBM Optim Common Elements Manual, which is delivered with the product. – The default of 512 K is usually the best setting. The initial setting is multiplied by the maximum database connections. If the setting is 512 K and four database connects are specified, the fetch buffer size is automatically set to 2048 K (512 K assigned to each connection). Parent and child key lookup limits: – This value determines the number of keys that are searched for during a single DBMS request. The possible values are 1 to 100. – This setting requires experimentation in the actual environment. – Typically, if more keys are returned from a single request, performance improves. © Copyright IBM Corporation 2012
Optim Performance (Specific Settings) DBMS commit frequency: – This setting affects database deletes and restores. The possible values are 1 to 999, 999. – More frequent commits reduce the chance of failed transactions because of resource locking, but increase run time. – Less frequent commits increase the chance of failed transactions due to resource locking, but improve performance. – This setting must be evaluated on an application-by-application basis. Delete process lock tables option – Locking is requested at the table level. This option prevents all access to a table while that table is being processed. © Copyright IBM Corporation 2012
Optim Performance Troubleshooting Sometimes after analyzing and changing all of the appropriate settings, Optim processes still may not perform as expected or with less than acceptable performance • Perform the same function on the database server using native database commands • • • (close to the data source native command testing) then repeat from the Optim server Test the network’s sustained data rate while the “problem process” is running Test the disk subsystem sustained data rate (must be done on the local server) Make sure nonessential tracing and logging functions are turned off For inserts and loads are an excessive number of indices being updated? If a single table or group of tables are not performing satisfactorily, find out what is different about those specific tables: Are they on a WAN? On different storage that is slower? © Copyright IBM Corporation 2012
© Copyright IBM Corporation 2012
Optim Connect Query Performance We will cover 1. How does Optim Connect query performance compare to RDBMS query Performance? 2. Factors Affecting Optim Connect Performance • • • Disk CPU Archive Indexes Query Tuning Optim Connect Tuning Parameters Federation 3. How does Optim Connect query performance compare to our archive competitors (such as Informatica Rainstor)? 4. What is on the roadmap for Optim archive querying? © Copyright IBM Corporation 2012
Optim Connect Performance Resources • • Red Book SG 247936 Optim Connect Tuning Parameters v 1. 3. doc Optim Connect Query and Index Guidelines v 1. 5. doc ODM_Performance_Class_plus_Lab_2_v 3. ppt © Copyright IBM Corporation 2012
Optim Connect vs RDBMS Query Performance Is an Optim Connect query faster or slower than the same query on an RDBMS? • It depends on the supporting hardware for Optim Connect and the RDBMS servers • It depends on the type of query. Simple queries with highly selective filters can be faster on Optim Connect if an archive index exists. © Copyright IBM Corporation 2012
Factors Affecting Optim Connect Performance 1. Disk Performance (Temp and archive/index disk) 2. CPU Performance (Faster is better, having more CPU’s will not speed up an individual query) 3. Archive Indexes (Do they exist? Are they optimal? ) 4. Query Tuning (Are there any filter conditions? Can the query be rewritten? ) 5. Optim Connect Tuning Parameters (Avoid Hash Join and increase the memory parameters) 6. Federation (hs. ODBC, DG 4 ODBC, DB 2 Fed Server, Sql Server Link Server) © Copyright IBM Corporation 2012
Disk Performance – Temp Disk Optim Connect Temp disk is important (for joins and sorts). Write speed is important. © Copyright IBM Corporation 2012
Disk Performance – Archive and Index Disk • • Faster is better SAN vs NAS How fast does it need to be? How fast is your disk? The dd command can be used on Unix/Linux to determine your disk speed: dd of=/dev/null if=archive 1. AFX bs=4096 Rule of Thumb: 30 MB/sec or higher won’t adversely impact ODM © Copyright IBM Corporation 2012
CPU Performance • • Faster is better Optim Connect queries are always single threaded (only one CPU core can be used for a given query). Generally the order fastest to slowest is Windows/Linux Intel, …, IBM Power, Sun Sparc. To get CPU information on Linux: “cat /proc/cpuinfo” © Copyright IBM Corporation 2012
Archive Indexes • • Archive indexes are critical to performance Join columns and where clauses should be indexed PST_GDB can only use one index per table (at present) Generally avoid many multi column indexes but sometimes they are required. The size of the index file can become many times the size of the archive file if many multi column indexes are used! This is due to the fact that the index file cannot be compressed like the archive file. © Copyright IBM Corporation 2012
Archive Query Tuning (Red Book sg 247936 Section 10. 5) • The nav_util “explain” utility can be used to get an explain plan for query execution (for the purpose of query tuning). • Example usage (from the nav_util command line): • The key thing is to determine if an index is being used and if that index is optimal © Copyright IBM Corporation 2012
First Explain In the first execution none of the archive indexes are used. Even though an index exists on the Orders table for the field that participates in the join (o. cust_id), that index is not used because of the expression that is coded for o. freight_charges. © Copyright IBM Corporation 2012
Second Explain For the second execution, we make changes to improve the performance from the first run. It appears that the table scan resulting from the expression O. FREIGHT_CHARG ES > 10 is the culprit. Therefore, we add an index to the o. freight_charges field. We then rerun the explain command. © Copyright IBM Corporation 2012
Third Explain In the third, and final, execution, a single composite index is created using the o. cust_id and o. freight_charges fields. Using the composite index produced excellent results. The cost of execution is 67. 6% less than the first run and 65. 1% less than the second run. © Copyright IBM Corporation 2012
Query Tuning - Rewriting the query Typical SQL considerations apply: HAVING clause For Example: Write the query as SELECT subject, count(subject) FROM student_details WHERE subject = 'Science' GROUP BY subject; Instead of: SELECT subject, count(subject) FROM student_details GROUP BY subject HAVING subject= 'Science' ; © Copyright IBM Corporation 2012
Query Tuning - Rewriting the query (ctd. ) 7) Be careful while using conditions in WHERE clause. SELECT id, first_name, age FROM student_details WHERE class_year in (‘ 2009’, ‘ 2010’); Instead of: SELECT id, first_name, age FROM student_details WHERE class_year != ‘ 2011’ and class_year != ‘ 2012’; Write the query as SELECT id, first_name, age FROM student_details WHERE first_name LIKE 'Mar%'; Instead of: SELECT id, first_name, age FROM student_details WHERE SUBSTR(first_name, 1, 3) = 'Mar'; Write the query as SELECT id, name, salary FROM employee WHERE salary < 25000; Instead of: SELECT id, name, salary FROM employee WHERE salary + 10000 < 35000; © Copyright IBM Corporation 2012
Query Tuning - Final Some queries may never perform well in Optim Connect (if no filter condition exists and the table is large) - but there is hope. Example: SELECT tran_type, sum(amount) FROM sales_tran GROUP BY tran_type Creating a view and archiving the view will improve query performance in this case. This approach is only viable if the required query is known. © Copyright IBM Corporation 2012
Optim Connect Tuning Params – Avoid Scan Optimizer - Avoid scan: This setting can be checked to avoid table scans where possible. The optimizer may choose to do a table scan if it believes this more efficient than using an index. In certain circumstances tables scans are better than index scans (usually with very small tables and very large tables). Optim support does not recommend this option always be checked. In one test, checking this option caused a test query that returned in 120 secs to not return after 4 hours! © Copyright IBM Corporation 2012
Optim Connect Tuning Params - Disable Hash Join Optimizer - Disable Hash Join: If disk space is limited or disk performance is slow then disabling hash joins may improve performance. Optim Support generally recommends to check this option. In some cases if this option is not checked large hash join files can be produced which significantly increase query times. © Copyright IBM Corporation 2012
Optim Connect Tuning Params - Memory Parameters • More memory for Optim Connect is better, but be careful not to crash the machine. • On Unix/Linux Ulimits can be set to limit memory usage • One test with a setting of 100, 000 (for the four parameters above) with 2 concurrent queries resulted in a 2 GB memory usage on the test machine. © Copyright IBM Corporation 2012
Federation • Optim Connect can be federated through Oracle (hs. ODBC and DG 4 ODBC), DB 2 Fed Server, Sql Server Link Server. • This can work well (allows native queries for instance) but there are performance risks • If the federation server decides to pull all the data onto it’s server and do the joins there, then performance can be poor. Optim Connect Views can be created to force the processing back as onto the Optim Connect server and improve performance © Copyright IBM Corporation 2012
Optim Connect Performance vs Informatica/Rainstor • • Informatica have partnered with Rainstor a columnar database vendor that support SQL querying against archived data in their database repository. Archive times are much longer with Informatica/Rainstor because of the extra processing done at load time. • Rainstor’s Query times can be much lower than Optim Connect’s in some cases (especially for queries that have no filter criteria). • Rainstor’s columnar compression not as effective as Optim’s compression in many common types for structured data (Jim Lee). © Copyright IBM Corporation 2012
The Future of Optim Archive Querying • • • Big Data and the emerging Cloud technologies (such as Hadoop and HDFS), there is a shift toward computing platforms that parallelize data reads across many nodes. With Big Data the typical bottleneck is CPU for reading the large data volumes (this is exactly the issue with Optim Connect). Query-able archive (Q 3/Q 4 2012 is the goal – Dave Henderson). Possible integration of recent acquisition technology (Vivisimo – replacement for Optim Datafind) © Copyright IBM Corporation 2012
Questions © Copyright IBM Corporation 2012
- Slides: 39