Short Queries and Indexes Local Adaptation from Henrietta

  • Slides: 25
Download presentation
Short Queries and Indexes Local Adaptation from: Henrietta Dombrovskaya, Boris Novikov “System Tuning” Saint

Short Queries and Indexes Local Adaptation from: Henrietta Dombrovskaya, Boris Novikov “System Tuning” Saint Petersburg, Russia 2006

What will be covered: • • • 2012 Which queries are considered short. Short

What will be covered: • • • 2012 Which queries are considered short. Short queries and indexes Indexes and mass data update Choosing selection criteria Excessive selection conditions How to avoid using indexes Joins order Impact of indexes on nested loops Other index types Henrietta Dombrovskaya – Enova Financial

Which Queries are Considered Short? • The query is considered short, when result can

Which Queries are Considered Short? • The query is considered short, when result can be obtained processing the small number of records, even if the tables are large. • For short queries sorting, grouping, and even joins are not time consuming. • Optimization goal for short queries: to avoid full scan of large tables (for small tables full scan may be still OK) • Typically for short queries tuning is necessary to improve throughput, not response time – the user does not care whether it takes 50 ms or 150 ms. 2012 Henrietta Dombrovskaya – Enova Financial

Short Queries and Indexes • If we want to avoid full scan, some sort

Short Queries and Indexes • If we want to avoid full scan, some sort of index for the table should exist • Any index is a part of the database schema; we can create new indexes only if schema changes are allowed. Sometimes they can’t be implemented right away, and sometimes delayed index creation may be almost impossible • Be aware of potential implication of index creation on other queries and data modifications 2012 Henrietta Dombrovskaya – Enova Financial

Indexes and Bulk Data Update drop index_1; drop index_2; … Bulk INSERT … create

Indexes and Bulk Data Update drop index_1; drop index_2; … Bulk INSERT … create index_1(…. ); create index_2(…. ); …. 2012 Henrietta Dombrovskaya – Enova Financial

Choosing Selection Criteria • • • 2012 Index selectivity Unique indexes Selection criteria and

Choosing Selection Criteria • • • 2012 Index selectivity Unique indexes Selection criteria and indexes Compound indexes Using index for data retrieval Henrietta Dombrovskaya – Enova Financial

Index Selectivity • Do not use an index with small number of distinct values

Index Selectivity • Do not use an index with small number of distinct values (exception - bitmap indexes, where applicable) • Index usage order: make sure that the index with the highest selectivity level will be used first 2012 Henrietta Dombrovskaya – Enova Financial

Index Selectivity - Example select customer_id from customer_sources cs where incoming_brand_id =11 AND type_cd

Index Selectivity - Example select customer_id from customer_sources cs where incoming_brand_id =11 AND type_cd IN ('lead_reject_import', 'pass_active_customer') AND cs. received_time > current_date - interval '24 hours' AND cs. received_time < current_timestamp interval ' 15 minutes' AND cs. source_type_cd not in ('yesloans 1 stgbi', 'yesloansgbi', 'noworries 1 stgbi', 'noworriesgbi ', 'aspiregbi', 'aspire-cpfgbi') We have indexes for type_cd, source_type_cd and received_time, the optimizer may choose condition with =, while in this case received_time has higher selectivity level. 2012 Henrietta Dombrovskaya – Enova Financial

Unique Indexes • If a column is described as a primary key, the unique

Unique Indexes • If a column is described as a primary key, the unique index will be created automatically • You may need extra unique constraints for data integrity purposes (each UNIQUE constraint will generate unique index, too) • Unique indexes make nested loops efficient. 2012 Henrietta Dombrovskaya – Enova Financial

Nested Loops FOR row 1 in table 1 loop For row 2 in table

Nested Loops FOR row 1 in table 1 loop For row 2 in table 2 loop If match(row 1, row 2) insert output row end loop • When to use: joins and products • Cost: proportional (Size_of_T 1)X(Size_of_T 2) Table 1 Table 2 5 3 8 1 6 2 3 4 1 3 1

Selection Criteria and Indexes – Columns Transformation • Any column transformations will prevent from

Selection Criteria and Indexes – Columns Transformation • Any column transformations will prevent from using an index: where lower(last_name)=‘smith’ • If we need to search by transformed value, we need to create additional index: CREATE INDEX people_m 13 ON private. people USING btree (lower(last_name: : text)); 2012 Henrietta Dombrovskaya – Enova Financial

Selection Criteria and Indexes- Using like Operator • Using like operator: – WHERE (lower(people.

Selection Criteria and Indexes- Using like Operator • Using like operator: – WHERE (lower(people. first_name) like E'chaman%‘ will not use the index Possible rewriting, if for some reason we can’t create an index: – WHERE (lower(people. first_name) >=E'chaman' and lower(people. first_name) <E'chamam') will use the index 2012 Henrietta Dombrovskaya – Enova Financial

Pattern Indexes CREATE INDEX people__last_name_pattern ON private. people (lower(last_name: : text) text_pattern_ops); 2012 Henrietta

Pattern Indexes CREATE INDEX people__last_name_pattern ON private. people (lower(last_name: : text) text_pattern_ops); 2012 Henrietta Dombrovskaya – Enova Financial

Compound Indexes • Compound index is build for several columns of one table. Note:

Compound Indexes • Compound index is build for several columns of one table. Note: if an index was built for columns (X, Y, Z), you can use it to search X, XY and XYZ, not Y and not YZ. Postgres : equality on leading columns, inequality on other columns, which will still be scanned, but may save additional trip to the table. • Why to create compound index? – Additional selectivity – Additional data storage • Index-organized tables – not currently available in Postgres 2012 Henrietta Dombrovskaya – Enova Financial

Using Indexes for Data Retrieval When all the columns from select statement are included

Using Indexes for Data Retrieval When all the columns from select statement are included into compound index, they may be retrieved without accessing the table. Example: CREATE INDEX loans_m 2 ON cnu. loans (customer_id, funding_date, status_cd); SELECT funding_date, status_cd FROM loans WHERE customer_id =1111111 2012 Henrietta Dombrovskaya – Enova Financial

Using Multiple Indexes in Postgres can use the search results from multiple indexes by

Using Multiple Indexes in Postgres can use the search results from multiple indexes by creating a bitmap of matching rows in main memory and then OR-ing or AND-ing them In this case the records will be scanned in the physical order, so the index-based ordering will be lost. Usage of compound indexes vs. sets of singlecolumn indexes should be justified on case-by-case basis. 2012 Henrietta Dombrovskaya – Enova Financial

Excessive Selection Criteria We can add redundant selection criteria to preselect small records subset

Excessive Selection Criteria We can add redundant selection criteria to preselect small records subset from the big table: – prompt to use specific indexes – reduce the sizes of join arguments. 2012 Henrietta Dombrovskaya – Enova Financial

Excessive Selection Criteria – Example SELECT <…> FROM reports r, expense_items i WHERE r.

Excessive Selection Criteria – Example SELECT <…> FROM reports r, expense_items i WHERE r. rep_id=i. rep_id AND (r. proc_date=’ 1 -jul-2003’ AND i. charge<0 OR r. proc_date=’ 15 -jun-2003 AND i. charge=0 ); In this case the complex selection criteria can’t be applied before the tables are joined. (expense_items may contain over several million records) 2012 Henrietta Dombrovskaya – Enova Financial

Excessive Selection Criteria – Example (2) We will modify this select statement the following

Excessive Selection Criteria – Example (2) We will modify this select statement the following way: SELECT <…> FROM reports r, expense_items I WHERE r. rep_id=i. rep_id AND (r. proc_date IN (‘ 1 -jul-2003’, ’ 15 -jun-2003’) ) AND (r. proc_date=’ 1 -jul-2003’ AND i. charge<0 OR r. proc_date=’ 15 -jun-2003’ AND i. charge=0 ); This will allow to restrict reports to the ones processed on Jul 1 and Jul 15, and then join only those reports with expense_items. 2012 Henrietta Dombrovskaya – Enova Financial

How to Avoid Using Indexes • Use system-specific mechanisms, like optimizer hints • When

How to Avoid Using Indexes • Use system-specific mechanisms, like optimizer hints • When optimizer hints can’t be used - modify selection criteria. Examples: – attr 1+0=p_value – COALESCE (t 1. attr 2, 0)=t 2. attr 2 2012 Henrietta Dombrovskaya – Enova Financial

Order of Joins In short queries the size of join result may be small

Order of Joins In short queries the size of join result may be small because of: – Restrictions on the joined tables (reduce the number of records in join arguments) – Semi-join (one argument significantly restricts the result size) If the optimizer generates an execution plan with large intermediate results size, the order of joins should be changed 2012 Henrietta Dombrovskaya – Enova Financial

Order of Joins - Example SELECT DISTINCT c. * FROM customers c LEFT OUTER

Order of Joins - Example SELECT DISTINCT c. * FROM customers c LEFT OUTER JOIN loans l on l. customer_id = c. id INNER JOIN customer_sources cs ON cs. id = (SELECT MAX(id) FROM customer_sources cs 1 WHERE cs 1. customer_id = c. id AND incoming_brand_id = c. brand_id AND type_cd IN ('lead_reject_import', 'pass_active_customer')) LEFT OUTER JOIN work_items w ON w. customer_id = c. id and w. created_by in ('generate_ast_workitems', 'generate_ast_workitems_instl') and w. created_on > current_date - interval '24 hours‘ WHERE c. brand_id = 11 AND c. status_cd = 'active' AND c. fraud_flg = FALSE AND cs. received_time > current_date - interval '24 AND cs. received_time < current_timestamp - interval ' 15 minutes' AND cs. source_type_cd not in ('yesloans 1 stgbi', 'yesloansgbi', 'noworries 1 stgbi', 'noworriesgbi', 'aspirecpfgbi') 2012 Henrietta Dombrovskaya – Enova Financial

Impact of Indexes on Nested Loops • For small queries nested loops will be

Impact of Indexes on Nested Loops • For small queries nested loops will be efficient, when the inner table is indexed by join attribute • No transformation should be applied to the indexed attribute. • Note: this may not work for long queries 2012 Henrietta Dombrovskaya – Enova Financial

Other Index Types • Function-based indexes - use when the queries WHERE clause often

Other Index Types • Function-based indexes - use when the queries WHERE clause often contains the same attribute(s) transformation(like end_date start_date) • Bitmap indexes - use for combination on several low-cardinality attributes, mostly for Data Warehouse type applications. • Partial indexes: CREATE INDEX payment_transactions_m 16 c ON cnu. payment_transactions_committed (credit_account_cd, status_cd, eff_date, payment_method_cd) WHERE status_cd: : text = 'created': : text AND payment_method_cd: : text = 'bank_account_ach': : text AND (credit_account_cd: : text = 'disbursement_account': : text OR credit_account_cd: : text = 'cso_disbursement_account': : text); 2012 Henrietta Dombrovskaya – Enova Financial

More Complex Index Example CREATE INDEX customers__email_alt_pattern_idx ON cnu. customers (lower(email_alt: : text) text_pattern_ops)

More Complex Index Example CREATE INDEX customers__email_alt_pattern_idx ON cnu. customers (lower(email_alt: : text) text_pattern_ops) WHERE email_alt IS NOT NULL; Selection criteria example: lower(email) like ‘johns%' OR (email_alt IS NOT NULL AND lower(email_alt) like ‘johns%') This condition will use bitmap OR’ing. 2012 Henrietta Dombrovskaya – Enova Financial