Data Warehousing Seminar Chapter 13 Indexing the Warehouse

  • Slides: 36
Download presentation
Data Warehousing Seminar Chapter 13 Indexing the Warehouse M. S. 2 Hyeyoung Cho Data

Data Warehousing Seminar Chapter 13 Indexing the Warehouse M. S. 2 Hyeyoung Cho Data Warehousing Lab.

Oracle Storage Database Tablespace Logical Data file Physical Segment Extent Oracle block OS block

Oracle Storage Database Tablespace Logical Data file Physical Segment Extent Oracle block OS block Data Warehousing Lab.

Data block and Row Data block Row header Free space Data Row header Column

Data block and Row Data block Row header Free space Data Row header Column length Column value Data Warehousing Lab.

What Is an Index? n A structure separate from the table n Stores the

What Is an Index? n A structure separate from the table n Stores the location of rows based on the specified column values n Speed up the retrieval of rows by using a pointer n Reduce disk I/O by using rapid path access method to locate the data quickly n Used and maintained automatically by the Oracle Server Data Warehousing Lab.

When to create an Index? n The table is large n The columns are

When to create an Index? n The table is large n The columns are often used as a condition in the query n The column contains a wide range of value or a large number of null value n Most queries expected to retrieve less than 5% of the rows n Automatically created when define PRIMARY KEY or UNIQUE constraint Data Warehousing Lab.

Classification of Indexes n Logical(application perspective) Single column or concatenated © Unique or nonunique

Classification of Indexes n Logical(application perspective) Single column or concatenated © Unique or nonunique © Function-based © n Physical(storing perspective) B-tree © Normal or reverse key © Bitmap © Partitioned or nonpartitioned © Data Warehousing Lab.

Single-column and Composite index n Single-column indexes © one column in the index key

Single-column and Composite index n Single-column indexes © one column in the index key create index purchase 1 on purchase (purchase_id) storage (initial 2 m next 2 m pctincrease 0) tablespace purch_ind 1; n Composite index © multiple columns in the index key © Max : 32 , 1/3 of the data block size create index purchase 1 on purchase (purchase_id, purchase_date, total_amt) storage (initial 2 m next 2 m pctincrease 0) tablespace purch_ind 1; Data Warehousing Lab.

Unique or nonunique n Unique © n A single key point to only one

Unique or nonunique n Unique © n A single key point to only one row Nonunique © A single key associated with multiple rows Data Warehousing Lab.

Function-based indexes(1/3) n Oracle 8 i new feature n Query rewrite privilege required n

Function-based indexes(1/3) n Oracle 8 i new feature n Query rewrite privilege required n Using functions or expressions involve one or more columns in the table n Precomputes the value of the function or expression store it in the index n Created as either a B-tree or a Bitmap index Data Warehousing Lab.

Function-based indexes(2/3) n Example 1. Client names in a mixed case © Index creation

Function-based indexes(2/3) n Example 1. Client names in a mixed case © Index creation create index billing_upcl on billing (upper(client)) storage(initial 20 m next 80 m maxextents unlimited pctincrease 0) tablespace my_indexes; © Statement predicate select bill_id, client, state_nm from billing where upper(client) = ‘MONSANTO’ ; Data Warehousing Lab.

Function-based indexes(3/3) n Example 2. Commission exceeded 25% of salary for the certain time

Function-based indexes(3/3) n Example 2. Commission exceeded 25% of salary for the certain time period © Index creation create index sale_bc_amt on sale (comm/(base+comm)*100) … storage(initial 20 m next 80 m maxextents unlimited pctincrease 0) tablespace my_indexes; © Statement predicate select sum(comm) from sale where comm/(base+comm) * 100 > 25 and tr_date between to_date(’ 01 -MAY-2002’, ‘DD-MON-YYYY’) and to_date (’ 30 -JUN-2002’, ‘DD-MON-YYYY’); Data Warehousing Lab.

B-tree indexes(1/3) n n Traditional indexing technique! Stores a list of ROWID for each

B-tree indexes(1/3) n n Traditional indexing technique! Stores a list of ROWID for each key A hierarchy of highest-level and lower level index blocks(root> branch> leaf) Leaf Entry Format © Header : chaining info, row lock status, number of columns © Key column length and value pairs ROWID : the key values (block num. row num. file num) © n n Simplicity, Easy maintenance, High cardinality columns Suitable for exact match query and range query Data Warehousing Lab.

B-tree indexes(2/3) n Structure Root block Branch block Leaf blocks Index entry header Key

B-tree indexes(2/3) n Structure Root block Branch block Leaf blocks Index entry header Key column length Index entry Key column value ROWID Data Warehousing Lab.

B-tree indexes(3/3) n Creating Normal B-Tree Indexes Create index employee_last_name_idx on employee(last_name) pctfree 30

B-tree indexes(3/3) n Creating Normal B-Tree Indexes Create index employee_last_name_idx on employee(last_name) pctfree 30 storage (initial 200 k next 200 k pctincrease 0 maxextents 50) tablespace indx; Create [UNIQUE] index [schema. ] index on [schema. ] table(column [ASC | DESC] [, column [ASC | DESC] ] … ) [TABLESPACE tablespace] [PCTFREE integer] [INITRANS integer] [MAXTRANS integer] [storage – clause] [LOGGING | NOLOGGING] [NOSORT] Data Warehousing Lab.

Reverse Key indexes(1/3) n Reverse the bytes of each column indexed (except the ROWID)

Reverse Key indexes(1/3) n Reverse the bytes of each column indexed (except the ROWID) n Spreading the work load across multiple blocks n Unsuitable for range queries n Use the keyword reverse Data Warehousing Lab.

Reverse Key indexes(2/3) EMPLOYEE table Index on EMPLOYEE(ID) KEY ROWID ID (BLOCK# ID FIRST_NAME

Reverse Key indexes(2/3) EMPLOYEE table Index on EMPLOYEE(ID) KEY ROWID ID (BLOCK# ID FIRST_NAME JOB ----------------------- --------- 1257 00000 F. 0002. 0001 7499 ALLEN SALESMAN 2877 00000 F. 0006. 0001 7369 SMITH CLERK 4567 00000 F. 0004. 0001 7521 WARD SALESMAN 6657 00000 F. 0003. 0001 7566 JONES MANAGER 8967 00000 F. 0005. 0001 7654 MARTIN SALESMAN ROW# FILE#) … … … … … Data Warehousing Lab.

Reverse Key indexes(3/3) n Creating Reverse key index Create unique index orders_id_idx on orders(id)

Reverse Key indexes(3/3) n Creating Reverse key index Create unique index orders_id_idx on orders(id) reverse pctfree 30 storage (initial 200 k next 200 k pctincrease 0 maxextents 50) tablespace indx; Create [UNIQUE] index [schema. ] index on [schema. ] table(column [ASC | DESC] [, column [ASC | DESC] ] … ) [TABLESPACE tablespace] [PCTFREE integer] [INITRANS integer] [MAXTRANS integer] [storage – clause] [LOGGING | NOLOGGING] [NOSORT] REVERSE Data Warehousing Lab.

Bitmap indexes(1/4) n Stores a bitmap for each key value For Low cardinality columns

Bitmap indexes(1/4) n Stores a bitmap for each key value For Low cardinality columns n Leaf Entry Format n Header : chaining information, row lock status, number of columns © Key column length and value pairs © Start ROWID , End ROWID : the first row and the last row pointed by the bitmap (block num. row num. file num) © Bitmap : a string of bits depending on key value © Data Warehousing Lab.

Bitmap indexes(2/4) File 3 Table n Structure Block 10 Index Block 11 Block 12

Bitmap indexes(2/4) File 3 Table n Structure Block 10 Index Block 11 Block 12 Key <Blue <Green <Red <Yellow start. ROWID 10. 0. 3, end. ROWID Bitmap 12. 8. 3, 100010010> 12. 8. 3, 0001010000100> 12. 8. 3, 0100000011000> 12. 8. 3, 001000001> Data Warehousing Lab.

Bitmap indexes(3/4) n Creating Bitmap index Create bitmap index person_region on person(region) tablespace indexes_prd

Bitmap indexes(3/4) n Creating Bitmap index Create bitmap index person_region on person(region) tablespace indexes_prd pctfree 30 storage (initial 200 k next 200 k pctincrease 0 maxextents 50) tablespace indx; Create [UNIQUE] BITMAP index [schema. ] index on [schema. ] table(column [ASC | DESC] [, column [ASC | DESC] ] … ) [TABLESPACE tablespace] [PCTFREE integer] [INITRANS integer] [MAXTRANS integer] [storage – clause] [LOGGING | NOLOGGING] [NOSORT] Data Warehousing Lab.

Bitmap indexes(4/4) n Example : a bitmap index on the PERSON table RESION Bitmap

Bitmap indexes(4/4) n Example : a bitmap index on the PERSON table RESION Bitmap Index Row Region North. Bitmap East. Bitmap West. Bitmap South. Bitmap 1 North 1 0 0 0 2 East 0 1 0 0 3 West 0 0 1 0 4 West 0 0 1 0 5 East 0 1 0 0 6 West 0 0 1 0 7 South 0 0 0 1 8 North 1 0 0 0 Data Warehousing Lab.

B-Tree index VS Bitmap index B-tree Suitable for highcardinality columns Row-level locking Bitmap Suitable

B-Tree index VS Bitmap index B-tree Suitable for highcardinality columns Row-level locking Bitmap Suitable for high-cardinality columns Bitmap-segment-level locking Update on keys relatively inexpensive More storage Update on keys very expensive Less storage Inefficient for queries using OR predicates Useful for OLPT Efficient for queries using OR predicates Useful for data arehousing Data Warehousing Lab.

B-Tree space VS Bitmap space n Bitmap index use 1/100 of the space of

B-Tree space VS Bitmap space n Bitmap index use 1/100 of the space of the Btree index! Unique Column Cardinality(%) Values 500, 000 10, 000 100 5 B-Tree Space 50. 00 15. 29 10. 00 15. 21 1. 00 14. 34 0. 01 13. 40 < 0. 01 13. 40 Bitmap Space 12. 35 5. 25 2. 99 1. 38 0. 78 • Table with 1, 000 rows Data Warehousing Lab.

Index-organized tables(IOT)(1/3) n n n Merge the data and index pieces into the same

Index-organized tables(IOT)(1/3) n n n Merge the data and index pieces into the same segment No duplication of the values for the Key column Faster key-based access for queries involving exact match and range searches Must have a primary key Specify an overflow tablespace name and percentage Secondary indexes(Oracle 8 i new feature) Data Warehousing Lab.

Index-organized tables(IOT)(2/3) IOT access Regular table access Only One scan! Index ROWID Non-key columns

Index-organized tables(IOT)(2/3) IOT access Regular table access Only One scan! Index ROWID Non-key columns Table Key columns Row header Data Warehousing Lab.

Index-organized tables(IOT)(3/3) n Creating Index-organized table create table sales ( office_cd number(3), qtr_end date,

Index-organized tables(IOT)(3/3) n Creating Index-organized table create table sales ( office_cd number(3), qtr_end date, revenue number(10, 2), review varchar 2(1000) constraint sales_pk PRIMARY KEY (office_cd, qtr_end)) ORGANIZATION INDEX tablespace indx PCTTHRESHOLD 20 INCLUDING revenue OVERFLOW tablespace user_data; Data Warehousing Lab.

Indexes on Partitioned Tables(1/4) n An index in several segments n Spread across many

Indexes on Partitioned Tables(1/4) n An index in several segments n Spread across many tablespaces n © Decreasing contention for index lookup © Increasing manageability and scalability Used with partitioned tables © Creating Index partition for each table partition Data Warehousing Lab.

Indexes on Partitioned Tables(2/4) n Local index © n Global index © n Partition

Indexes on Partitioned Tables(2/4) n Local index © n Global index © n Partition keys of the index differ from its underlying table Prefixed index © n Partition keys of the index match its underlying table Left-most column in a partitioned index matches the leftmost column in that index’s partition key. Nonprefixed index © Left-most column in a partitioned index differ from the left -most column in that index’s partition key. Data Warehousing Lab.

Indexes on Partitioned Tables(3/4) n Creating a local partitioned index create table rumors( thorn_id

Indexes on Partitioned Tables(3/4) n Creating a local partitioned index create table rumors( thorn_id number(10), rumor_id number(4), …) partition by range (rumor_id) (partition rumors_p 001 values less than(41), partition rumors_p 002 values less than(50), … partition rumors_pmax values less than(maxvalue)); create unique index rumors_u 1 on rumors(thorn_id, rumor_id) local (partition rumors_u 1_p 001, partition rumors_u 1_p 002, … partition rumors_u 1_pmax); Data Warehousing Lab.

Indexes on Partitioned Tables(4/4) n Creating a global partitioned index create table billing(bill_id number(10),

Indexes on Partitioned Tables(4/4) n Creating a global partitioned index create table billing(bill_id number(10), region_id varchar 2(3), …) partition by range (bill_id) (partition bill_p 001 values less than(90000), partition bill_p 002 values less than(130000), … partition bill_pmax values less than(maxvalue)); create unique index billing_u 1 on billing(bill_id) global partition by range (bill_id) (partition bill_u 1_p 001 values less than(100000), partition bill_u 1_p 001 values less than(200000), … partition bill_u 1_pmax values less than(maxvalue) ); Data Warehousing Lab.

Optimizer Histograms(1/2) n Optimizer © Rule-based optimizer =Uses an set of rules for ranking

Optimizer Histograms(1/2) n Optimizer © Rule-based optimizer =Uses an set of rules for ranking access path =Syntax- and data dictionary-driven © Cost-based optimizer =Chooses least-cost(resource, time) path =Statistics-driven analyze table_name compute statistics; Data Warehousing Lab.

Optimizer Histograms(2/2) n Histogram © Describe the data distribution of a particular column in

Optimizer Histograms(2/2) n Histogram © Describe the data distribution of a particular column in more detail © Better predicate selectivity estimate for unevenly distributed data © Bucket : the number of distinctive column value analyze table_name compute statistics for table for all Indexed columns size 6; Data Warehousing Lab.

Optimizer Histograms(2/2) Data Warehousing Lab.

Optimizer Histograms(2/2) Data Warehousing Lab.

Guidelines(1) n Use NOLOGGING for large indexes creation Index build times with and without

Guidelines(1) n Use NOLOGGING for large indexes creation Index build times with and without NOLOGGING n Rows in Table Indexes With NOLOGGING Without NOLOGGING 46, 34013 13 57 s 3 m 57 s 1, 094, 8146 6 10 m 5 s 24 m 36 s 4, 4013, 309 4 27 m 54 s 60 m 48 s Rebuilding index © Use Different tablespace © Converting a index into a reverse key index ALTER INDEX orders_region_id_idx REBUILD (REVERSE) Tablespace indx 02 NOLOGGING; Data Warehousing Lab.

Guidelines(2) n n Temporary workspace © Create during the life of a create index

Guidelines(2) n n Temporary workspace © Create during the life of a create index statement © Dropped after the activity completes Sort space parameter © n SORT_AREA_SIZE Shared pool parameter © shared_pool_size = 10000000 Data Warehousing Lab.

Oracle Instance INSTANCE Memory Shared pool Library Cache Data Dictionary cache SGA(System Global Area)

Oracle Instance INSTANCE Memory Shared pool Library Cache Data Dictionary cache SGA(System Global Area) Data buffer Redo log Cache Buffer structures Background SMON DBW 0 PMON CKPT LGWR PGA(Program Global Area) User process sql Server processes Databae sort area, cursor state, session info, stack space Data Warehousing Lab.