Transbase Hypercube A leadingedge ROLAP Engine supporting multidimensional
Transbase® Hypercube: A leading-edge ROLAP Engine supporting multidimensional Indexing and Hierarchy Clustering Roland Pieringer Transaction Software Gmb. H Thomas-Dehler-Str. 18 81737 München, Germany www. transaction. de
Motivation n Many applications have multidimensional data n Multidimensional indexes support retrieval of MD data n Application Field: Data Warehouses § § Hierarchically organized dimensions (e. g. , year – month – day) Large data volumes Relatively static Mainly retrieval query profile n MD indexes usually support numeric MD data n Encoding for hierarchical data necessary Multidimensional Hierarchical Clustering (MHC) Transbase® Hypercube © 2003 Transaction Software Gmb. H www. transaction. de BTW 2003 Feb 2003 -2 -
Theoretical comparison of range query performance compound primary B-Tree multiple multidimensional B-Trees, index bitmap indexes Transbase® Hypercube © 2003 Transaction Software Gmb. H www. transaction. de BTW 2003 ideal case Feb 2003 -3 -
UB-Tree: basic concepts n Combination of B+-Tree and Z-curve § Z-curve is used to map multidimensional points to one-dimensional values (Z-values) § Z-values are used as keys in B*-Tree § Z-curve preserves spatial-proximity symmetric clustering Index part 28 8 17 39 51 Data part Transbase® Hypercube © 2003 Transaction Software Gmb. H www. transaction. de BTW 2003 Feb 2003 -4 -
Visualized range-queries Freiberg Sachsen Leipzig Germany Dresden Burgh Bayern München Passau Feb 2003 BTW 2003 Mar 2003 Jun 2003 Feb 2003 -5 -
MHC: Non-clustered hierarchy Transbase® Hypercube © 2003 Transaction Software Gmb. H www. transaction. de BTW 2003 Feb 2003 -6 -
MHC: Clustered hierarchy Transbase® Hypercube © 2003 Transaction Software Gmb. H www. transaction. de BTW 2003 Feb 2003 -7 -
Basic technology of MHC n MHC: Multidimensional Hierarchical Clustering n MHC necessary because § Hierarchical organization of dimensions in warehouses § No intervals for hierarchical restrictions § Naive restrictions lead to many point queries instead of one interval on UB-Tree n Artificial encoding of hierarchies: § § Mapping of hierarchy restrictions to range restrictions Mapping is used for physical clustering of the fact table Modification of query algorithms necessary Fast computation and space efficient Transbase® Hypercube © 2003 Transaction Software Gmb. H www. transaction. de BTW 2003 Feb 2003 -8 -
Implementation of MHC n Implementation into Transbase® DBMS kernel § § Computation and maintenance of MHC encoding Integration into DDL and DML Integration into optimizer Integration into archiving tools n Transparency to users § Physical optimization § No extension of the DML Transbase® Hypercube © 2003 Transaction Software Gmb. H www. transaction. de BTW 2003 Feb 2003 -9 -
Supported schemata n Support of star schema and snowflake schema n Star schemata § Conventional complete de-normaliation of the dimension tables § Foreign key relationships between fact table and dimension tables n Supported snowflake schemata § § Inner dimension tables de-normalized with hierarchy attributes Feature attributes can be normalized Fully supported by optimizer More efficient than star schemata (knowledge about hierarchical dependency) Transbase® Hypercube © 2003 Transaction Software Gmb. H www. transaction. de BTW 2003 Feb 2003 - 10 -
Transbase® DDL extension n Dimension Table CREATE TABLE dim_segment ( country_id INTEGER NOT NULL, country_txt CHAR(*), region_id INTEGER NOT NULL, region_txt CHAR(*), micromarket_id INTEGER(*) NOT NULL, micromarket_txt CHAR(*), outlet_id INTEGER NOT NULL outlet_txt CHAR(*), SURROGATE cs_segment COMPOUND (country_id SIBLINGS 16, region_id SIBLINGS 19, micromarket_id SIBLINGS 6, outlet_id SIBLINGS 2202), PRIMARY KEY (outlet_id) ) Transbase® Hypercube © 2003 Transaction Software Gmb. H www. transaction. de BTW 2003 Feb 2003 - 11 -
Transbase® DDL extension (cont. ) n Fact Table: CREATE TABLE fact ( dseg INTEGER REFERENCES dim_segment(outlet_id) ON UPDATE CASCADE, dprod INTEGER REFERENCES dim_product(item_id) ON UPDATE CASCADE, dtime INTEGER REFERENCES dim_time(day_id) ON UPDATE CASCADE, turnover NUMERIC(10, 2) … SURROGATE cs_seg FOR dseg, SURROGATE cs_prod FOR dprod, SURROGATE cs_time FOR dtime, PRIMARY HCKEY (cs_seg, cs_prod, cs_time) ) Transbase® Hypercube © 2003 Transaction Software Gmb. H www. transaction. de BTW 2003 Feb 2003 - 12 -
DML n No change of DML statements (SELECT, INSERT, UPDATE, DELETE) n Conventional star (snowflake) joins (SQL-92 compliant): SELECT country, department, category, group, year, quarter, month, SUM(price), SUM(turnover) FROM customer c, product p, date d, fact f WHERE f. custkey = c. customer AND f. prodkey = p. item_key AND f. datekey = d. day AND c. country = 'GERMANY' AND c. department = 'SOUTH' AND p. category = 'TV' AND d. month = '10/2002' AND d. year = '2002' GROUP BY country, department, category, group, year, month Transbase® Hypercube © 2003 Transaction Software Gmb. H www. transaction. de BTW 2003 Feb 2003 - 13 -
Conventional query processing n Standard method (non-clustering indexes): § § § Index evaluation of dimension restrictions Fact table tuple materialization Residual join with dimension tables Grouping and aggregating Sorting Transbase® Hypercube © 2003 Transaction Software Gmb. H www. transaction. de BTW 2003 Feb 2003 - 14 -
MHC query processing: Overview n Abstract execution plan: better understanding, implementation in operator trees n Three phases: § Interval generation (semi – join) § Fact table access § Grouping and residual join n Optimizing: hierarchical pre-grouping § Minimize residual join operations by grouping before joining Transbase® Hypercube © 2003 Transaction Software Gmb. H www. transaction. de BTW 2003 Feb 2003 - 15 -
AEP - overview Main Execution Phase Order By Having Group Select Di. . . Residual Join Dk Fact Table Access Create Range . . . Fact Di Dj Interval Generation Transbase® Hypercube © 2003 Transaction Software Gmb. H www. transaction. de BTW 2003 Feb 2003 - 16 -
Interval generation n Mapping of hierarchical restrictions into a number of intervals n Usage of special hierarchy indexes: § DXh Index: (ht, ht-1, . . . , h 1, cs) § Efficient interval computation n Optimization for feature restrictions: § Merging many small intervals to less large intervals § Usage of hierarchical dependency for feature attributes, if supported by the schema (snowflake schemata) Transbase® Hypercube © 2003 Transaction Software Gmb. H www. transaction. de BTW 2003 Feb 2003 - 17 -
Fact table access n Combination of intervals of all clustering dimensions forms multidimensional query boxes QBi n Fact table access with implicit tuple materialization n Sequential processing of query boxes n Fast retrieving of result tuples n Postfiltering can be necessary depending on the UB-Tree dimensions and restrictions Transbase® Hypercube © 2003 Transaction Software Gmb. H www. transaction. de BTW 2003 Feb 2003 - 18 -
Standard AEP Order By Having Group Select Di Residual Join . . . Fact Table Access Dk Predicate Evaluation Fact Transbase® Hypercube © 2003 Transaction Software Gmb. H www. transaction. de BTW 2003 Feb 2003 - 19 -
Optimization: Hierarchical pre-grouping n Basic concept § Hierarchy encoding stored in fact table (compound surrogates) § Groups of hierarchical GROUP BY attributes built from compound surrogates § Grouping not exact for non-prefix path grouping § Drastic reduction of fact table result tuples § Example (for hierarchy year – month – day): number of fact table result tuples: 100. 000 pre-grouping (on month): ca. 3. 000 (aggregated) tuples residual join with 3. 000 instead of 100. 000 tuples reduction by a factor of 30! § Possibly post-grouping necessary for too fine pre-grouping Transbase® Hypercube © 2003 Transaction Software Gmb. H www. transaction. de BTW 2003 Feb 2003 - 20 -
Hierarchical pre-grouping (cont. ) Order By Having Dl 1 Residual Join . . . Dln Post-Group De 1. . . Residual Join Pre-Group Dei Fact Table Access Predicate Evaluation Fact Transbase® Hypercube © 2003 Transaction Software Gmb. H www. transaction. de BTW 2003 Feb 2003 - 21 -
Performance comparison n Data: § Real world data warehouse of electronic retailer in Greece § 5 dimensions, 49 measures on fact table § 3 years of transactions, i. e. , 8, 5 million fact table tuples (2, 8 GB) n Environment § 2 Processor Pentium II (400 MHz), 768 MB RAM, Windows 2000 n Queries § 22 query classes with 1. 320 real world user queries n Comparisons § MHC versus no multidimensional clustering § Conventional grouping versus hierarchical pre-grouping Transbase® Hypercube © 2003 Transaction Software Gmb. H www. transaction. de BTW 2003 Feb 2003 - 22 -
Perf. comp: MHC – no clustering Time of fact tuple access in seconds FT Sel. % [0. 0 -0. 1] STAR [0. 1 -1. 0] AEP [1. 0 -5. 0] STAR AEP MIN 0 0 65 2 274 11 MAX 30 6 290 9 1219 47 MEDIAN 1 1 182 8 477 23 STD-DEV 5 1 76 3 346 14 Transbase® Hypercube © 2003 Transaction Software Gmb. H www. transaction. de BTW 2003 Feb 2003 - 23 -
Perf. comp: no pre-grouping – pre-grouping Comparison of grouping Cardinality: No pre-grouping / Hier. pre-grouping FT Sel. % All MIN [0. 0 - 0. 25] [0. 25 - 1. 0] [1. 0 - 10. 0] 3, 6 21, 3 46, 0 245, 8 135, 1 911, 3 816, 2 MEDIAN 1. 139, 5 531, 6 2. 270, 4 5. 938, 9 3. Quartile 4. 708, 0 1. 905, 6 9. 747, 5 25. 409, 6 593. 280, 0 19. 340, 0 78. 384, 0 593. 280, 0 1. Quartile MAX Transbase® Hypercube © 2003 Transaction Software Gmb. H www. transaction. de BTW 2003 Feb 2003 - 24 -
Perf. comp: no pre-grouping – pre-grouping Speedup of the time of hierarchical pre-grouping FT Sel. % ALL [0. 0 - 0. 25] [0. 25 - 1. 0] [1. 0 - 10. 0] MIN 0, 3 0, 8 0, 6 1. Quartile 3, 0 2, 4 3, 9 4, 6 MEDIAN 4, 4 3, 6 5, 8 6, 6 3. Quartile 6, 5 5, 2 7, 8 25, 5 14, 3 25, 5 12, 6 MAX Transbase® Hypercube © 2003 Transaction Software Gmb. H www. transaction. de BTW 2003 Feb 2003 - 25 -
Summary n MHC: Multidimensional hierarchical clustering § Encoding for hierarchy paths, in order to support clustering multidimensional indexes § Support of star and snowflake schemata n Full implementation into Transbase® § Integration into the query processor (maintenance of compound surrogates) § Integration into the optimizer (interval generation, fact table access, hierarchical pre-grouping) n Significant speedup of performance: § Clustering vs. non-clustering organization: 2 -20 § Conventional grouping vs. hierarchical pre-grouping: 4 -7 Transbase® Hypercube © 2003 Transaction Software Gmb. H www. transaction. de BTW 2003 Feb 2003 - 26 -
Questions ? ? n Everything clear? n Otherwise contact: Roland Pieringer Tel: 089/62709 -0 Transaction Software Gmb. H pieringer@transaction. de www. transaction. de Transbase® Hypercube © 2003 Transaction Software Gmb. H www. transaction. de BTW 2003 Feb 2003 - 27 -
- Slides: 27