Performance Analysis of Temporal Queries Information Sciences 49
































- Slides: 32
Performance Analysis of Temporal Queries (Information Sciences #49, 1989) by Ilsoo Ahn, AT&T Bell Laboratories, Columbus, Ohio and Richard Snodgrass Dept. of Computer Science, University of Arizona Communicated by Ahmed Elmagarmid ~ * ~ Presented by Barry Klein for CS-599, 10/26/2000
Abstract Temporal databases that maintain history data add historical queries and rollback operations to conventional db’s. This paper proposes a model for analyzing the performance of temporal queries over a range of access methods.
Abstract, continued n Model: 4 transformations through a series of formal expressions common to all phases of query processing. Input: Temporal Query + DB schema n Output: Estimated I/O cost for it. n Validation: Compare estimated cost from model n with actual cost from a prototype.
Introduction Factors affecting performance of a Temporal DBMS: Access methods available n Query-processing strategies n Size and composition of the data n
Introduction, continued Methods for describing TDBMS effectiveness: n Empirical approach – actual performance is measured. Advantage: Results are reliable. n Analytical approach – develop a math model of the performance, which can predict performance in controlled context. Advantage: less effort, but results are questionable.
Introduction, continued Three orthogonal types of Time: n Valid time, Transaction time, User-defined 4 categories of DBs defined in terms of support for valid/transaction time: n n Snapshot – conventional, no temporal support. Rollback – support transaction time. Historical – support valid time (real-world history) Temporal – support both valid and txn time.
Introduction, continued TQuel—non-procedural language based on tuple calculus—is chosen here to express historical queries and rollback operations: n n n Augments retrieve statement with when predicate – temporal relations among tuples. Valid clause specifies how implicit time attributes are computed for result tuples. Rollback operations implemented with as of clause (in either rollback or temporal db’s).
Introduction, continued Temporal relations used with the added constructs: n Precede n Overlap n Extend n Begin of n End of TQuel augments valid and when clauses to: n Append n Delete n Replace Create statement supported for temporal relations.
The New Model Performance analysis based on these givens: n n A set of temporal queries Some query-processing/optimization strategy File structure(s) to implement the TDB A set of parameters characterizing the storage devices.
The New Model, continued Assumptions and decisions for this model: n Disk I/O traffic is used as measurement key: ~proportional to performance; n Inputs must be flexible; n Resulting estimate must be accurate.
The New Model, continued The 4 transformations of the model use: n Algebraic expressions; n File-primitive expressions; n Access-path expressions.
The Algebraic Expression Since TQuel is non-procedural, the algebraic expression is defined first: n Algebraic operators Conventional: select, project, join, union, difference u Temporal: when, as of u Auxiliary: temporary, sort, reformat u
Conventional Algebraic Operators n Select – has a relations and a predicate to specify constraint that result tuples must satisfy. n Project – parameters are a relation and a set of attributes to be extracted from the relation. n Join – performs a theta-join of 2 relations, given the first 2 parameters; 3 rd parm is join method, 4 th is combining-method predicate. Union – set addition on 2 relations. n Difference – set subtraction on 2 relations. n
Temporal Algebraic Operators n When – performs temporal selection on a relation according to a temporal predicate on the values of valid time attributes. n As. Of – similar, but compares 2 time constants with transaction-time attribute values. n Valid – performs temporal projection the values of the valid time attributes. (It might perform similarly to project. )
Auxiliary Algebraic Operators Operations that don’t change the query result but affect the query cost. n Temporary – create or access a temporary relation for the result of its parameter’s operation. n Sort – tuples in the rel sorted by 1 st parm, with remaining parms as key sort attributes. n Reformat – changes the structure of the relation 1 st parm, to form of 2 nd parm , with remaining parms as key sort attributes.
TQuel Algebraic xform’s: Example 1 range of h is relation_h retriev (h, id, h. seq) where h. id = 500 is mapped to: {L 1: Select (h, h. id=500); Project (L 1, h. id, h. seq) } Selects id=500 from rel_h, then extracts attribs id & seq from L 1, the result of the previous operation. The “; ” forces sequential execution.
Example 1, continued The same expression can be mapped instead to: {[ L 1: Select (h, h. id=500); Project (L 1, h. id, h. seq) ]} The “[]” eliminates need for temporary file for intermediate results.
TQuel Algebraic xform’s: Example 2 {L 1: Join (h, I, TS, h. id= range of h is relation_h i. amount and h overlap i); range of i is relation_i Project (L 1, h. id, h. seq) } retriev (h. id, id. amount) L 2: when (L 1, i overlap “now”); where h. id = id. amount Project (L 2, h. id, i. id, when h overlap i and i i. amount) } overlap “now” is mapped to 2 different algebraic expressions: Specifies Join using tuple substitution (TS) of rel’s h & i.
Example 2, continued {[ L 1: When (i, i overlap now”); L 2: Project (L 1, i. id, range of h is relation_h i. amount, i. valid_from, range of i is relation_i i. valid_to) ]} retriev (h. id, id. amount) L 3: Temporary (L 2); where h. id = id. amount [L 4: Join (h, L 3, TS, h. id= when h overlap i and i i. amount and h overlap I); overlap “now” Project (L 4, h. id, i. amount) ]} The original expression: is also mapped to: Equivalent to prev example, but performs much more efficiently
Xform to File Primitive Expression The 2 primitives, Read and Write, take parms: Access method - Heap, Hash, Isam or Btree; n File size n Length of overflow chain An FPE combines primitives to repeat or execute together to perform an algebraic operation. n The simple example FPE-1: Read (Hash, 0) specifies one hashed access with no overflow records.
File Primitive Expression, example 2 FPE-2: Read (Heap, 128) + ( Read (Heap, 19) * 2 - 1 + Write (Heap, 19) * 3 - 1 ) Read (Heap, 19) + Read (Hash, 0) * 1024 + This indicates one Read from the 128 -block heap, 2 Read s from 19 blocks, 3 Writes to the 19 -block heap, and a hashed access on a file with no overflow records, iterated 1024 times.
Characteristics of DB Relations Transforming alg expressions to FPE, need: n n n n n Relation names Temporal type Storage structures Attribute counts, names, formats, lengths Key attributes Tuple lengths and counts Selectivity & distribution of attribute values Data volatility Update count (particularly for TDB)
Steps of Transformation n n For each algebraic operator, substitute file primitive(s) with the particular DB parameters. Omit any algebraic operation that can be performed simultaneously with another operation. Identify basic constructs in temporal queries. Transform the subset of algebraic expressions (composed of these constructs) to FPEs.
Access Path Expression APE: the path through the storage structure which satisfies an FPE access request. Node: physically contiguous record(s) involved in the access. Access (read or write) of a tuple: traverses node(s). Access path: a set of nodes connected (in)directly; also, set of chains. Chain: a group of nodes.
Access Path Expression Modes n Guided if there’s a random-access location mechanism: H: address is computed by a hash function; u P: there’s a pointer to the address; u A: component follows adjacently; u S: component shares starting address with its parent; u M: the component is in main memory. u n Searched otherwise: O: file is ordered, enabling log search; u U: unordered - requires sequential search. u
APE Subcomponent Parameters f = number of records in a file n b = number of records in a block n r = number of bytes in a record n n = number of records to be accessed. n
Inverted & Multi-List File structures
APE for Inverted Files Read (Inverted, 3): (P 3 (P 1 (S 1))) The head of the path is located by a pointer; it contains a key value and 3 chains, each of which is also located by a ptr; each has one node, which shares the same address with the chain, and contains one record. The expression abbreviates: (P 3 (P 1 (S 1))
APE for Multilist Files Read (Multilist, 3): (P 1 (P 3 (S 1) (P 1))) The head of the path is located by a pointer; it contains 1 chain which is also located by a ptr, and has 3 nodes, each of which contains one record. The first node has the same address as the chain, and next nodes via pointers. Since the 2 nd and 3 rd nodes are identical, the expression abbreviates: (P 1 (P 3 (S 1) (P 1)))
Transform FPE Access Cost n n Parse the APE and determine the access cost in terms of the random and the sequential access counts. The avg access count for each component est’ed re the component-location mode (see above) The total access count for an APE = of all its components, each multiplied by the corresponding value of count. Ex: the APE (H 1 (P 28 (S 1) (P 1))) has a random access count of 1 + 28 (0+1) = 29
Access-Time Calculations The time elapsed to access disk blocks requires modeling the characteristics of storage devices. Some of the criteria are: n n n Type of media Fixed or moving heads R/w or write-once Seek time and transfer rate Number and size of cylinders, tracks and sectors Block size of DBMS vs page size of op system.
Performance Analysis Summary The steps are: n n n n Examine TQuel query to decide processing strategy Transform it into an algebraic expression Break down in terms of characteristics of DB/rel’s Transform into FPE, and then into APE Analyze for characteristics of storage devices Compute I/O costs Select and execute a validation method