1 Last update 17 October 2007 Advanced databases

  • Slides: 65
Download presentation
1 Last update: 17 October 2007 Advanced databases – Temporal and spatial databases I:

1 Last update: 17 October 2007 Advanced databases – Temporal and spatial databases I: Temporal databases Bettina Berendt Katholieke Universiteit Leuven, Department of Computer Science http: //www. cs. kuleuven. be/~berendt/teaching/2007 w/adb/ Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 1

2 Agenda Motivation: Why space and time? Time: Conceptual modelling issues Temporal queries Storage

2 Agenda Motivation: Why space and time? Time: Conceptual modelling issues Temporal queries Storage and indexing Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 2

3 Where is temporal information important? Example 1 Berendt: Advanced databases, winter term 2007/08,

3 Where is temporal information important? Example 1 Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 3

4 Where is temporal information the key interest? Example 2 Berendt: Advanced databases, winter

4 Where is temporal information the key interest? Example 2 Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 4

5 Where is spatial (and temporal) information key? Example 3 . . . and

5 Where is spatial (and temporal) information key? Example 3 . . . and the bridge is closed to traffic 10 pm-6 am . . . and the shop is closed 8 pm 8 am Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 5

6 Further examples of spatial data and their uses Examples of Spatial data n

6 Further examples of spatial data and their uses Examples of Spatial data n Census Data n NASA satellites imagery - terabytes of data per day n Weather and Climate Data n Rivers, Farms, ecological impact n Medical Imaging Examples of user groups and questions: n Army Field Commander: Has there been any significant enemy troop movement since last night? n Insurance Risk Manager: Which homes are most likely to be affected in the next great flood on the Mississippi? n Medical Doctor: Based on this patient's MRI, have we treated somebody with a similar condition ? n Molecular Biologist: Is the topology of the amino acid biosynthesis gene in the genome found in any other sequence feature map in the database ? n Astronomer: Find all blue galaxies within 2 arcmin of quasars. Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 6

7 What exactly is interesting about time + space (in terms of data)? Fictitious

7 What exactly is interesting about time + space (in terms of data)? Fictitious example: n An environmental-protection organization has uncovered a major incident of fish dying in river R at location (L 1, L 2). n Assume they have access to a company directory and to companies‘ internal databases*. There has been frantic personnel reshuffling in companies. All Web reports of how companies in the area dispose of chemical waste look harmless. n Q 1: Which chemical companies are there in a 50 -mile radius around (L 1, L 2)? n Q 1‘: Which chemical companies are on river R, upstream of (L 1, L 2), at a distance of at most 50 miles? n Q 2: Who was working in the waste-disposal department of company C in the 3 months up to the incident? n Q 3: When were company C‘s statistics of chemicals discharge last updated? * Yes, this may happen! Cf. The Enron email dataset made public by the Federal Energy Regulatory Commission during its investigation: http: //www. isi. edu/~adibi/Enron. htm Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 7

8 Why is this a challenge for traditional database systems? n Assume the ORGANIZATION

8 Why is this a challenge for traditional database systems? n Assume the ORGANIZATION entity class has attributes LONGITUDE, LATITUDE. n Assume that the EMPLOYEE entity class has an attribute DEPARTMENT. n Assume that the DISCHARGE entity class has attributes CHEMICAL, AMOUNT_DISCHARGED_MONTHLY. n Q 1: Which chemical companies are there in a 50 -mile radius around (L 1, L 2)? n l The data are there; queries like „Where is company C located? “ are EASY. l But: Queries like Q 1 are INEFFICIENT to answer. Q 1‘: Which chemical companies are on river R, upstream of (L 1, L 2), at a distance of at most 50 miles? l The data are not there: – – n Also missing: topology („on river R“) Q 2: Who was working in the waste-disposal department of company C in the 3 months up to the incident? l n No notion of a „line-like“ spatial entity (river) with direction. The data may not be there because they have been overwritten when employees changed departments. Q 3: When were company C‘s statistics of chemicals discharge last updated? l The data are not there (we just have the tuples in DISCHARGE). Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 8

9 So what is needed? n Q 1: Which chemical companies are there in

9 So what is needed? n Q 1: Which chemical companies are there in a 50 -mile radius around (L 1, L 2)? l The data are there; queries like „Where is company C located? “ are EASY. l But: Queries like Q 1 are INEFFICIENT to answer. Different index structures (SPATIAL: #4) n Q 1‘: Which chemical companies are on river R, upstream of (L 1, L 2), at a distance of at most 50 miles? l The data are not there: – No notion of a „line-like“ spatial entity (river) with direction. – Also missing: topology („on river R“) Different representational concepts (SPATIAL: #3) n Q 2: Who was working in the waste-disposal department of company C in the 3 months up to the incident? The data may not be there because they have been overwritten when employees changed departments. Different representational concepts (TEMPORAL: #1) l n Q 3: When were company C‘s statistics of chemicals discharge last updated? l The data are not there (we just have the tuples in DISCHARGE). Different representational concepts (TEMPORAL: #2) Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 9

10 Why in this course? Temporal and spatial data n are important in (nearly)

10 Why in this course? Temporal and spatial data n are important in (nearly) any kind of database or database application n are examples of complex and often also heterogeneous data Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 10

11 Agenda Motivation: Why space and time? Time: Conceptual modelling issues Temporal queries Storage

11 Agenda Motivation: Why space and time? Time: Conceptual modelling issues Temporal queries Storage and indexing Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 11

12 Time representation in temporal databases n Time = an ordered sequence of points

12 Time representation in temporal databases n Time = an ordered sequence of points in some granularity („chronon“) a „grid“ representation that makes events at the same chronon appear to be simultaneous n Calendar: organizes time into different time units for convenience n SQL 2 temporal datatypes: l DATE (YYYY-MM-DD) l TIME (HH: MM: SS) l TIMESTAMP (= date-time combination) l INTERVAL (= a relative time duration, e. g. , 10 days, 250 minutes) l PERIOD (an anchored time duration with a fixed starting point, e. g. the 10 days from Jan 1, 2007 to Jan 10, 2007 inclusive) Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 12

13 Different kinds of events n Point events: l n Associated with a single

13 Different kinds of events n Point events: l n Associated with a single time point Duration events: l Associated with a time period (= represented by its start- and endpoint) Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 13

14 Different kinds of „time“ in data 1. User time 2. Valid time Entities

14 Different kinds of „time“ in data 1. User time 2. Valid time Entities and relationships are associated with a date-time point at which a/the corresponding event happened in the modelled world l Ex. 1: person – birth date (that person‘s birth date) l Ex. 2: married. To – date (of that wedding) Attributes and relationships are associated with a datetime period during which a/the corresponding attribute relation was true in the modelled world l Ex. 1: person works. In department l Ex. 2: person – name l [Note: These attributes can themselves be of date-time types, e. g. , person – wedding. Anniversary (only valid as long as the person is married to the same spouse) ] Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 14

15 These kinds of „time“ in conceptual models 1. 2. birth. Date Entities and

15 These kinds of „time“ in conceptual models 1. 2. birth. Date Entities and relationships are associated with a date-time point at which a/the corresponding event happened in the modelled world l Ex. 1: person – birth date (that person‘s birth date) l Ex. 2: married. To – date (of that wedding) Attributes and relationships are associated with a date-time period during which a/the corresponding attribute relation was true in the modelled world l Ex. 1: person works. In department l Ex. 2: person – name ? Person married. To wedding. Date Person works. In Department start. Date end. Date name. Start. Date name Person name. End. Date Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 15

16 Different kinds of „time“ in data (contd. ) 3. Transaction time Statements about

16 Different kinds of „time“ in data (contd. ) 3. Transaction time Statements about entities and relationships are associated with a date-time point at which the statement started being true in the system, or with a period during which the statement was true in the system n Any of the above examples will do n But: this is not part of the modelled world not part of the usual conceptual model n (It should be part of a conceptual model which also represents the database or the databases that record information about the world originally modelled. ) Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 16

17 Incorporating time in relational databases using tuple versioning n Convert relations into valid

17 Incorporating time in relational databases using tuple versioning n Convert relations into valid time relations by adding attributes „valid start time“ and „valid end time“ Ex. (Employees and departments) EMP_VT ( Name , Ssn , Salary , Dno , Supervisor_ssn , Vst , Vet ) DEPT_VT ( Dname , Dno , Total_sal , Manager_ssn , Vst , Vet ) n Each tuple in EMP_VT represents a version of an employee‘s information that is valid (in the modelled world) only during the time period [V. Vst, V. Vet] n The current version of each entity/relation being modelled in the tuple: Vet = special value now, a temporal variable Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 17

18 Operations in databases using tuple versioning n Update an attribute a to value

18 Operations in databases using tuple versioning n Update an attribute a to value new. Val. A, valid as of date : l Find the corresponding tuple V s. t. V. Vet = now l Create a new tuple V‘ with all values from V except V‘. Vst : = date and V‘. a : = new. Val. A l V. Vet : = date – V is now a closed or history version n Delete an employee l n n Close the current version Insert a new employee l Create the first tuple version for that employee l Make it the current version Note: Key must be the combination of a non-changing attribute (may be a surrogate key attribute) and Vst Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 18

19 Valid time databases, transaction time databases, bitemporal databases n Updates in valid time

19 Valid time databases, transaction time databases, bitemporal databases n Updates in valid time DBs may be l proactive (now < Vst of the new tuple) l simultaneous (now = Vst of the new tuple) l retroactive (now > Vst of the new tuple) è To record transaction time, record the timestamp of the transaction that applied the change (insert, delete, update) n Convert relations into transaction time relations by adding attributes „TA start time“ and „TA end time“ (aka rollback database) Ex. (Employees and departments) EMP_TT ( Name , Ssn , Salary , Dno , Supervisor_ssn , Tst , Tet ) DEPT_TT ( Dname , Dno , Total_sal , Manager_ssn , Tst , Tet ) l n Current version has special value Tet = uc (Until Changed) The corresponding bitemporal database schema is EMP_BT ( Name , Ssn , Salary , Dno , Supervisor_ssn , Vst , Vet , Tst , Tet ) DEPT_BT ( Dname , Dno , Total_sal , Manager_ssn , Vst , Vet , Tst , Tet ) Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 19

20 Some problems A. One entity type can have attributes whose values cannot change

20 Some problems A. One entity type can have attributes whose values cannot change and attributes whose values can change (= have a valid-time period) l Ex. Person – blood group (cannot change) vs. name (can change) Versioning a whole tuple can lead to redundancies or anomalies B. Some attributions cannot overlap in time; others can l Ex. Cyclist – trains. In (cannot be in two different locations at the same time) vs. is. Trained. By (two trainer contracts may overlap) Valid start time + valid end time are not necessarily a key Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 20

One solution to problem A: Attribute versioning in object-oriented databases or object-relational systems n

One solution to problem A: Attribute versioning in object-oriented databases or object-relational systems n Distinguish time-varying from nontime-varying attributes n Change only attribute versions (for each attribute: <Vst, Vet, Tst, Tet, Value>) n Keep a lifespan temporal attribute for the whole object (>= 1 valid time periods) 21 Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 21

One solution to problem A: Attribute versioning in object-oriented databases or object-relational systems (Example

One solution to problem A: Attribute versioning in object-oriented databases or object-relational systems (Example ODL schema) Class TEMPORAL_DEPT { attribute Date Valid_start_time; attribute Date Valid_end_time; attribute float Dept; 22 Department is time-varying }; Class TEMPORAL_LIFESPAN { attribute Date Valid_start_time; attribute Date Valid_end_time; } Class EMPLOYEE_VT { extent { attribute list<TEMPORAL_LIFESPAN> lifespan; }; EMPLOYEES } attribute list<TEMPORAL_DEPT> Dept_history; attribute string Ssn; SSN is nontime-varying Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 22

23 Another solution to problem A: Make valid time refer to individual attribution statements

23 Another solution to problem A: Make valid time refer to individual attribution statements (whether attribute or relationship) RDF provides a built-in vocabulary intended for describing RDF statements. A description of a statement using this vocabulary is called a reification of the statement. Basic idea: Every RDF triple has the form „subject – predicate – object“. Then MOVE FROM Employee 1 works. In Department 10. Departm 10 works. In TO The statement „Employee 1 works. In Department 10“ has subject „Employee 1“, predicate „works. In“, object „Department 10“, and it is valid between Time 1 and Time 2. subject Statement 1 Employee 1 predicate works. In valid. Start. Time object Department 10 Time 1 valid. End. Time 2 Full details at http: //www. w 3. org/TR/rdf-primer/#reification Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 23

24 Outlook – A solution to problem B: Axiomatize time Represent constraints on time,

24 Outlook – A solution to problem B: Axiomatize time Represent constraints on time, for example n Start. Time <= end. Time n More explicitly and more extensibly: Statement is. Valid. In valid. Period & valid. Period has. Start. Time vst & valid. Period has. End. Time vet Axiomatize: For statements with a location predicate (e. g, „is. In“) l If there are two statements locating the same entity in different locations location 1 and location 2, then l EITHER location 1 is part. Of location 2 (or vice versa) l OR valid. Period 1 is. Disjoint. With valid. Period 2 l (This axiom may not hold for statements with other predicates) (More later in the course) Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 24

25 Agenda Motivation: Why space and time? Time: Conceptual modelling issues Temporal queries Storage

25 Agenda Motivation: Why space and time? Time: Conceptual modelling issues Temporal queries Storage and indexing Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 25

26 Temporal querying constructs n Typical selection conditions in nontemporal databases: =, <, LIKE

26 Temporal querying constructs n Typical selection conditions in nontemporal databases: =, <, LIKE (~ match regular expression) n Typical selection conditions in temporal databases l Pure time conditions l Attribute and time conditions Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 26

27 Pure time conditions 1. Select all tuple versions T that were valid at

27 Pure time conditions 1. Select all tuple versions T that were valid at a time point T or during a period [T 1, T 2] Period = set of time points {T 1, . . . , T 2} use standard set comparison operations 2. Select. . . that were valid during a period that stands in a temporal (interval) relation to a given period [T. Vst, T. Vet] INCLUDES [T 1, T 2] T 1 ≥ T. Vst & T 2 ≤ T. Vet [T. Vst, T. Vet] INCLUDED_IN [T 1, T 2] (the inverse of INCLUDES) [T. Vst, T. Vet] OVERLAPS [T 1, T 2] T 1 ≤ T. Vet & T 2 ≥ T. Vst [T. Vst, T. Vet] BEFORE [T 1, T 2] T 1 ≥ T. Vet + 1 [T. Vst, T. Vet] MEETS_BEFORE [T 1, T 2] T 1 = T. Vet + 1 [T. Vst, T. Vet] AFTER [T 1, T 2] (the inverse of BEFORE) [T. Vst, T. Vet] MEETS_AFTER [T 1, T 2] (the inverse of MEETS_BEFORE) Note: These definitions from Elmasri & Navathe differ slightly from the standard interpretations of Allen (1986), where BEFORE and AFTER describe disjoint intervals, and I 1 MEETS I 2 iff I 1‘s endpoint equals I 2‘s startpoint Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 27

28 The 13 interval relations (aka „Allen relations“) Berendt: Advanced databases, winter term 2007/08,

28 The 13 interval relations (aka „Allen relations“) Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 28

29 Pure time conditions and attribute and time conditions: Example Select all employee versions

29 Pure time conditions and attribute and time conditions: Example Select all employee versions that were valid at any point during the 3 months leading up to the environmental catastrophe (assume this was 2007 -01 -01 through 2007 -03 -31). Find all tuples T for which [T. Vst, T. Vet] OVERLAPS [2007 -01 -01, 2007 -03 -31] Find all tuples of employees that worked in the waste-disposal department W in that time: ([T. Vst, T. Vet] OVERLAPS [2007 -01 -01, 2007 -03 -31]) AND (T. Dept = „W“) Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 29

30 Operations for manipulating time periods n Typical operations: union, intersection n Result may

30 Operations for manipulating time periods n Typical operations: union, intersection n Result may not be a period, but a temporal element TE n TE: A collection of ≥ 1 disjoint time periods, i. e. for all P 1, P 2 TE l P 1 P 2 = l P 2. start ≠ P 1. end + 1 (in the given granularity) l P 1. start ≠ P 2. end + 1 ensure unique operations of temporal elements: Adjacent and intersecting periods are coalesced into one Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 30

31 Querying transaction time n Typically, most temporal selections are applied to the valid

31 Querying transaction time n Typically, most temporal selections are applied to the valid time dimension. n In a bitemporal database, one usually applies the conditions to the currently correct tuples with uc as their Tet. n To query the valid time tuples that were correct in the database at time T: l append AS_OF T to the query Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 31

32 A brief overview of the TSQL 2 language Extension of SQL such that:

32 A brief overview of the TSQL 2 language Extension of SQL such that: CREATE TABLE has an optional AS clause with which a table can be specified to be temporal: n AS VALID STATE <GRANULARITY> l n AS VALID EVENT <GRANULARITY> l n transaction time relation with transaction time period AS VALID STATE <GRANULARITY> AND TRANSACTION l n valid time relation with valid time point AS TRANSACTION l n valid time relation with valid time period bitemporal relation, valid time period AS VALID EVENT <GRANULARITY> AND TRANSACTION l bitemporal relation, valid time point More information at http: //www. cs. arizona. edu/~rts/tsql 2. html Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 32

33 Agenda Motivation: Why space and time? Time: Conceptual modelling issues Temporal queries Storage

33 Agenda Motivation: Why space and time? Time: Conceptual modelling issues Temporal queries Storage and indexing Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 33

34 Updating a bitemporal database – logical view Assume that Smith‘s department changes from

34 Updating a bitemporal database – logical view Assume that Smith‘s department changes from D 1 to D 2 as of 2007 -06 -01 on 2007 -10 -01 at 12: 05: 17: Previous current version: [ Smith , . . . , D 1 , 2002 -06 -15 , now , 2002 -06 -08, 00: 00 , uc ] Copy & change this current version to make it a closed version: [ Smith , . . . , D 1 , 2002 -06 -15 , 2007 -05 -31 , 2007 -10 -01, 12: 05: 17 , uc ] Copy & change the current version to incorporate the update: [ Smith , . . . , D 2 , 2007 -06 -01 , now , 2007 -10 -01, 12: 05: 17 , uc ] Change the current version to allow rollback: [ Smith , . . . , D 1 , 2002 -06 -15 , now , 2002 -06 -08, 00: 00 , 2007 -10 -01, 12: 05: 17 ] Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 34

35 Implementation considerations (1) n Storage options include l Store all tuples in the

35 Implementation considerations (1) n Storage options include l Store all tuples in the same table l Create 2 tables: 1 for the current versions, 1 for the rest l Vertically partition the attributes l Create separate relations to contain only those attributes that always change simultaneously (temporal normal form) – Avoids redundancies, but: – To combine the information, need temporal intersection join (expensive!) n Note: bitemporal databases allow a complete record of changes (including corrections) „append only database“ Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 35

36 Implementation considerations (2) The queries we discussed so far require partial persistence *:

36 Implementation considerations (2) The queries we discussed so far require partial persistence *: n Update the current version (getting a new version) n Query all versions * A data structure is called : l Ephemeral: updates create a new version and the old version cannot be queried l Persistent: updates can be applied at any version and any version can be queried l Partially Persistent: updates applied to the last version and any version can be queried Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 36

37 In the following: from TA time DBs to bitemporal DBs 1. Start by

37 In the following: from TA time DBs to bitemporal DBs 1. Start by describing one index structure for transaction time DBs l Basic ideas: – Model transaction time as 1 -dimensional (time point of TA) – tuples themselves are non-temporal/non-spatial and can therefore be indexed by a B-tree Store tuples+transaction times as multiple versions of a B-tree 2. Based on that, define an index structure for bitemporal DBs l Basic idea: – Model valid time as 2 -dimensional ([valid start time, valid end time]) Use a spatial index (R-tree) to manage this 2 -dimensional information l Retain the versioning approach to transaction-time management Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 37

38 Multiversion B-trees – The idea n Store all versions of the state of

38 Multiversion B-trees – The idea n Store all versions of the state of a B+-tree which evolved over time, i. e. multiple “snapshots” of the tree n avoid the inefficiency of the naïve approach of just duplicating all records n Inserts, updates and deletes are applied to the present version of the tree and increase the version number of the whole tree n Queries know which version of the tree they require the result(s) from n The MVBT addresses transaction time (different versions of a database) n Source: B. Becker, S. Gschwind, T. Ohler, B. Seeger, P. Widmayer, “On Optimal Multiversion Access Structures”, Proceedings of Symposium on Large Spatial Databases, pp. 123 -141, 1993. Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 38

39 General method Transform a single version external access structure (with high utilization of

39 General method Transform a single version external access structure (with high utilization of disk blocks) at the cost of a constant factor in time and space requirements compared to the original, single version structure Such increase is asymptotically optimal = worst-case bounds cannot decrease by adding multiversion capability to existing structure Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 39

40 Proposition Specifics Extend B+-tree to have multiversion capability Support operations: n Insert(key, info)

40 Proposition Specifics Extend B+-tree to have multiversion capability Support operations: n Insert(key, info) Insert into current version, increase tree version n Delete(key) Delete from current version, increase tree version n Exact match query(key, version) n Range query(lowkey, highkey, version) Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 40

41 Overview The multiversion B-tree is a directed acyclic graph of B-tree nodes that

41 Overview The multiversion B-tree is a directed acyclic graph of B-tree nodes that results from incremental changes to original B-tree It has a number of B+-tree root nodes which partition the versions from the first to the present one so that each B-tree root stands for an interval of versions Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 41

42 Blocks Contain b data items Live if it has not been copied, dead

42 Blocks Contain b data items Live if it has not been copied, dead otherwise Weak version condition: for every version i and each block A except the roots of versions, we require that the number of entries in A is either 0 or at least d, where b=k·d Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 42

43 Data Items Leaf node of the tree n <key, in_version, del_version, info> Inner

43 Data Items Leaf node of the tree n <key, in_version, del_version, info> Inner node of the tree n <router, in_version, del_version, info> Said to be of version i if its lifespan contains i In live block, deletion version * denotes that this entry has not been deleted at present, in a dead block it means that the entry has not yet been deleted before the block died Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 43

44 Updates Each update creates new version If no structural changes: n Insert: lifespan

44 Updates Each update creates new version If no structural changes: n Insert: lifespan is [i, *) n Delete: changes del_version from * to i A structural change is required if: n Block overflow: can only fit b entries in a block n Weak version underflow: if deletion in a block with exactly d current version entries Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 44

45 Structural Modification Copy the block and remove all but the present version entries

45 Structural Modification Copy the block and remove all but the present version entries from the copy If block consists of primarily present version entries, the copy will produce an almost full block, resulting in a split again after a few subsequent insertions n To avoid this, request that at least εd+1 insert or delete operations are necessary for the next block overflow or version underflow in that block (ε will be defined later) Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 45

46 Strong Version Constraints Strong version condition: the number of present version entries after

46 Strong Version Constraints Strong version condition: the number of present version entries after a version split must be in range from (1+ε)d to (kε)d. Strong version underflow: result of version split leading to less than (1+ε)d entries n Attempt to merge with a sibling block containing only its present version entries, if necessary followed by a version independent split by key values Strong version overflow: if a version split leads to more than (k -ε)d entries in the block n Also perform a split by key values Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 46

47 Simple Example Original Tree (Version 1) Insert(40) Berendt: Advanced databases, winter term 2007/08,

47 Simple Example Original Tree (Version 1) Insert(40) Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 47

48 Simple Example (Version 2) Delete(65) Berendt: Advanced databases, winter term 2007/08, http: //www.

48 Simple Example (Version 2) Delete(65) Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 48

49 Simple Example (Version 3) Berendt: Advanced databases, winter term 2007/08, http: //www. cs.

49 Simple Example (Version 3) Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 49

50 Version Split Example Insert(5) creates a block overflow All currently live entries copied

50 Version Split Example Insert(5) creates a block overflow All currently live entries copied to the new live block A*, old block A marked dead (Version 7) Also, the root block is updated to show that entity 10 was alive in the dead block A until version 8 Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 50

51 Version Split Example Resulting tree (Version 8) Berendt: Advanced databases, winter term 2007/08,

51 Version Split Example Resulting tree (Version 8) Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 51

52 Weak Version Underflow Example Suppose b=6, d=2, ε=0. 5 (d: the minimum #

52 Weak Version Underflow Example Suppose b=6, d=2, ε=0. 5 (d: the minimum # of current version entries in the block) Delete(40) results in block A only having 1 current entry 1<d: weak underflow, split A (Version 7) Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 52

53 Strong Version Underflow Example The version split of A has led to less

53 Strong Version Underflow Example The version split of A has led to less than (1+ ε)d=1. 5*2=3 entries in the new node strong version underflow Seek a sibling of A* (in our case, B) Version split it (to create B*) Merge B* and A* to produce block A*B* (Processing: Version 7 8) Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 53

54 Strong Version Underflow Condition Violation But now, the node A*B* violates the strong

54 Strong Version Underflow Condition Violation But now, the node A*B* violates the strong version overflow condition and must be split by key into nodes C and D (Processing: Version 7 8) Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 54

55 Resulting Tree Example Query: (25, 5) (Version 8) Berendt: Advanced databases, winter term

55 Resulting Tree Example Query: (25, 5) (Version 8) Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 55

56 Roots Can Split, Too Overflow Split Berendt: Advanced databases, winter term 2007/08, http:

56 Roots Can Split, Too Overflow Split Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 56

57 Roots Can Split, Too Strong version overflow with key split and allocation of

57 Roots Can Split, Too Strong version overflow with key split and allocation of the new block Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 57

58 Weak Version Underflow of the Root Node R 3 has shrunk weak underflow

58 Weak Version Underflow of the Root Node R 3 has shrunk weak underflow Block copies of R 3 and R 4 are created and merged into R 5 This causes weak version underflow of R 2, so R 5 becomes new root block Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 58

59 Algorithms Insertion: Find the leaf node for the new key e, then call

59 Algorithms Insertion: Find the leaf node for the new key e, then call block. Insert (say A) block. Insert: enter e in A n If block overflow of A then l Version split, block insert l If strong version underflow then – Merge l Else if strong version overflow key split Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 59

60 Algorithms Delete: block. Delete A n Check weak version underflow on A n

60 Algorithms Delete: block. Delete A n Check weak version underflow on A n If true, then merge with sibling Note that Deletion is easier than the insertion in the MVBT. What about the B+-tree? Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 60

61 Constraints on MVBT Parameters What are the restrictions on choices of k and

61 Constraints on MVBT Parameters What are the restrictions on choices of k and ε? n n (k-ε)d+1 ≥ (1/α)(1+ε)d l α is the fraction of the entries in a block guaranteed to be in a new node after a key split, 0. 5 for B-trees l Before key split, A contains at least (k-ε)d+1 entries l After a key split, both blocks must contain at least (1+ε)d entries. 2 d-1 ≥ (1+ε)d l Before merge is performed, togethere at least 2 d-1 present version entries in the blocks to be merged Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 61

62 Efficiency Analysis The big result is that the MVBT is asymptotically optimal to

62 Efficiency Analysis The big result is that the MVBT is asymptotically optimal to the B-Tree in the worst case in time and space for all considered operations Search time is in Space is O(n/b) and update Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 62

63 Outlook: Bitemporal Data Indexing Bitemporal Index: n R-tree for valid time n Partial

63 Outlook: Bitemporal Data Indexing Bitemporal Index: n R-tree for valid time n Partial persistence for transaction time Partially persistent R-tree n A. Kumar, V. J. Tsotras, and C. Faloutsos. Designing access methods for bitemporal databases. IEEE Trans. Knowledge and Data Engineering, 10(1): 1 – 20, 1998. What about the case that valid time has an open end_time (now)? n A special R-tree to handle that n Simonas Saltenis & Christian S. Jensen. Indexing of now-relative spatiobitemporal data. The VLDB Journal, Volume 11, Number 1, pp 1 -16, August 2002. For an overview of indexes, see n Marios Hadjieleftheriou, George Kollios, Vassilis J. Tsotras, & Dimitrios Gunopulos. Efficient Indexing of Spatiotemporal Objects. In C. S. Jensen et al. (Eds. ): EDBT 2002, Springer, LNCS 2287, pp. 251– 268, 2002. Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 63

64 Next lecture Motivation: Why space and time? Time: Conceptual modelling issues Temporal queries

64 Next lecture Motivation: Why space and time? Time: Conceptual modelling issues Temporal queries Spatial databases Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 64

65 References / background reading & acknowledgements Slides on “Time: Conceptual modelling issues“ and

65 References / background reading & acknowledgements Slides on “Time: Conceptual modelling issues“ and „Temporal queries“: Elmasri, R. & Navathe, S. B. (2007). Fundamentals of Database Systems. Boston: Addison Wesley. 5 th Edition. p. 28: J. F. Allen: Maintaining knowledge about temporal intervals. Communications of the ACM. 26/11/1983. 832 -843. Slides 38 -62: from Multiversion Access Methods - Temporal Indexing. http: //www. corelab. ntua. gr/courses/ds. grad/persistent. ppt Berendt: Advanced databases, winter term 2007/08, http: //www. cs. kuleuven. ac. be/~berendt/teaching/2007 w/adb/ 65