The need for Temporal Databases 1 Need for
The need for Temporal Databases (1) Need for coping with temporal data (both VT and TT) (2) Just adding 1 (or 2, or 4) temporal attributes (and maybe some ad-hoc procedures) does not work! (3) First, a rigorous semantic framework is needed, to give formal specification to the implementation. Properties: clearness, expressiveness, upward compatibiliy. Ex. BCDM (4) Second, the implementation must be proven to respect the semantics. Core issue here: efficient (1 -NF) implementations hardly grant uniqueness of representation. Ex TSQL 2
Ad-hoc approaches are complex and are not likely to work Example 1. Projection (and temporal coalescing) Name Salary Title VT_start VT_end Andrea 60000 Ass. Provost 1/1/1993 30/5/1993 Andrea 70000 Ass. Provost 1/6/1993 30/9/1993 Andrea 70000 Provost 1/10/1993 31/1/1994 Andrea 70000 Professor 1/2/1994 Question: salary history of Andrea “Intuitive” SQL query: SELECT Salary, VT_start, VT_end FROM Employee WHERE Name=“Andrea” 31/12/1994
Ad-hoc approaches are complex and are not likely to work Result obtained: Desired result: Salary VT_start VT_end 60000 1/1/1993 30/5/1993 70000 1/6/1993 30/9/1993 70000 1/10/1993 31/1/1994 70000 1/2/1994 31/12/1994 Salary VT_start VT_end 60000 1/1/1993 30/5/1993 70000 1/6/1993 31/12/1994
Ad-hoc approaches are complex and are not likely to work How to get the desired result using SQL 92: CREATE TABLE Temp(Salary, VT_start, VT_end) AS SELECT Salary, VT_start, VT_end FROM Employee WHERE Name = “Andrea”; Repeat UPDATE Temp T 1 SET (T 1. VT_end)=SELECT MAX(T 2. VT_end) WHERE T 1. Salary=T 2. Salary AND T 1. VT_start < T 2. VT_Start AND T 1. VT_end >= T 2. VT_start AND T 1. VT_end < T 2. VT_end WHERE EXISTS (SELECT * FROM Temp AS T 2 WHERE T 1. Salary=T 2. Salary AND T 1. VT_start < T 2. VT_Start AND T 1. VT_end >= T 2. VT_start AND T 1. VT_end < T 2. VT_end) Until no tuples updated
Ad-hoc approaches are complex and are not likely to work How to get the desired result using SQL 92 (continues!): DELETE FROM Temp T 1 WHERE EXISTS (SELECT * FROM Temp AS T 2 WHERE T 1. Salary = T 2. Salary AND ((T 1. VT_start > T 2. VT_Start) AND (T 1. VT_end <= T 2. VT_end)) OR ((T 1. VT_start >= T 2. VT_Start) AND (T 1. VT_end < T 2. VT_end))
Ad-hoc approaches are complex and are not likely to work Underlying semantic phenomenon: Projection on temporal relations involves temporal coalescing about value equivalent tuples When it occurs (SQL): Whenever a proper subset of the attributes of the relations is chosen in the SELECT part of the query
Ad-hoc approaches are complex and are not likely to work How to get the desired result using a Temporal DB (ex. TSQL 2) SELECT Salary FROM Employee WHERE Name = “Andrea”
Ad-hoc approaches are complex and are not likely to work Example 2. Join (and temporal intersection) Employee 1 Employee 2 Name Salary VT_start VT_end Andrea 60000 1/1/1993 30/5/1993 Andrea 70000 1/6/1993 31/12/1994 Name Title VT_start VT_end Andrea Ass. Provost 1/1/1993 30/9/1993 Andrea Provost 1/10/1993 31/1/1994 Andrea Professor 1/2/1994 31/12/1994 Query: “combined” history of both Andrea’s salary and title
Ad-hoc approaches are complex and are not likely to work “Intuitive” SQL query: SELECT Salary, Title, Emp 1. VT_start, Emp 1. VT_end Emp 2. VT_start, Emp 2. VT_end FROM Employee 1, Employee 2 WHERE Employee 1. Name=“Andrea” AND Employee 1. Name=“Andrea”
Ad-hoc approaches are complex and are not likely to work Result obtained: Salary Emp 1. VT_start Emp 1. VT_end Title Emp 2. VT_start Emp 2. VT_end 60000 1/1/1993 30/5/1993 Ass. Provost 1/1/1993 30/9/1993 60000 1/1/1993 30/5/1993 Provost 1/10/1993 31/1/1994 60000 1/1/1993 30/5/1993 Professor 1/2/1994 31/12/1994 70000 1/6/1993 31/12/1994 Ass. Provost 1/1/1993 30/9/1993 70000 1/6/1993 31/12/1994 Provost 1/10/1993 31/1/1994 70000 1/6/1993 31/12/1994 Professor 1/2/1994 31/12/1994
Ad-hoc approaches are complex and are not likely to work Result desired: Salary Title VT_start VT_end 60000 Ass. Provost 1/1/1993 30/5/1993 70000 Ass. Provost 1/6/1993 30/9/1993 70000 Provost 1/10/1993 31/1/1994 70000 Professor 1/2/1994 31/12/1994
Ad-hoc approaches are complex and are not likely to work How to get the desired result using SQL 92: SELECT Employee 1. Name, Salary, Dept, Employee 1. VT_start, Employee 1. VT_end FROM Employee 1, Employee 2 WHERE Employee 1. Name=Employee 2. Name AND Employee 2. VT_start <= Employee 1. VT_start AND Employee 1. VT_end < Employee 2. VT_end UNION SELECT Employee 1. Name, Salary, Dept, Employee 1. VT_start, Employee 2. VT_end FROM Employee 1, Employee 2 WHERE Employee 1. Name=Employee 2. Name AND Employee 1. VT_start >= Employee 2. VT_start AND Employee 2. VT_end < Employee 1. VT_end AND Employye 1. VT_start < Employee 2. VT_end UNION SELECT Employee 1. Name, Salary, Dept, Employee 2. VT_start, Employee 1. VT_end FROM Employee 1, Employee 2 WHERE Employee 1. Name=Employee 2. Name AND Employee 2. VT_start > Employee 2. VT_start AND Employee 1. VT_end < Employee 2. VT_end AND Employye 2. VT_start < Employee 1. VT_end UNION SELECT Employee 1. Name, Salary, Dept, Employee 2. VT_start, Employee 2. VT_end FROM Employee 1, Employee 2 WHERE Employee 1. Name=Employee 2. Name AND Employee 2. VT_start > Employee 1. VT_start AND Employee 2. VT_end < Employee 1. VT_end
Ad-hoc approaches are complex and are not likely to work Underlying semantic phenomenon: Join (Cartesian product) on temporal relations involves temporal intersection When it occurs (SQL): Whenever more than one relation is used in the FROM part of the query Note: the number of terms in the SQL union is 2 n, where n is the number of relations in the FROM part
Ad-hoc approaches are complex and are not likely to work How to get the desired result using a Temporal DB (ex. TSQL 2) SELECT Salary, Title FROM Employee 1, Employee 2 WHERE Employee 1. Name=“Andrea” AND Employee 1. Name=“Andrea”
Ad-hoc approaches are complex and are not likely to work Until now, just two simple examples concerning: - SELECT a subset of attributes ( loop to do colaescing) - FROM with >1 relations (exponential union to do intersection) And what about: Union, difference, …, nested queries Temporal predicates Primarysecondary keys Aggregate functions Integrity constraints Multiple (user-defined!) granularities …… arbitrary combinations of all the above issues ?
Ad-hoc approaches are complex and are not likely to work Key message: Dealing with temporal data is a general problem in DB’s Difficult problems (often “hidden” ones) have to be faced WHY? (informally) (1) Relational model assumes independence of attribute values Time attributes predicates about when the values of the other attributes hold in the modeled reality (VT) or are current in the DB (TT)
Ad-hoc approaches are complex and are not likely to work WHY? (informally) (2) Relational model assumes all dataknowledge is explicit The association of time to tuples involves “semantic assumptions” Such assumptions are part of our “commonsense” way of looking at time. However, they are NOT supported by the relational model (as long as they are left implicit) Making the semantic of time explicit is a complex task (everywhere, not only in DBs)
Ad-hoc approaches are complex and are not likely to work From a Software Engeneering point of view: Letting applications solve the problem in an ad-hoc way is: Both cost and time expensive Likely to lead to errors in the applications Likely to make integration (shared data) between applications impossible
Ad-hoc approaches are complex and are not likely to work Temporal DB: an area of research aiming at providing once-and-forall principled and integrated solution to the problem
- Slides: 19