How to effectively store the history of data

  • Slides: 26
Download presentation
How to effectively store the history of data in a relational DBMS Database systems

How to effectively store the history of data in a relational DBMS Database systems MSE-Seminar 14. 12. 2008 © Raphael Gfeller, Raphael. Gfeller@sunrise. ch 1

Agenda • Time – Definition – Storing within a DMBS – General problems •

Agenda • Time – Definition – Storing within a DMBS – General problems • History – Motivation – On Data warehouses, Online Analytical Processing system, OLAP systems • Common patterns available: slowly changing dimensions, SDC – On Online transaction Processing system, OLTP systems • No common patterns available • Analyzed data models – Performed tests – Results – Conclusion • Questions • References 14. 12. 2008 © Raphael Gfeller, Raphael. Gfeller@sunrise. ch 2

Time is defined as one second: “the duration of 9 192 631 770 periods

Time is defined as one second: “the duration of 9 192 631 770 periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of the caesium 133 atom. ” 14. 12. 2008 © Raphael Gfeller, Raphael. Gfeller@sunrise. ch 3

Storing within a DMBS • a date is represented by an offset with a

Storing within a DMBS • a date is represented by an offset with a defined accuracy on a reference point • an interval is represented by a value with a defined accuracy • a duration – is represented by a composite value of two dates or – a date value and an interval value 14. 12. 2008 © Raphael Gfeller, Raphael. Gfeller@sunrise. ch 4

Storing within a DMBS  Example Value to store: 2008 -12 -02 10: 00

Storing within a DMBS Example Value to store: 2008 -12 -02 10: 00 Used date type: smalldatetime (SQL 2), resolution 1 min, reference point: 1900 -01 -01, based on the Gregorian calendar Internal stored Value: 57286680 („Value to store“ - reference point [min]) 14. 12. 2008 © Raphael Gfeller, Raphael. Gfeller@sunrise. ch 5

Storing within a DMBS  Date Types Available date types on Microsoft SQL Server

Storing within a DMBS Date Types Available date types on Microsoft SQL Server 2008 14. 12. 2008 © Raphael Gfeller, Raphael. Gfeller@sunrise. ch 6

Time  General Problems • • Different time zones Different implementation of data types

Time General Problems • • Different time zones Different implementation of data types Different calendars Time synchronization – local – networked • Summer, winter time problematic • Choosing the date type – Resolution is to small – Resolution is to high – Range is to small 14. 12. 2008 © Raphael Gfeller, Raphael. Gfeller@sunrise. ch 7

History  Motivation Why do we need to know the history of our data?

History Motivation Why do we need to know the history of our data? For example for: • Legal requirements – A bank has to know at each time what the exact balance of the customer was – A internet provider has to be able to store all traffic from a user over a given time • Business requirements – A version control has to be able to manage multiple revisions of the same unit of information – A customer relationship management (CRM) software has to be able to present the volume of sales of a costumer over time • Entertainment requirements – A chat program has to be able to present the conversation between two person over time • Other requirements – The “Time Machine” function in the Mac OS X has to be able to go back in time for locating older version of your files 14. 12. 2008 © Raphael Gfeller, Raphael. Gfeller@sunrise. ch 8

Main approaches to store data Data warehouses (Online Analytical Processing system, OLAP) • Designed

Main approaches to store data Data warehouses (Online Analytical Processing system, OLAP) • Designed for – Reporting – Analysis – Speed of data retrieval • Uses to following approaches – data are stored denormalised based on a dimension-based model (logical data grouped together) • Include often business intelligence tools to retrieve and analyze data • History – Common pattern exits for storing historical data (Slow Changing Dimensions (SCD)) Online transaction Processing system, OLTP • Designed for – Perform day-to-day transaction processing – Preservation of data integrity – Speed of recording of business transactions • Uses to following approaches – Database normalization • Codd rules of data normalization – Entity-relationship model • History – 14. 12. 2008 No common pattern available for storing historical data © Raphael Gfeller, Raphael. Gfeller@sunrise. ch 9

History  Slow Changing Dimensions [0/2] Type 0: an attribute of a dimension is

History Slow Changing Dimensions [0/2] Type 0: an attribute of a dimension is fixed, no history available, is not frequently used yet Type 1: overwrites the old data with the new data, no history available Type 2: tracks historical data by creating multiple records with a separated key, unlimited history is possible Type 3: additional columns in the tables track changes, limited history is available Type 4: creates separate historical tables that stores the historical data Type 6: is a hybrid approach that combines SCD 1, 2 and 3 , unlimited history is possible, is not frequently used yet Type 1, 2 and 3 are the most common 14. 12. 2008 © Raphael Gfeller, Raphael. Gfeller@sunrise. ch 10

History  Slow Changing Dimensions [1/2] SCD 1, overwrites the old data with the

History Slow Changing Dimensions [1/2] SCD 1, overwrites the old data with the new data, no history available Results in SCD 2, tracks historical data by creating multiple records with a separated key, unlimited history is possible 14. 12. 2008 © Raphael Gfeller, Raphael. Gfeller@sunrise. ch 11

History  Slow Changing Dimensions [2/2] SCD 3, Additional columns in the tables track

History Slow Changing Dimensions [2/2] SCD 3, Additional columns in the tables track changes, limited history is available SCD 4, creates separate historical tables that stores the historical data 14. 12. 2008 © Raphael Gfeller, Raphael. Gfeller@sunrise. ch 12

History  On. Line Transaction Processing system, OLTP No common patterns available for storing

History On. Line Transaction Processing system, OLTP No common patterns available for storing historical data Common used rational data models are analyzed (all based on SCD type 2 (unlimited history is possible)) – – Method “Duplication” Method “Transaction” Method “Linked history items” Method “Bidirectional linked history items” Based on the following rational data model: 14. 12. 2008 © Raphael Gfeller, Raphael. Gfeller@sunrise. ch 13

History  OLTP  Duplication Method “Duplication” Focused on • Fast access of historical

History OLTP Duplication Method “Duplication” Focused on • Fast access of historical data • Easy implementation • Data integrity ID Name Salary ID_Change ID_Company ID Date. Time 1 Gfeller Raphael 1000 1 2008 -12 -2 1 Gfeller Raphael 2500 2 1 2 2008 -12 -3 2 Hans Meier 500 1 1 2 Hans Meier 550 2 1 3 Fritz Müller 1750 2 2 14. 12. 2008 ID Name ID_Change 1 UBS 1 2 HSR 1 1 UBS 2 2 HSR 2 © Raphael Gfeller, Raphael. Gfeller@sunrise. ch 14

History  OLTP  Transaction Method “Transaction” Focused on • Less used data storage

History OLTP Transaction Method “Transaction” Focused on • Less used data storage • Precious information about history at every point on time ID Name Salary ID_Company 1 Gfeller Raphael 2500 1 2 Hans Meier 500 1 3 Fritz Müller 1750 2 ID Name 2 HSR 1 UBS 14. 12. 2008 ID Date. Time Entry ID New. String. Value Action 2 2008 -12 -2 1 Gfeller Raphael Person. Set. Name 1 2008 -12 -1 1 Gfeller Raphal, 2500, 1 Person. Create © Raphael Gfeller, Raphael. Gfeller@sunrise. ch 15

History  OLTP  Linked history items Method “Linked history items” Focused on •

History OLTP Linked history items Method “Linked history items” Focused on • Avoid huge changes to the underlying database • Easy to implement • Fast insertion of new entries ID Name Salary ID_Company FK_Old_ID 1 Gfeller Raphael 2500 1 5 2 Hans Meier 500 1 NULL 3 Gfeller Raphael 1500 1 NULL 4 Gfeller Raphael 1000 2 3 5 Fritz Müller 1750 2 NULL 14. 12. 2008 © Raphael Gfeller, Raphael. Gfeller@sunrise. ch 16

History  OLTP  Bidirectional Linked history items Method “Bidirectional Linked history items” Focused

History OLTP Bidirectional Linked history items Method “Bidirectional Linked history items” Focused on • Avoid huge changes to the underlying database • Extendibility by adding additional metadata to the separated table • Fast insertion of new entries • Providing additional backward and forward navigation Old Person New Person Date. Time User 3 4 2008 -12 -3 User 1 4 1 2008 -12 -4 User 2 14. 12. 2008 ID Name Salary ID_Company 1 Gfeller Raphael 2500 1 2 Hans Meier 500 1 3 Gfeller Raphael 1500 1 4 Gfeller Raphael 1000 2 5 Fritz Müller 1750 2 © Raphael Gfeller, Raphael. Gfeller@sunrise. ch 17

History  OLTP  Analyzed criteria’s Analyzing criteria's • Insert an entry • Updating

History OLTP Analyzed criteria’s Analyzing criteria's • Insert an entry • Updating an entry • Storage cost • Get an entry at (Time – 1) • Get en entry at (Time – n) • Entry at time x • Get an integrity state over all entries • Get the next entry by a entry at the past • Get the previous entry by a entry at the past • A person by a company at the past 14. 12. 2008 © Raphael Gfeller, Raphael. Gfeller@sunrise. ch 18

History  OLTP  Theory 14. 12. 2008 © Raphael Gfeller, Raphael. Gfeller@sunrise. ch

History OLTP Theory 14. 12. 2008 © Raphael Gfeller, Raphael. Gfeller@sunrise. ch 19

History  OLTP  Test environment CPU: Intel Core 2, 2 Ghz Memory: 2

History OLTP Test environment CPU: Intel Core 2, 2 Ghz Memory: 2 Gb Operating System: Windows XP, Sp 3, Database: Microsoft SQL Server 2005, Express Edition with SP 1, Benchmark written in C# Benchmark input • Count inserted companies • Count inserted persons • Count companies to change • Count persons to change Benchmark steps 0. Insert companies 1. Insert persons 2. Change companies 3. Change persons 4. Find a person by its parent person 5. Collect all persons and companies that are valid at a specific time. 6. Find a person in the past by a datetime value 7. Find a person by a company by a datetime value. 14. 12. 2008 © Raphael Gfeller, Raphael. Gfeller@sunrise. ch 20

History  OLTP  Results The measurements confirm theory Possible optimizations • Method Change

History OLTP Results The measurements confirm theory Possible optimizations • Method Change Set based on Duplication – Only changed entries are duplicated acceptable overhead in reading fewer data storage used • Method “Transaction with anchors “ – Using “anchors transaction”, they resave the entire state of the entries Fewer network traffic Restoring an entry becomes linear O(max. Changes. Between. Two. Anchors) instead of O(n. Changes) 14. 12. 2008 © Raphael Gfeller, Raphael. Gfeller@sunrise. ch 21

History  OLTP  Result  Conclusion Advices based on these tests • If

History OLTP Result Conclusion Advices based on these tests • If storage is limited use the methods in the following order: – – – • If network bandwidth is limited, use the methods in the following order: – – – • transaction mechanism Linked history items Bidirectional linked history items Transaction with anchors Change Set based on Duplication Linked history items Bidirectional Linked history items Transaction with anchors transaction mechanism If the knowledge of the developers is low • use either method duplication or Linked history items 14. 12. 2008 © Raphael Gfeller, Raphael. Gfeller@sunrise. ch 22

History  OLTP  Result  Conclusion Advices based on these tests • If

History OLTP Result Conclusion Advices based on these tests • If data volume is high, use the methods in the following order: – – – • transaction mechanism Linked history items Bidirectional linked history items Transaction with anchors Change Set based on Duplication If change frequency of the data is high, use the methods in the following order: – – – Transaction with anchors transaction mechanism Linked history items Bidirectional linked history items Change Set based on Duplication 14. 12. 2008 © Raphael Gfeller, Raphael. Gfeller@sunrise. ch 23

History  OLTP  Result  Conclusion Practical example of the usage of the

History OLTP Result Conclusion Practical example of the usage of the methods 14. 12. 2008 © Raphael Gfeller, Raphael. Gfeller@sunrise. ch 24

Questions? 14. 12. 2008 © Raphael Gfeller, Raphael. Gfeller@sunrise. ch 25

Questions? 14. 12. 2008 © Raphael Gfeller, Raphael. Gfeller@sunrise. ch 25

References • Gfeller Raphael, How to effectively store the history of data in a

References • Gfeller Raphael, How to effectively store the history of data in a ration DBMS, [Online] 2. 12. 2008. [Citied 2. 12. 2009] http: //wiki. hsr. ch/Datenbanken/wiki. cgi? Historie. Und. Zeit. In. Datenbanken. 14. 12. 2008 © Raphael Gfeller, Raphael. Gfeller@sunrise. ch 26