Enhancements to the Bitemporal Model Support Integrity Constraints

Enhancements to the Bitemporal Model Support: Integrity Constraints and More Henrietta Dombrovskaya, Braviant Holdings, Chicago Boris Novikov, Saint Petersburg University, Russia Chad Slaughter, Scale Genius, Inc. Chicago

Bitemporal Data: Making It Happen In Postgres Incorporating time, thereby making the data models temporal, is a hot topic. The practical usage include: • Snapshot queries (This report looked differently last week! I have a print out! Can you run it as it looked at the end of November? ) • Time-interval queries (Why this query takes twice longer now than last month? What exactly changed in the data? ) • Correlated time queries (How many people where fired since X became a CTO? How many people left since the new head of Analytics started? ) Many systems are already providing this kind of support (DB 2, Oracle) and it is included into the SQL ANSI 2011 standard. However, Postgres never formally included the time dimensions, and we are going to fix this! Dec-20 Bitemporal Data Model - PG Open 2017 2

References • C. J. Date, Hugh Darwen, and Nikos Lorentzos. 2014. Time and Relational Theory, Second Edition: Temporal Databases in the Relational Model and SQL (2 nd ed. ). Morgan Kaufmann Publishers Inc. , San Francisco, CA, USA. • Tom Johnston. 2014. Bitemporal Data: Theory and Practice (1 st ed. ). Morgan Kaufmann Publishers Inc. , San Francisco, CA, USA. • Tom Johnston and Randall Weis. 2010. Managing Time in Relational Databases: How to Design, Update and Query Temporal Data. Morgan Kaufmann Publishers Inc. , San Francisco, CA, USA. • Krishna Kulkarni and Jan-Eike Michels. 2012. Temporal features in SQL: 2011. SIGMOD Rec. 41, 3 (October 2012), 34 -43. • Richard Thomas Snodgrass. 1999. Developing Time-Oriented Database Applications in SQL. Morgan Kaufmann Publishers Inc. , San Francisco, CA, USA. • ISO/IEC 9075 -2: 2011, Information technology — Database languages — SQL , 2011 • Martin Kaufmann, Peter M. Fischer, Norman May, Donald Kossmann: Benchmarking Bitemporal Database Systems: Ready for the Future or Stuck in the Past? EDBT 2014: 738 -749 Dec-20 Bitemporal Data Model - PG Open 2017 3

Previous Work Overview What we’ve presented a year ago Dec-20 Bitemporal Data Model - PG Open 2017 4

Summary The surveys (Kosman et al) indicate, that is should be easy to support time in Postgres due to the existence of the GIST indexes. And we’ve utilized this advantage. Last year we presented the Postgres-based implementation of asserted versioning framework (Johnston and Weis). This model supports effective time and asserted time and differs from the ANSI 2011 standard. Dec-20 Bitemporal Data Model - PG Open 2017 5

Where Our Work Can Be Found pg_bitemporal git repo https: //github. com/scalegenius/pg_bitemporal Includes: • Domain definition • Allen relations implementation (including the ones already implemented in Postgres) • Operations: • • • create bitemporal table bitemporal insert bitemporal update bitemporal correction inactivate bitemporal delete • Support of bitemporal constraints Dec-20 Bitemporal Data Model - PG Open 2017 6

Overview Of Bitemporal Operations The next several slides present the basic bitemporal data manipulation operations and explain the differences between bitemporal and conventional operations. Terminology: - effective time - asserted time - the combination of effective and asserted times: time regions Dec-20 Bitemporal Data Model - PG Open 2017 7

Bitemporal Insert now = 2015 -05 -01 select ll_bitemporal_insert( 'customers’, , $$’customer_no’, name', ’type' $$, , $$'C 100', 'John Doe', 'Silver' $$, timeperiod('2015 -06 -01', 'infinity'), timeperiod('2015 -05 -01', 'infinity') A s s e r t e d # Effective Interval Assertive Interval Customer No. Name Type 1 [2015 -06 -01, oo) [ 2015 -05 -01 , oo) C 100 John Doe Silver Effective 1 Dec-20 Bitemporal Data Model - PG Open 2017 8

Bitemporal Update now = 2015 -09 -15 select ll_bitemporal_update($$customers$$, $$customer_no$$, $$100$$, $$type$$, $$Gold$$, timeperiod(‘ 2015 -09 -15’, ‘infinity’), timeperiod(‘ 2015 -09 -15’, ‘infinity’)) A s s e r t e d # Effective Interval Assertive Interval Customer No. Name Type 1 [2015 -06 -01, oo) [2015 -05 -01, 2015 -09 -15) C 100 John Doe Silver 2 [2015 -06 -01, 2015 -09 -15) [2015 -09 -15, oo) C 100 John Doe Silver 3 [2015 -09 -15, oo) [ 2015 -09 -15, oo) C 100 John Doe Gold Effective 1 1 2 Dec-20 Bitemporal Data Model - PG Open 2017 3 9

Bitemporal Correction now = 2015 -09 -15 select ll_bitemporal_correction($$customers$$, $$type $$, $$ Platinum$$, $$ customer_no $$, $$ C 100$$, timeperiod('2015 -09 -15', 'infinity'), now()) A s s e r t e d # Effective Interval Assertive Interval Customer No. Name Type 1 [2015 -06 -01, oo) [2015 -05 -01, 2015 -09 -15) C 100 John Doe Silver 2 [2015 -06 -01, 2015 -09 -15) [2015 -09 -15, oo) C 100 John Doe Silver 3 [2015 -09 -15, oo) [2015 -09 -15, 2015 -09 -22) C 100 John Doe Gold 4 [2015 -09 -15, oo) [ 2015 -09 -22, oo) C 100 John Doe Platinum Effective 1 1 3 2 Dec-20 3 2 Bitemporal Data Model - PG Open 2017 4 10

Bitemporal Inactivate now = 2015 -11 -05 select ll_bitemporal_inactivate( $$customers$$, $$customer_no$$, $$C 100$$, timeperiod('2015 -12 -31', 'infinity'), timeperiod('2015 -11 -05', 'infinity'), A s s e r t e d # Effective Interval Assertive Interval Customer No. Name Type 1 [2015 -06 -01, oo) [2015 -05 -01, 2015 -09 -15) C 100 John Doe Silver 2 [2015 -06 -01, 2015 -09 -15) [2015 -09 -15, oo) C 100 John Doe Silver 3 [2015 -09 -15, oo) [2015 -09 -15, 2015 -09 -22) C 100 John Doe Gold 4 [2015 -09 -15, oo) [2015 -09 -22, 2015 -11 -05) C 100 John Doe Platinum 5 [2015 -09 -15, 2015 -12 -31 ) [ 2015 -11 -05, oo) C 100 John Doe Platinum Effective 1 1 3 3 2 Dec-20 2 4 4 5 Bitemporal Data Model - PG Open 2017 11

Bitemporal Delete now = 2015 -11 -17 select ll_bitemporal_delete( 'customers', $$ customer_no $$, $$ 'C 100' $$, timeperiod('2015 -11 -17', 'infinity')) A s s e r t e d # Effective Interval Assertive Interval Customer No. Name Type 1 [2015 -06 -01, oo) [2015 -05 -01, 2015 -09 -15) C 100 John Doe Silver 2 [2015 -06 -01, 2015 -09 -15) [2015 -09 -15, oo) C 100 John Doe Silver 3 [2015 -09 -15, oo) [2015 -09 -15, 2015 -09 -22) C 100 John Doe Gold 4 [2015 -09 -15, oo) [2015 -09 -22, 2015 -11 -05) C 100 John Doe Platinum 5 [2015 -09 -15, 2015 -12 -31 ) [2015 -11 -05, 2015 -11 -17 ) C 100 John Doe Platinum Effective 1 1 3 3 2 4 2 5 Dec-20 Bitemporal Data Model - PG Open 2017 4 5 12

Bitemporal Constraints • We need to support the following constraint types: Primary key – supported by GIST with exclusion Unique - supported by GIST with exclusion Check – no difference from regular tables IS/IS NOT NULL – no difference from regular tables Foreign key – most difficult to support • We use metacode to record the presence of the bitemporal constraints: PK: select bitemporal_internal. pk_constraint('postgres_cluster_id'); UQ: select bitemporal_internal. unique_constraint('port'); Dec-20 Bitemporal Data Model - PG Open 2017 13

Defining Foreign Key Constraint The difficulty of verifying the bi-temporal FK is that the PK/UQ in the parent table should be effective and asserted all the time when a dependent record is effective/asserted CONSTRAINT "bitemporal fk postgres_version_database_versionsrelease_version" check (true or 'fk' <> '@postgres_version -> database_versions(release_version)@')); Function: select bitemporal_internal. fk_constraint( 'postgres_version’ , 'database_versions’ , 'release_version'); Dec-20 Bitemporal Data Model - PG Open 2017 14

FK Creation • Check whether the referencing field is a PK/UQ • validate_bitemporal_pk_uq • Create check constraint • fk_constraint • Check whether the validation on the parent table field already exists • ll_lookup_validation_function • If not, create it • ll_generate_fk_validate • create trigger on insert/update and a trigger function Dec-20 Bitemporal Data Model - PG Open 2017 15

How a Validation Function Should Work? In order to define the algorithm, we need to take a closer look at how bitemporal queries work. Dec-20 Bitemporal Data Model - PG Open 2017 16

Understanding Bitemporal Queries Dec-20 Bitemporal Data Model - PG Open 2017 17

Types of Bitemporal Queries • Snapshot queries – how the data looked at a certain point of time • What was the status of customer_no ’C 100’ on Sep 1 2015? • May also use one-dimensional conditions • Interval queries in selection criteria • How did the status change for customer_no ‘C 100’ during 2015? • May return multiple rows with time regions • One-dimensional conditions are also here • Correlated time queries • What was the status of customer ‘C 200’ when customer ‘C 100’ was ’Platinum’? • Values are equal only at time when they are valid in both time regions Dec-20 Bitemporal Data Model - PG Open 2017 18

How to Define Time Regions For the Query Result? • Conceptually each row has associated bitemporal time region • The strategy is to perform an operation as in non-temporal DB and calculate region for each tuple of result • Operations on time regions: • ∩ intersection • ∪ union • − difference • Examples: • Interval query: intersect query region with tuple regions: what was the customer type for customer C 100 in September 2015? • Join: find matching tuples and intersect their regions: what was the status of customer C 300 when customer C 100 was “Platinum”? Dec-20 Bitemporal Data Model - PG Open 2017 19

Defining Time Regions For Updates • UPDATE a_table SET attr = (expr) WHERE (cond) • Time regions for: • S – the time region defined by UPDATE function (for new values to be set) • P – time region for each row identified by WHERE clause (with old values to be updated) • N- time region for each already existing row with values, which we are going to assign (empty of not exists) • P ⋂ N = � • After UPDATE the DB will contain regions: • Old values: P ∖ S • New values: N ∪S Dec-20 Bitemporal Data Model - PG Open 2017 P s N 20

Time Regions as Sets of Rectangles • Operations on time regions: • ∩ intersection • ∪ union • − difference • Asserted interval × Effective interval = rectangle • Unfortunately, union and difference of rectangles are not rectangular • Sets of non-intersecting rectangles: all operations produce sets or rectangles Dec-20 Bitemporal Data Model - PG Open 2017 21

Operations on Regions (Rectangle Sets) • Disjoin (auxiliary) • Split both arguments into smaller rectangles such that each pair of rectangles from different arguments either coincide or do not overlap • Set-theoretic operations: union, intersection, set difference: • Calculate the disjoin • Choose rectangles that belong to union/intersection/difference Dec-20 Bitemporal Data Model - PG Open 2017 22

Using Rectangle sets • Checking foreign key constraints Rect_set (FK) ⊊ rect_set (PK) • Computing time regions (rectangle sets) for query results Intersection Union Difference • Bulk inserts and updates Dec-20 Bitemporal Data Model - PG Open 2017 23

Finding One-Dimensional (Interval) Disjoin • Sort all left and right ends of source intervals together • Choose intervals that intersect with or overlap with at least one of source intervals Dec-20 Bitemporal Data Model - PG Open 2017 24

Two-Dimensional Disjoin Algorithm • Project both arguments on one dimension (say, on asserted and ignore effective time) • Compute on-dimensional disjoin • Put back the second dimension producing stripes • For each stripe, find onedimensional disjoin. Dec-20 Bitemporal Data Model - PG Open 2017 25

Enhancements To The Existing Bitemporal Functions New features added as a result of bitemporal production system implementation Dec-20 Bitemporal Data Model - PG Open 2017 28

Bulk Operations INSERT-SELECT The power of SQL is operating on sets, but so far we only implemented single-row insert. The new function ll_bitemporal_insert_select(p_table text, p_list_of_fields text, p_select TEXT, p_effective temporal_relationships. timeperiod, p_asserted temporal_relationships. timeperiod ) Allows to pass a select statement as a parameter. Dec-20 Bitemporal Data Model - PG Open 2017 29

UPDATE/CORRECTION AS SELECT Our existing bitemporal_update and bitemporal_correction support updating multiple records in one call, but the update conditions are reduced to the lists of values New “bulk” update and correction will allow free-format expressions for both selection criteria and new values. The same concept will be applied to the DELETE/INACTIVATE functions. Dec-20 Bitemporal Data Model - PG Open 2017 30

Consistent Transactional Behavior Question: What should be recorded in a bitemporal table, when several operations happen in one transaction and affect the same record(s)? Transactions are atomic, thereby the intermediate states should not be visible. Also, if we materialize the record status before (and regardless) of transaction commit/rollback, we might (most likely) get inconsistent results for the snapshot queries. Dec-20 Bitemporal Data Model - PG Open 2017 31

Bitemporal Correction Immediately After Insert now = 2015 -09 -15 select ll_bitemporal_correction($$customers$$, $$type $$, $$ Platinum$$, $$ customer_no $$, $$ C 100$$, timeperiod('2015 -09 -15', 'infinity'), Effective Interval Assertive Interval Customer No. Name Type 1 [2015 -06 -01, oo) [2015 -05 -01, 2015 -09 -15) C 100 John Doe Silver 2 [2015 -06 -01, 2015 -09 -15) [2015 -09 -15, oo) C 100 John Doe Silver 3 [2015 -09 -15, oo) [2015 -09 -15, 2015 -09 -15) C 100 John Doe Gold 4 [2015 -09 -15, oo) [ 2015 -09 -15, oo) C 100 John Doe Platinum Effective 1 1 2 Dec-20 3 2 Bitemporal Data Model - PG Open 2017 3 A s s e r t e d # 4 32

Corrected Behavior Of UPDATE/CORRECTION § Now we check the asserted interval of the “last” record, and if the “last” asserted start is the same as the “new” asserted start we do not insert a new record, but update the old one. This still allows the ”non-transactional” behavior, of the app developers choose to ignore transactions § Bitemporal correction has a new parameter: asserted_start_time (instead of “now” by default). One can still use the old version Dec-20 Bitemporal Data Model - PG Open 2017 33

Bitemporal Correction – Correct Behavior now = 2015 -09 -15 select ll_bitemporal_correction($$customers$$, $$type $$, $$ Platinum$$, $$ customer_no $$, $$ C 100$$, timeperiod('2015 -09 -15', 'infinity'), A s s e r t e d # Effective Interval Assertive Interval Customer No. Name Type 1 [2015 -06 -01, oo) [2015 -05 -01, 2015 -09 -15) C 100 John Doe Silver 2 [2015 -06 -01, 2015 -09 -15) [2015 -09 -15, oo) C 100 John Doe Silver 3 [2015 -09 -15, oo) C 100 John Doe Platinum 4 [2015 -09 -15, oo) [ 2015 -09 -22, oo) Cl 00 John Doe Platinum Effective 1 1 2 Dec-20 3 2 Bitemporal Data Model - PG Open 2017 3 34

High-level Functions The app developers do not care about how exactly bitemporal support is implemented. We created a high-level functions for them to perform select/insert/update/delete without knowing about the bitemporal operations Example: create or replace function common. hl_bitemporal_insert (p_schema_name text, p_table_name text, p_list_of_fields text, p_list_of_values text) returns integer Dec-20 Bitemporal Data Model - PG Open 2017 35

Performance • We are using pg_bitemporal for both OLTP and OLAP systems, and the data volumes appear to be manageable • Indexing Strategy: GIST indexes are efficient, but occasionally we need to build extra indexes, for example conditional index with UPPER (asserted)=‘infinity’ Dec-20 Bitemporal Data Model - PG Open 2017 36

Future Work • Finalizing FK support • Finalizing constraints creation • Finalizing bitemporal UPDATE • Continue research on performance and the choice of indexes Many thanks to Braviant Holdings leadership for their continuous support! Dec-20 Bitemporal Data Model - PG Open 2017 37
- Slides: 35