TRACKING DATA CHANGES With Temporal Tables and More
TRACKING DATA CHANGES With Temporal Tables and More
INTRODUCTION What is this presentation about? • The how's and whys of tracking changes to the data in your database using SQL Server change tracking technologies available in SQL 2016 • Emphasis on temporal tables and as time permits • Change Data Capture • Change Tracking • Triggers
TRACKING DATA CHANGES OVERVIEW Possible Whys • Audit trail - who changed what when • Analysis - maintaining a data warehouse • Time travel - reconstructing state of the data at a point in time in the past • Data sync – one and two way sync’ing • Repairing row-level data - recovering from accidental data changes and application errors
TRACKING DATA CHANGES OVERVIEW Things to consider when preparing to track changes • Identifying what to track and what not to track • Costs of tracking / not tracking • Legal requirements • New vs legacy systems • What will your hardware / software support
TRACKING DATA CHANGES OVERVIEW Dealing with date and time • Time zones and UTC • Format of • Assertion(History) vs version (Validity) vs bi-temporal • Does the date/time represent when the change occurred or when the information in the row was valid. • Who or what is the date/time relative to?
TEMPORAL TABLES • Part of the ANSI SQL 2011 standard. First introduced with SQL Server 2016 • Designed to keep a full history of data changes at the row level using a ‘current’ table and a history table • System-versioned because the start and end dates for each row is managed by the system • Assertion / historical in nature meaning that the lifetime of a is record based on the physical dates the record was changed
TEMPORAL TABLES Good for • • Audit trail Point in time analysis Maintaining a data warehouse Repairing row-level data Not so good for • • Data synchronization Short term data history requirements Column level change tracking Tables that have frequent schema changes
TEMPORAL TABLES - DEMO Requirements for the base table • Must have two explicitly defined datetime 2 non-nullable columns (period columns) • Used exclusively by the system to record ‘life-span’ of a row • Names of columns can be customized • Date/time stored as coordinated universal time (UTC) • Can be hidden • Must have a PERIOD FOR SYSTEM_TIME statement. Identifies the period columns • Must have a primary key • INSTEAD OF triggers are not allowed on the base table. AFTER triggers allowed. • Foreign keys with cascade update or delete are not allowed
TEMPORAL TABLES - DEMO Requirements for the history table • Must have the same schema as base table • No primary key, foreign keys, unique indexes, table constraints or triggers. • Cannot be configured for change data capture, change tracking, transactional or merge replication • By default, a clustered index created using period columns • Can be in a different schema than the base table but the same database • Data cannot be modified. This includes INSERT, UPDATE, DELETE and TRUNCATE
CHANGE DATA CAPTURE • First introduced in SQL 2008, Change Data Capture (CDC) captures DML changes on a tracked table. Enterprise only unless you have SQL 2016 SP 1 • Uses SQL Server Agent jobs to asynchronously read the transaction log to track and record DML transactions • When enabling CDC on a table, an additional table is created containing the same columns as tracked table along with metadata needed to understand the changes • Table-valued functions are also created to allow access the change history • CDC was written by the SQL Server Replication team so it works with transaction replication
CHANGE DATA CAPTURE Good for • Maintaining a data warehouse • Short term audit trail • Data synchronization (using net changes) Not so good for • Long term storage of change history • Point in time analysis • Solutions that require real time data capture
CHANGE DATA CAPTURE How it works SQL Server Agent jobs • One used to populate the change tables • One responsible for change table cleanup • SQL Server Agent must be running for data capture and clean up to work Transaction logs • SQL Server Agent jobs need to be running for the transaction log to get truncated • Change data is not available until after change is committed to source table and capture job has processed the related log entries
CHANGE DATA CAPTURE - DEMO Validity interval • Changes captured by CDC has a finite lifespan called the validity interval. • Important because the extraction interval for a request must be fully covered by this time period. • Default is 3 days measured in mins Metadata • __$start_lsn identifies the commit log sequence number (LSN) that was assigned to the change. • __$end_lsn not supported. Future compatibility not guaranteed. Column always NULL (SQL 2012) • __$seqval used to order changes that occur in the same transaction • __$operation records the operation associated with the change: 1 = delete, 2 = insert, 3 = update (before), and 4 = update (after) • __$update_mask variable bit mask with one defined bit for each captured column. • Insert and delete - all bits set. Update - bits set correspond to changed columns
CHANGE DATA CAPTURE - DEMO Requirements • Does not play nice with memory-optimized tables • SQL Server Agent must be running • Net changes: source table has a primary key or an unique index referenced by index_name when setting up CDC on the table.
CHANGE TRACKING • A lightweight solution first introduced in SQL 2008 that provides an efficient mechanism to identify records in a user table that have changed on account of DML transactions • Tracks the fact that a row has changed but not how many times the row has changed or the values of any intermediate changes. • Changes are tracked as they occur • Clean up of internal tables used to keep track of what records have changed are managed by the system • Built-in functions are available to easily query to data changes
CHANGE TRACKING How it works • Once enabled, any DML statement will cause change tracking information for each modified row to be recorded. • Each table enabled for Change Tracking has an internal on-disk table created on the same filegroup. This table is used to record table change versions and the rows that have changed since a particular version. • When a change occurs the row’s primary key, list of impacted columns (optional) and DML command are recorded in the internal tracking table. • An auto cleanup thread purges expired content of the internal tracking tables based on the retention period setting of the database.
CHANGE TRACKING Good for • Data synchronization Not so good for • Any solution that requires that each data change is tracked and recorded.
CHANGE TRACKING - DEMO Requirements • The database compatibility level must be set to 90 (SQL 2005) or greater. • Snapshot isolation is strongly recommended to help ensure that all change tracking information is consistent. • Tables being tracking must have a primary key. • DDL operations are not tracked • Deletes caused by a TRUNCATE statement are not tracked. However, this will cause the minimum valid version to be updated. Client applications would then need re-initialize their data.
TRIGGERS • A special kind of stored procedure attached to an event. When the event happens, SQL Server executes the trigger’s code • 4 types: DML, DDL, CLR, Logon • INSTEAD OF and AFTER • DML triggers in SQL Server work at the statement-level. This means that one trigger instance fires per statement regardless of how many rows were affected, even zero • Triggers extend the life of a transaction. While a trigger is at work, the associated transaction will not commit until the trigger completes • Due to their event driven nature, issues caused by triggers can be difficult to troubleshoot
TRIGGERS Good for • Auditing • One audit table per user table or one audit table for all user tables • Anything if you want to put the work into building the solution • Any environment that is SQL Server 2005 or earlier Not so good for • Scenarios where simplicity is required or other change tracking technologies would do the job just as well or better
TRIGGERS - DEMO
CONCLUSION The tools available natively in Microsoft SQL Server to track data changes have gotten better and better over time. With SQL 2016 SP 1 most if not all these tools are available to everyone with a SQL standard license or better. There is really no reason why you should be suffering with a system that does a poor job of tracking changes in your DB. In addition to the tools that are offered natively in SQL Server there also 3 rd party venders that provide off the shelf solutions for tracking data changes. For example, one of the sponsors of today’s event Apex. SQL has a product called Apex. SQL Audit that tracks and reports events on SQL Server by auditing access and changes to the SQL Server instance and its objects.
- Slides: 22