CHANGE DATA CAPTURE A BRIEF OVERVIEW PRESENTED BY
CHANGE DATA CAPTURE: A BRIEF OVERVIEW PRESENTED BY TIM WEIGEL @TIMWEIGEL@GMAIL. COM TWEIGEL@XTIVIA. COM
CHANGE DATA CAPTURE: WHAT IS IT? • Broadly speaking, Change Data Capture (CDC) is a set of methods used to identify and track changes made to data • There a number of ways to track changes in a database • I’m just going to give you the Wikipedia link: https: //en. wikipedia. org/wiki/Change_data_capture
CHANGE DATA CAPTURE IN SQL SERVER • First implemented in SQL Server 2008 • Uses a our friend the Transaction Log to get change information • Enterprise-only feature – Change Tracking is available in all editions, but only tells you that something changed – it doesn’t tell you what! – There are community-based alternatives to CDC, but they rely on different mechanisms
WHAT MAKES CHANGE DATA CAPTURE TICK • Our old friend, sp_replcmds – CDC will use the same Log Reader Agent to populate the change tables and the distribution tables if the table is published – If not, the cdc. <table>_capture job executes sp_MScdc_capture_job which executes sp_cdc_scan which then executes sp_replcmds • And a few other things, too… – CDC also has a cleanup job which executes sp_MScdc_cleanup_job which gathers retention information and then executes sp_cdc_cleanup_change_table The first rule of data warehouse is don’t talk about data warehouse… So we’re not gonna.
ENABLING CHANGE DATA CAPTURE • CDC must be enabled at the database level first: – EXEC sys. sp_cdc_enable_db • CDC must then be enabled on the tables you wish to track: – EXEC sys. sp_cdc_enable_table @source_schema = N'<schema>’ , @source_name = N'<table>’ , @role_name = N'<role>' or NULL , @capture_instance = N'<capture_instance>' – optional , @supports_net_changes = <1 or 0> -- optional , @index_name = N'<index_name>' – optional , @captured_column_list = N'<column_list>' or NULL – optional , @filegroup_name = N'<filegroup_name>' – optional , @allow_partition_switch = <1 or 0> -- optional
CHANGE DATA CAPTURE METADATA AND SUPPORT TABLES • Created when CDC is enabled on the DB • Metadata tables – cdc. change_tables – cdc. index_columns – cdc. captured_columns • Other tables of note – cdc. lsn_time_mapping – cdc. ddl_history – dbo. systranschemas
CHANGE DATA CAPTURE CHANGE TABLES • The default naming convention is cdc. <schema>_<table name>_CT • Common columns: – __$start_lsn and __$end_lsn – __$seqval – __$operation • 1 = delete, 2 = insert, 3 = update (before), and 4 = update (after) – __$update_mask • Variable bitmap, one bit per column • INSERTs and DELETEs will have all bits set, but UPDATES will only set the bits corresponding to the updated columns
WORKING WITH CHANGE DATA CAPTURE • How to get data out – Query the change table directly • Well, Microsoft says you oughtn’t do this… But you’re totally going to do this anyway, at least once – There are several built-in functions and procs! • cdc. fn_cdc_get_all_changes_<capture_insta nce> and cdc. fn_cdc_get_net_changes_<capture_insta nce> • sys. fn_cdc_map_lsn_to_time and sys. fn_cdc_map_time_to_lsn • sys. fn_cdc_has_column_changed and sys. fn_cdc_get_column_ordinal • sys. sp_cdc_get_captured_columns and sys. sp_cdc_get_ddl_history • And many more!
WORKING WITH CHANGE DATA CAPTURE • How to tune it – Customize sp_cdc_scan? You can! – msdb. dbo. cdc_jobs – And of course, procs to work with it! • sys. sp_cdc_help_jobs • sys. sp_cdc_change_job allows you to change the parameters • How to keep an eye on it – sys. sp_cdc_help_change_data_capture – sys. dm_cdc_log_scan_sessions lets you know what’s currently going on • Where to find the errors – sys. dm_cdc_errors
DISABLING CHANGE DATA CAPTURE • At the table level: – EXEC sys. sp_cdc_disable_table @source_schema = N'<schema>’ , @source_name = N'<table>’ , @capture_instance = N'<all or capture_instance>' • At the database level: – EXEC sys. sp_cdc_disable_db
A FEW COMMON ISSUES • • CDC assumes you know what you’re doing – it doesn’t give you warnings. Rather, it gives you rope. Triggers on CDC tables: recipe for performance issues, if not outright disaster CDC tables are not persistent if CDC is disabled Replication and CDC: best frenemies – They use the same log reader if the same table(s) are published and tracked – New snapshots will disable CDC if a subscriber is tracked • Yes, without warning – Proc exec isn’t available in transactional replication if CDC is also enabled • New columns aren’t automatically added to CDC when added to a tracked table – It’s possible to add them without disabling and re-enabling CDC, but it’s kinda clunky • Computed columns always have a value of NULL if captured, and some other unexpected behaviors
DEMO TIME! • Enabling CDC • Disabling CDC • Status • Errors
QUESTION TIME! • Thank you very much! • @timweigel • timweigel@gmail. com • tweigel@xtivia. com
- Slides: 13