BIE 13 INT Parallel Processing Late Arriving Facts
BIE 13 -INT
Parallel Processing Late Arriving Facts Using the Database Engine Change Data Capture Slowly Changing Dimensions Lookup Cache Null Value Substitution General Performance Tips
Package Benchmark Baseline Step Time Dim. Customer 00: 02: 37: 987 350 1, 202. 0 Dim. Nation 00: 00: 263 121 95. 0 Dim. Part 00: 19: 777 2557 10, 113. 0 Dim. Part. Supplier 00: 01: 51: 643 2121 7, 165. 0 Dim. Region 00: 00: 417 76 12. 0 Dim. Supplier 00: 02: 207 1218 4, 533. 0 Lineitem 00: 13: 56: 847 1656 7, 171. 0 Orders 00: 01: 32: 843 3697 16, 156. 0 1, 475 5, 805. 9 00: 18: 15: 190 KB / Sec Rows / Sec
Parallel Processing Late Arriving Facts Using the Database Engine Change Data Capture Slowly Changing Dimensions Lookup Cache Null Value Substitution General Performance Tips
Dim. Customer Dim. Region Dim. Nation Dim. Part Dim. Supplier Dim. Part. Supplier
Scheduler Pn … P 5 P 4 DTExec (1) P 3 P 2 P 1 DTExec (2) DTExec (n)
Server Farm Extract Transform Load
Parallel Processing Late Arriving Facts Using the Database Engine Change Data Capture Slowly Changing Dimensions Lookup Cache Null Value Substitution General Performance Tips
Destination Source Staging
Use ORDER BY clause Use GROUP BY clause Set Is. Sorted and Sort. Key. Position COUNT() and other SQL Functions Perform JOINs directly in the source query
Parallel Processing Late Arriving Facts Using the Database Engine Change Data Capture Slowly Changing Dimensions Lookup Cache Null Value Substitution General Performance Tips
Audit Columns Log Scraping Timed Extracts Database Diff • Modified flag • Checksum • Extract operations from Transaction Log • Last. Modified. Date column • Comparing database snapshots
Parallel Processing Late Arriving Facts Using the Database Engine Change Data Capture Slowly Changing Dimensions Lookup Cache Null Value Substitution General Performance Tips
Stage • Store data in a temporary table Optimize • Provides additional performance benefits Execute • Run the MERGE SQL statement in two steps
MERGE INTO [Dim. Product] AS FACT USING [Staging] AS SRC ON ( FACT. Product. Alternate. Key = SRC. Product. Alternate. Key ) WHEN MATCHED AND FACT. End. Date is NULL -- update the current record THEN UPDATE SET FACT. [Arabic. Description] = SRC. Arabic. Description , FACT. [Chinese. Description] = SRC. Chinese. Description , FACT. [English. Description] = SRC. English. Description , FACT. [French. Description] = SRC. French. Description , FACT. [German. Description] = SRC. German. Description , FACT. [Hebrew. Description] = SRC. Hebrew. Description , FACT. [Japanese. Description] = SRC. Japanese. Description , FACT. [Thai. Description] = SRC. Thai. Description , FACT. [Turkish. Description] = SRC. Turkish. Description , FACT. [Reorder. Point] = SRC. Reorder. Point , FACT. [Safety. Stock. Level] = SRC. Safety. Stock. Level ;
INSERT INTO [Dim. Product] ([Product. Alternate. Key], [List. Price], [English. Description], [Start. Date]) SELECT [Product. Alternate. Key], [List. Price], [English. Description], [Start. Date] FROM ( MERGE INTO [Dim. Product] AS FACT USING [Staging] AS SRC ON ( FACT. Product. Alternate. Key = SRC. Product. Alternate. Key ) WHEN NOT MATCHED THEN INSERT VALUES ( SRC. Product. Alternate. Key , SRC. List. Price , SRC. English. Description , GETDATE() -- Start. Date , NULL -- End. Date ) WHEN MATCHED AND FACT. End. Date is NULL THEN UPDATE SET FACT. End. Date = GETDATE() OUTPUT $Action_Out , SRC. Product. Alternate. Key , SRC. List. Price , SRC. English. Description , GETDATE() Start. Date ) AS MERGE_OUT WHERE MERGE_OUT. Action_Out = 'UPDATE'
Seconds to Process SCD Performance Comparison – 100 k Row Dimension Table 1000 900 800 700 600 500 400 300 200 100 0 SCD Wizard (No Insert/Update) SCD Wizard (Optimized) MERGE 20 k Rows 27 310 70 12 100 k Rows 128 1674 404 94 200 k Rows 260 3632 854 178
Ease of Creation Maintenance Large Dimensions Small Change Set Overall Performance Error Recovery
Parallel Processing Late Arriving Facts Using the Database Engine Change Data Capture Slowly Changing Dimensions Lookup Cache Null Value Substitution General Performance Tips
Dim. Product Fact. Orders
select [Product. Key], [Product. Alternate. Key], [Start. Date], [End. Date] from [Dim. Product] where [Product. Alternate. Key] = ? and [Start. Date] <= ? and ( [End. Date] is null or [End. Date] > ? )
Order. Date Product. Number 2001 -07 -01 LJ-0192 -L Order. Date Product Key Start. Date End. Date 2001 -07 -01 LJ-0192 -L 232 2001 -07 -01 2002 -06 -30 2001 -07 -01 LJ-0192 -L 233 2002 -07 -01 2003 -06 -30 2001 -07 -01 LJ-0192 -L 234 2003 -07 -01 NULL Order. Date Product Key 2001 -07 -01 LJ-0192 -L 232
Range Lookup Performance Comparison 140000 120000 Rows / Second 100000 80000 60000 40000 20000 0 Rows Per Second Lookup 3639 Merge Join 55708 Script (Partial) 123659 Script (Full) 128978
Lookup Merge Join & Conditional Split Custom Script Ok for small number of rows Best for equal data set size Good middle ground Best for overall performance
Parallel Processing Late Arriving Facts Using the Database Engine Change Data Capture Slowly Changing Dimensions Lookup Cache Null Value Substitution General Performance Tips
Full Cached created pre-data flow Must fit into memory Partial Cache created on demand Cache size configurable Misses can be cached as well None No rows are cached Use with volatile data
Full cache grabs most common values Partial cache grabs the rest
Reducing database and memory usage Customers
Cache most common values SELECT TOP Customer. Id … Customers Get the rest Customers
Parallel Processing Late Arriving Facts Using the Database Engine Change Data Capture Slowly Changing Dimensions Lookup Cache Null Value Substitution General Performance Tips
Region not found Replace with a constant NULL value Repeat
Ignore missing matches Handle all subs in a single transform
Parallel Processing Late Arriving Facts Using the Database Engine Change Data Capture Slowly Changing Dimensions Lookup Cache Null Value Substitution General Performance Tips
www. microsoft. com/teched www. microsoft. com/learning http: //microsoft. com/technet http: //microsoft. com/msdn
- Slides: 67